Vision Transformer Archives - Page 3 of 7

Web-SSL: Scaling Language Free Visual Representation

In this article, we explore the Web-DINO models trained via Web-SSL 2.0 methodology on the MC-2B (MetaCLIP-2B) dataset. ...

ViTPose – Human Pose Estimation with Vision Transformer

Sovit Ranjan Rath April 14, 2025 0 Comments

In this article, we cover the architecture of ViTPose and ViTPose++ and run inference on images & videos using ViTPose. ...

DINOv2 Segmentation – Fine-Tuning and Transfer Learning Experiments

Sovit Ranjan Rath February 3, 2025 3 Comments

In this article, we simplify the semantic segmentation (pixel classification) head of the DINOv2 model and carry out experiments comparing fine-tuning and transfer learning. ...

DINOv2 for Semantic Segmentation

Sovit Ranjan Rath January 27, 2025 5 Comments

In this article, we modify the DINOv2 model for semantic segmentation, freeze the backbone, and train the model on the Penn-Fudan Pedestrian segmentation dataset. ...

FasterViT Detection

Sovit Ranjan Rath December 9, 2024 0 Comments

In this article, we create a custom Vision Transformer-based object detection model using NVIDIA's FasterViT backbone and the Single Shot Detection head. ...
