Vision Transformer Archives

ViTPose – Human Pose Estimation with Vision Transformer

In this article, we cover the architecture of ViTPose and ViTPose++ and run inference on images & videos using ViTPose. ...

DINOv2 Segmentation – Fine-Tuning and Transfer Learning Experiments

Sovit Ranjan Rath February 3, 2025 3 Comments

In this article, we simplify the semantic segmentation (pixel classification) head of the DINOv2 model and carry out experiments comparing fine-tuning and transfer learning. ...

DINOv2 for Semantic Segmentation

Sovit Ranjan Rath January 27, 2025 2 Comments

In this article, we modify the DINOv2 model for semantic segmentation, freeze the backbone, and train the model on the Penn-Fudan Pedestrian segmentation dataset. ...

FasterViT Detection

Sovit Ranjan Rath December 9, 2024 0 Comments

In this article, we create a custom Vision Transformer-based object detection model using NVIDIA's FasterViT backbone and the Single Shot Detection head. ...

Training FasterViT on VOC Segmentation Dataset

Sovit Ranjan Rath June 3, 2024 0 Comments

In this article, we train FasterViT on the Pascal VOC semantic segmentation dataset using the PyTorch deep learning framework. ...
