PyTorch Archives - Page 4 of 43

SmolVLM: Accessible Image Captioning with Small Vision Language Model

In this article, we cover the SmolVLM model by Hugging Face, a compact 2.2B-parameter model for vision understanding. ...

Phi-4 Mini and Phi-4 Multimodal

Sovit Ranjan Rath April 21, 2025 0 Comments

In this article, we cover the Phi-4 Mini model. We start with a discussion of the architecture and then create a simple Gradio application for the Phi-4 Mini Instruct and Phi-4 Multimodal models. ...

ViTPose – Human Pose Estimation with Vision Transformer

Sovit Ranjan Rath April 14, 2025 0 Comments

In this article, we cover the architecture of ViTPose and ViTPose++ and run inference on images and videos using ViTPose. ...

Pretraining DINOv2 for Semantic Segmentation

Sovit Ranjan Rath March 31, 2025 0 Comments

In this article, we pretrain the DINOv2 model for semantic segmentation on the COCO 2017 dataset and run inference on images and videos. ...

Multi-Class Semantic Segmentation using DINOv2

Sovit Ranjan Rath March 24, 2025 5 Comments

In this article, we train the DINOv2 model for multi-class semantic segmentation. ...
