PyTorch Archives - DebuggerCafe

Fine-Tuning SmolVLM for Receipt OCR

In this article, we fine-tune the SmolVLM-256M model for receipt OCR on the SROIE v2 dataset after generating the ground truth data using the QwenVL-2B model. ...

Gemma 3 – Advancing Open, Lightweight, Multimodal AI

Sovit Ranjan Rath May 19, 2025 0 Comment

In this article, we explore Gemma 3. We start with the need for Gemma 3, its architecture and multimodal capabilities, and carry out inference using Hugging Face. ...

SmolVLM: Accessible Image Captioning with Small Vision Language Model

Sovit Ranjan Rath May 12, 2025 0 Comments

In this article, we cover the SmolVLM model by Hugging Face. It is a compact 2.2B-parameter model for vision understanding. ...

Phi-4 Mini and Phi-4 Multimodal

Sovit Ranjan Rath April 21, 2025 0 Comment

In this article, we cover the Phi-4 Mini model. We start with a discussion of the architecture and create a simple Gradio application for the Phi-4 Mini Instruct and Phi-4 Multimodal models. ...

ViTPose – Human Pose Estimation with Vision Transformer

Sovit Ranjan Rath April 14, 2025 0 Comment

In this article, we cover the architecture of ViTPose and ViTPose++ and run inference on images and videos using ViTPose. ...

Category: PyTorch
