Fine-Tuning SmolVLM for Receipt OCR

In this article, we fine-tune the SmolVLM-256M model for receipt OCR on the SROIE v2 dataset, after generating the ground truth data with the QwenVL-2B model. ...
In this article, we explore Gemma 3. We start with the need for Gemma 3, its architecture and multimodal capabilities, and carry out inference using Hugging Face. ...
In this article, we cover the SmolVLM model by Hugging Face, a compact 2.2B-parameter model for vision understanding. ...
In this article, we cover the Phi-4 Mini model. We start with a discussion of the architecture and then create a simple Gradio application for the Phi-4 Mini Instruct and Phi-4 Multimodal models. ...
In this article, we cover the architecture of ViTPose and ViTPose++ and run inference on images and videos using ViTPose. ...
Business WordPress Theme copyright 2025