DebuggerCafe - Deep Learning, Machine Learning, Artificial Intelligence

Multi-Class Semantic Segmentation using DINOv2

In this article, we train the DINOv2 model for multi-class semantic segmentation. ...

Moondream – One Model for Captioning, Pointing, and Detection

Sovit Ranjan Rath March 17, 2025 0 Comments

In this article, we cover Moondream, a VLM (Vision Language Model) that can be used for image captioning, visual querying, object pointing, and object detection. ...

Getting Started with Smolagents

Sovit Ranjan Rath March 10, 2025 0 Comments

This article is an introduction to the Smolagents library by Hugging Face. We cover the need for the Smolagents library and the use of various tools, such as the image generation tool, Python Interpreter Tool, and Web Search Tool. ...

Qwen2 VL – Inference and Fine-Tuning for Understanding Charts

Sovit Ranjan Rath March 3, 2025 4 Comments

In this article, we explore the Qwen2 VL model. We start with the architecture, move on to inference using the pretrained model, and fine-tune the Qwen2 VL model for chart understanding. ...

Fine-Tuning Llama 3.2 Vision

Sovit Ranjan Rath February 24, 2025 3 Comments

In this article, we fine-tune the Llama 3.2 Vision model using Unsloth on a LaTeX2OCR dataset. After fine-tuning, we create a Gradio application where users can upload a LaTeX equation image and convert it to a raw LaTeX equation. ...