DebuggerCafe

Machine Learning and Deep Learning


Qwen2.5-Omni: An Introduction

Sovit Ranjan Rath, June 2, 2025, 0 Comments

In this article, we explore Qwen2.5-Omni, a multimodal generative AI model that can accept text, image, video, and audio as inputs while outputting both text and audio. ...

Fine-Tuning SmolVLM for Receipt OCR

Sovit Ranjan Rath, May 26, 2025, 0 Comments

In this article, we fine-tune the SmolVLM-256M model for receipt OCR on the SROIE v2 dataset, after generating the ground truth data with the QwenVL-2B model. ...

Gemma 3 – Advancing Open, Lightweight, Multimodal AI

Sovit Ranjan Rath, May 19, 2025, 0 Comments

In this article, we explore Gemma 3. We start with the need for Gemma 3, cover its architecture and multimodal capabilities, and carry out inference using Hugging Face. ...

SmolVLM: Accessible Image Captioning with Small Vision Language Model

Sovit Ranjan Rath, May 12, 2025, 0 Comments

In this article, we cover the SmolVLM model by Hugging Face, a compact 2.2B-parameter model for vision understanding. ...

Gradio Application using Qwen2.5-VL

Sovit Ranjan Rath, May 5, 2025, 0 Comments

In this article, we build a simple Gradio application with Qwen2.5-VL for image captioning, video captioning, and object detection. ...
