DebuggerCafe

Machine Learning and Deep Learning


Qwen2.5-VL: Architecture, Benchmarks and Inference

Sovit Ranjan Rath, April 28, 2025

In this article, we explore Qwen2.5-VL using Hugging Face Transformers. We cover the Qwen2.5-VL architecture, data preparation, benchmarks, and inference. ...


Phi-4 Mini and Phi-4 Multimodal

Sovit Ranjan Rath, April 21, 2025

In this article, we cover the Phi-4 Mini model. We start with a discussion of the architecture and then create a simple Gradio application for the Phi-4 Mini Instruct and Phi-4 Multimodal models. ...


ViTPose – Human Pose Estimation with Vision Transformer

Sovit Ranjan Rath, April 14, 2025

In this article, we cover the architecture of ViTPose and ViTPose++ and run inference on images and videos using ViTPose. ...


Microsoft Autogen – An Introduction

Sovit Ranjan Rath, April 7, 2025

This article introduces Microsoft Autogen, a framework for building multi-agent systems that can act autonomously alongside humans. ...


Pretraining DINOv2 for Semantic Segmentation

Sovit Ranjan Rath, March 31, 2025

In this article, we pretrain the DINOv2 model for semantic segmentation on the COCO 2017 dataset and run inference on images and videos. ...


Recent Posts

  • Qwen2.5-Omni: An Introduction
  • Fine-Tuning SmolVLM for Receipt OCR
  • Gemma 3 – Advancing Open, Lightweight, Multimodal AI
  • SmolVLM: Accessible Image Captioning with Small Vision Language Model
  • Gradio Application using Qwen2.5-VL
