LLMs Archives

Qwen3 – Unified Models for Thinking and Non-Thinking

In this article, we discuss Qwen3, the latest iteration in the Qwen family of models. We cover the need for Qwen3, its architecture, and its training strategy. ...

Getting Started with SmolVLM2 – Code Inference

Sovit Ranjan Rath June 9, 2025 0 Comment

In this article, we cover inference code for SmolVLM2. We carry out image and video inference experiments using SmolVLM2-2.2B-Instruct and SmolVLM2-256M-Instruct. ...

Qwen2.5-Omni: An Introduction

Sovit Ranjan Rath June 2, 2025 0 Comment

In this article, we explore Qwen2.5-Omni, a multimodal generative AI model that can accept text, image, video, and audio as inputs while outputting both text and audio. ...

Fine-Tuning SmolVLM for Receipt OCR

Sovit Ranjan Rath May 26, 2025 2 Comments

In this article, we fine-tune the SmolVLM-256M model for receipt OCR on the SROIE v2 dataset, after generating the ground truth data using the QwenVL-2B model. ...

Gemma 3 – Advancing Open, Lightweight, Multimodal AI

Sovit Ranjan Rath May 19, 2025 0 Comment

In this article, we explore Gemma 3. We start with the need for Gemma 3, then cover its architecture and multimodal capabilities, and carry out inference using Hugging Face. ...
