Vision Transformer Archives

Fine-Tuning Qwen3.5

In this article, we fine-tune the Qwen3.5-0.8B model on VQA-RAD, a question-answering dataset built on radiology images. After training, we run inference with the fine-tuned model. ...

Creating a Sketch to HTML Application with Qwen3-VL

Sovit Ranjan Rath December 22, 2025 0 Comment

In this article, we build a simple sketch-to-HTML application with Qwen3-VL: users upload a sketch or screenshot of a prospective website, and the model returns the corresponding HTML. ...

Introduction to Qwen3-VL

Sovit Ranjan Rath December 15, 2025 2 Comments

In this article, we explore the Qwen3-VL model, the latest iteration of the Qwen-VL series. We start with model architecture and benchmarks, and then move to hands-on inference for object detection, OCR, video understanding, and sketch-to-HTML using Qwen3-VL. ...

Fine-Tuning Phi-3.5 Vision Instruct

Sovit Ranjan Rath December 8, 2025 0 Comment

In this article, we fine-tune the Phi-3.5 Vision Instruct model on a receipt OCR dataset, using Hugging Face libraries to train a LoRA adapter. ...

Object Detection with DEIMv2

Sovit Ranjan Rath December 1, 2025 0 Comment

In this article, we explore the DEIMv2 object detection model, which is based on the DINOv3 and HGNetv2 backbones, and carry out inference on images and videos. ...
