In this article, we cover the SmolVLM model by Hugging Face. It is a compact 2.2B parameter model for vision understanding. ...
SmolVLM: Accessible Image Captioning with Small Vision Language Model
In this article, we cover the Phi-4 Mini model. We start with a discussion of the architecture and create a simple Gradio application for the Phi-4 Mini Instruct and Phi-4 Multimodal models. ...
In this article, we cover the architecture of ViTPose and ViTPose++ and run inference on images & videos using ViTPose. ...
In this article, we pretrain the DINOv2 model for semantic segmentation on the COCO 2017 dataset and run inference on images and videos. ...
In this article, we perform multi-class semantic segmentation by training the DINOv2 model. ...