Integrating SAM2, Molmo, and Whisper for Object Segmentation

In this article, we integrate SAM2, Molmo, and Whisper to create a pipeline for automated object segmentation in images, driven by either text or speech prompts. ...
In this article, we use SAM2 and Molmo to perform image segmentation from natural language prompts. We provide a prompt to Molmo, get the coordinates, and pass these to SAM2.1 to segment the objects ...
In this article, we create a multimodal RAG application from scratch to chat with PDFs, text files, images, and videos using the Phi-3.5 family of language models. ...
In this article, we use different Torchvision backbones to create DeepLab segmentation models and train them on the Pascal VOC semantic segmentation dataset. ...
In this article, we create a custom Phi-3 Gradio chat interface with the ability to upload and query files. ...