DebuggerCafe - Deep Learning, Machine Learning, Artificial Intelligence

FasterViT Detection

In this article, we create a custom Vision Transformer-based object detection model using NVIDIA's FasterViT backbone and a Single Shot Detection head. ...

Integrating SAM2, Molmo, and Whisper for Object Segmentation

Sovit Ranjan Rath December 2, 2024 0 Comments

In this article, we integrate SAM2, Molmo, and Whisper to create a text-based as well as a speech-to-text pipeline for automated object segmentation in images. ...

SAM2 and Molmo: Image Segmentation using Natural Language

Sovit Ranjan Rath November 25, 2024 2 Comments

In this article, we use SAM2 and Molmo to carry out image segmentation using natural language. We provide a prompt to Molmo, get the point coordinates it returns, and pass these to SAM2.1 to segment the objects ...

Introduction to Molmo – Overview and Inference

Sovit Ranjan Rath November 18, 2024 6 Comments

In this article, we walk through the Molmo and PixMo technical reports and carry out Molmo image description and pointing demos using the Hugging Face checkpoints. ...

Meta Llama 3 – An Overview

Sovit Ranjan Rath November 11, 2024 0 Comments

In this article, we summarize the Meta Llama 3 technical report, covering its most crucial aspects: the architecture, the pretraining data, the compute infrastructure, the post-training strategy, and the multimodal capabilities of Llama 3. ...