DebuggerCafe

Machine Learning and Deep Learning


Integrating SAM2, Molmo, and Whisper for Object Segmentation

Sovit Ranjan Rath · December 2, 2024 · 0 Comments

In this article, we integrate SAM2, Molmo, and Whisper to build a pipeline for automated object segmentation in images, driven either by text prompts or by speech transcribed to text. ...


SAM2 and Molmo: Image Segmentation using Natural Language

Sovit Ranjan Rath · November 25, 2024 · 2 Comments

In this article, we use SAM2 and Molmo to carry out image segmentation using natural language. We provide a prompt to Molmo, get the point coordinates it returns, and pass these to SAM2.1 to segment the objects ...


Introduction to Molmo – Overview and Inference

Sovit Ranjan Rath · November 18, 2024 · 6 Comments

In this article, we walk through the Molmo and PixMo technical reports and carry out Molmo image description and pointing demos using the Hugging Face checkpoints. ...


Meta Llama 3 – An Overview

Sovit Ranjan Rath · November 11, 2024 · 0 Comments

In this article, we summarize the Meta Llama 3 technical report, covering its most crucial aspects: the architecture, the pretraining data, the compute infrastructure, the post-training strategy, and the multimodal capabilities of Llama 3. ...


Multimodal RAG with Phi 3.5

Sovit Ranjan Rath · November 4, 2024 · 0 Comments

In this article, we create a multimodal RAG application from scratch to chat with PDFs, text files, images, and videos using the Phi-3.5 family of language models. ...




Recent Posts

  • gpt-oss-chat Local RAG and Web Search
  • SAM 3 UI – Image, Video, and Multi-Object Inference
  • gpt-oss Inference with llama.cpp
  • SAM 3 Inference and Paper Explanation
  • Hunyuan3D 2.0 – Explanation and Runpod Docker Image


