NLP Archives - DebuggerCafe

gpt-oss Inference with llama.cpp

In this article, we explore the gpt-oss model card and run inference with gpt-oss-20b using llama.cpp locally. ...

Qwen3 – Unified Models for Thinking and Non-Thinking

Sovit Ranjan Rath July 7, 2025 0 Comments

In this article, we discuss Qwen3, the latest iteration in the Qwen family of models. We cover the need for Qwen3, its architecture, and its training strategy. ...

Llama 3.2 Vision – With Unsloth and Gradio

Sovit Ranjan Rath February 17, 2025 3 Comments

In this article, we explore the Llama 3.2 Vision model. We start with the architecture, and eventually build a Gradio application for chatting with images while loading the model from Unsloth. ...

Unsloth – Getting Started

Sovit Ranjan Rath February 10, 2025 3 Comments

This article introduces the Unsloth LLM library. It covers the need for Unsloth, the installation steps, running inference with language models such as Llama 3.1, Gemma 2, and Mistral v0.3, and understanding chat templates. ...

Introduction to Molmo – Overview and Inference

Sovit Ranjan Rath November 18, 2024 6 Comments

In this article, we walk through the Molmo and PixMo technical reports and carry out Molmo image description and pointing demos using the Hugging Face checkpoints. ...

Category: NLP
