LitGPT – Getting Started



We have seen a flood of LLMs over the past three years. With this shift, organizations are also releasing new libraries for working with these LLMs. Among these, LitGPT is one of the more prominent and user-friendly options. With close to 40 LLMs available (at the time of writing), ranging from mobile-friendly to cloud-scale models, it has something for every use case. In this article, we are going to cover the features of LitGPT along with examples.

Figure 1. Tasks supported by LitGPT – Chat, fine-tuning, pretraining, and evaluation of LLMs.

With LitGPT, we get access to high-performance LLMs. The ease of pretraining, finetuning, evaluating, and deploying these LLMs at scale is what makes LitGPT stand out.

What will we cover in this article?

  • What are the features provided by LitGPT?
  • How to use a pretrained LLM with LitGPT?
  • How do we fine-tune an LLM with a supported dataset?
  • And how do we fine-tune a LitGPT model using a custom dataset?

Why LitGPT?

Although there are several options for running LLMs, LitGPT makes the end-to-end workflow extremely easy. It supports:

  • Easy loading of pretrained LLMs for inference.
  • Optimized fine-tuning on predefined and custom datasets.
  • Simple evaluation workflows on several benchmark datasets.
  • And serving LLMs using LitAPI.

With its host of available models, we can choose from several of the latest LLM families, such as Qwen, Llama 3.1, or even Phi-4.

In this article, after experimenting with pretrained models, we will fine-tune a small language model for German-to-English translation. This will give us a better idea of how LitGPT works on all fronts.

Installing LitGPT

Installing LitGPT is quite straightforward:

pip install 'litgpt[extra]'

The above command also installs all the necessary dependencies, such as the required Hugging Face libraries.
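To confirm that the installation worked, we can query the installed version with Python's standard library (a quick optional check):

```python
# Optional sanity check: confirm litgpt is installed and print its version.
from importlib.metadata import version, PackageNotFoundError

try:
    print(version('litgpt'))  # the version string depends on when you install
except PackageNotFoundError:
    print('litgpt is not installed')
```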

Directory Structure

Let’s take a look at the entire directory structure and all the notebooks that we will be dealing with:

├── checkpoints
│   ├── HuggingFaceTB
│   └── meta-llama
├── data
│   └── alpacagpt4
├── finetuning_data
│   ├── train.json
│   └── val.json
├── smollm2_custom_finetune
│   └── logs
├── smollm2_finetune
│   ├── logs
│   ├── step-001000
│   ├── step-002000
│   ├── step-003000
│   ├── step-004000
│   └── step-005000
├── smollm2_wmt_eval
│   ├── config.json
│   ├── generation_config.json
│   ├── model_config.yaml
│   ├── pytorch_model.bin
│   ├── results.json
│   ├── tokenizer_config.json
│   └── tokenizer.json
├── evaluate.ipynb
├── finetuning_custom_data.ipynb
├── finetuning.ipynb
├── inference_pretrained.ipynb
└── prepare_custom_dataset.ipynb
  • The checkpoints directory contains the pretrained models downloaded via LitGPT.
  • The data and finetuning_data directories contain the predefined LitGPT dataset and the custom dataset, respectively.
  • smollm2_custom_finetune contains the model fine-tuned on the custom dataset, and smollm2_finetune contains the model fine-tuned on one of the predefined LitGPT datasets.
  • There are five Jupyter Notebooks directly inside the project directory. We will cover the necessary ones individually.

All the Jupyter Notebooks, custom dataset, and custom fine-tuned models are available via the download section.


Inference Using Pretrained Model with LitGPT

We will start with a simple inference experiment using one of the pretrained models.

The code for this is present in the inference_pretrained.ipynb notebook.

Before running inference, let’s check all the models that are available for downloading.

# List all models available to download.
!litgpt download list

This lists all the pretrained models available in the library. Here is the truncated output.

Please specify --repo_id <repo_id>. Available values:
allenai/OLMo-1B-hf
allenai/OLMo-7B-hf
allenai/OLMo-7B-Instruct-hf
BSC-LT/salamandra-2b
BSC-LT/salamandra-2b-instruct
BSC-LT/salamandra-7b
BSC-LT/salamandra-7b-instruct
codellama/CodeLlama-13b-hf
codellama/CodeLlama-13b-Instruct-hf
codellama/CodeLlama-13b-Python-hf
codellama/CodeLlama-34b-hf
.
.
.
togethercomputer/LLaMA-2-7B-32K
Trelis/Llama-2-7b-chat-hf-function-calling-v2
unsloth/Mistral-7B-v0.2

To run inference, we just need one import, that is the LLM class.

from litgpt import LLM

model = LLM.load('meta-llama/Llama-3.2-1B-Instruct')

text = model.generate(
    'Who are you and what can you do?', 
    max_new_tokens=1024
)

print(text)

Here, we load the Llama 3.2 1B Instruct model and call its generate method for inference, providing the prompt and the maximum number of new tokens to generate.

The following is a sample output.

Nice to meet you! I'm a conversational AI, which means I'm a computer program designed to simulate conversations and answer questions to the best of my ability. My primary function is to assist and communicate effectively with users like you, providing helpful and relevant information, answering questions, and engaging in discussions.

Here are some things I can do:

1. **Answer questions**: I can process natural language queries and provide accurate and informative responses...

We can also run the generation in a streaming manner and output the text as it is generated.

text = model.generate(
    'Can we talk about animated videos?', 
    stream=True, 
    max_new_tokens=1024
)
for resulting_text in text:
    print(resulting_text, end='', flush=True)

Here, we provide an additional stream=True argument and keep printing the text as it streams in. The following is a small example of what this looks like.

Figure 2. Text streaming inference with LitGPT using a pretrained model.

You can choose any of the models from the list and start experimenting.

Fine-Tuning using LitGPT Predefined Dataset

Now, we will move to fine-tuning a small language model on one of the datasets that comes packaged with the LitGPT library. We will fine-tune the SmolLM2-135M Instruct model.

The code for this resides in the finetuning.ipynb Jupyter Notebook.

The notebook covers inference on a simple question before we start the fine-tuning process. This will help us understand whether the model improved after fine-tuning.

from litgpt import LLM

model = LLM.load('HuggingFaceTB/SmolLM2-135M-Instruct')

text = model.generate(
    'Can we talk about animated videos?', 
    stream=True, 
    max_new_tokens=1024
)
for resulting_text in text:
    print(resulting_text, end='', flush=True)

We are asking the model a simple question about animated videos here. The model gives the following response.

Absolutely! I'd be happy to tailor my answer for you. Let's talk about animated 
videos.

Animated videos often involve animations and animations, which are can be 
created using different techniques and art styles while adhering to existing 
templates and algorithms.

Scalable animated videos, also known as screen-shot videos or video one thousand 
hours (VOH), are those created from AI tools. They are created using a variety 
of techniques believed to mimic the natural motion of an element created during 
filming. These are known as "manipulation time-lapses."

For a scalable animation tool to produce an animated video, these tools are 
typically used after all compositing is done. Animations are generated from the 
necessary physics and other physics equations during that time. Then, these are 
fed into a machine learning algorithm that creates the static animations we 
often see on screen.

Summary: While it's true that animated videos can be created using different 
techniques and algorithms, the dispute is over how they are created. With the 
purpose of creating a scalable animatronic, it is usually generated from a 
scripted AI tool. Dave's Cloud supplies, on behalf of Hugging Face, gets all this.

Because we are using a small language model, although the answer seems plausible, it ends with an unnecessary summary. Let’s try to improve that by training the model on GPT-4-style prompts.

We will use the Alpaca-GPT4 dataset, which contains instruction samples generated by GPT-4. This is a good starting point for aligning our model toward better responses.

Fine-Tuning SmolLM2-135 Instruct on Alpaca-GPT4 using LitGPT

Fine-tuning using LitGPT is just a single command that can also be run via the terminal. Here, we are executing it in the Jupyter Notebook.

# Fine-tune SmolLM2 on Alpaca-GPT4.
!litgpt finetune_full HuggingFaceTB/SmolLM2-135M-Instruct \
    --data AlpacaGPT4 \
    --out_dir smollm2_finetune \
    --precision "bf16-true" \
    --train.save_interval 1000 \
    --train.log_interval 500 \
    --train.micro_batch_size 4 \
    --train.epochs 1 \
    --train.max_seq_length 1024 \

Here we are using the finetune_full script that fine-tunes the entire model. LitGPT also supports LoRA and adapter training, which you can find here.

Arguments used:

  • The very first argument is the model. We use one of the models we listed earlier via the litgpt download list command.
  • Next comes the dataset. As we are using a predefined dataset from the library, we just pass the dataset name to the --data argument.
  • The --out_dir argument defines the directory where the resulting model will be saved.
  • As the training was done on an RTX GPU, we are providing the --precision as "bf16-true". You can omit this argument if you are not sure whether your GPU supports BF16 or not.
  • --train.save_interval defines after how many optimizer steps the model checkpoint is saved. For us, it is 1000.
  • We are logging the train and validation loss after every 500 steps using --train.log_interval.
  • --train.micro_batch_size defines the micro-batch size. For us, that is 4. By default, the global batch size is 16, so there are 4 gradient accumulation steps: the optimizer updates the weights after every 4 micro-batches.
  • We are training for 1 epoch and have set the maximum sequence length to 1024.
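The batch-size bullet above can be double-checked with quick arithmetic using the values from the fine-tuning command:

```python
# Effective (global) batch size = micro-batch size x accumulation steps.
global_batch_size = 16  # LitGPT default
micro_batch_size = 4    # --train.micro_batch_size
accumulation_steps = global_batch_size // micro_batch_size
print(accumulation_steps)  # 4

# This also explains the log lines below: 500 iterations = 125 optimizer steps.
print(500 // accumulation_steps)  # 125
```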

Let’s take a look at the outputs.

Seed set to 1337
Number of trainable parameters: 162,826,560
The longest sequence length in the train data is 769, the model's maximum sequence length is 769 and context length is 8192
Verifying settings ...
Epoch 1 | iter 500 step 125 | loss train: 1.455, val: n/a | iter time: 123.73 ms (step)
Epoch 1 | iter 1000 step 250 | loss train: 1.523, val: n/a | iter time: 85.16 ms (step)
Epoch 1 | iter 1500 step 375 | loss train: 1.657, val: n/a | iter time: 95.80 ms (step)
Epoch 1 | iter 2000 step 500 | loss train: 1.728, val: n/a | iter time: 90.64 ms (step)
Validating ...
Come up with 3 interesting facts about honeybees.
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Come up with 3 interesting facts about honeybees.

### Response:
1. Honey bees are known for their ability to learn the behavior of various food sources and their ability to recognize and distinguish between different varieties.

2. Honey bees can fly up to 12 kilometers (7 miles) in a matter of minutes even while traveling from one flower or plant to another.

3. Honey bees are able to obtain up to 90% of their energy from nectar, which they use to build and forage for themselves. They are also known for their ability to

iter 2400: val loss 1.5341, val time: 5309.79 ms
Epoch 1 | iter 2500 step 625 | loss train: 1.427, val: 1.534 | iter time: 102.72 ms (step)
Epoch 1 | iter 3000 step 750 | loss train: 1.414, val: 1.534 | iter time: 103.74 ms (step)
Epoch 1 | iter 3500 step 875 | loss train: 1.442, val: 1.534 | iter time: 94.22 ms (step)
Epoch 1 | iter 4000 step 1000 | loss train: 1.511, val: 1.534 | iter time: 83.28 ms (step)
Saving checkpoint to 'smollm2_finetune/step-001000'
Epoch 1 | iter 4500 step 1125 | loss train: 1.516, val: 1.534 | iter time: 96.33 ms (step)
.
.
.
iter 9600: val loss 1.3445, val time: 5518.19 ms
Epoch 1 | iter 10000 step 2500 | loss train: 1.251, val: 1.344 | iter time: 93.27 ms (step)
Epoch 1 | iter 10500 step 2625 | loss train: 1.460, val: 1.344 | iter time: 87.83 ms (step)
Epoch 1 | iter 11000 step 2750 | loss train: 1.318, val: 1.344 | iter time: 98.34 ms (step)
Epoch 1 | iter 11500 step 2875 | loss train: 1.525, val: 1.344 | iter time: 105.72 ms (step)
Epoch 1 | iter 12000 step 3000 | loss train: 1.192, val: 1.344 | iter time: 90.05 ms (step)
Validating ...
Come up with 3 interesting facts about honeybees.
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Come up with 3 interesting facts about honeybees.

### Response:
1. Honeybees are born as tiny cone-shaped larvae and grow up, to become swarms of maids or drones. While most honeybufs die shortly after laying eggs, their young also survive, growing into bees who can often serve as caretakers, pollinators, and pollinators, and actually making honey honey.

2. Honeybees are one of the most intelligent organisms in the animal kingdom, exhibiting behaviors such as playing, foraging, and foraging for nectar. Honeybee colonies – colonies

iter 12000: val loss 1.3416, val time: 5511.72 ms
Saving checkpoint to 'smollm2_finetune/step-003000'
Epoch 1 | iter 12500 step 3125 | loss train: 1.430, val: 1.342 | iter time: 127.17 ms (step)

| ------------------------------------------------------
| Token Counts
| - Input Tokens              :  7840613
| - Tokens w/ Prompt          :  9921786
| - Total Tokens (w/ Padding) :  17134448
| -----------------------------------------------------
| Performance
| - Training Time             :  1156.97 s
| - Tok/sec                   :  14809.72 tok/s
| -----------------------------------------------------
| Memory Usage                                                                 
| - Memory Used               :  5.97 GB                                        
-------------------------------------------------------

Validating ...
Final evaluation | val loss: 1.322 | val ppl: 3.749

At the end, we have a validation loss of 1.322.
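As a side note, the reported perplexity is simply the exponential of the cross-entropy validation loss, which we can verify:

```python
import math

val_loss = 1.322
# ~3.751; matches the reported val ppl of 3.749 up to rounding of the loss.
print(round(math.exp(val_loss), 3))
```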

Inference After Fine-Tuning

Let’s run inference using the final saved model. For this, we will use the litgpt chat command and execute it in the terminal. This is necessary because the fine-tuned model adheres to a specific prompt format (Alpaca style), which this command applies correctly. Calling model.generate directly skips that formatting and causes the model to produce wrong output.

litgpt chat smollm2_finetune/final/ --max_new_tokens 1024

We tell the script to generate 1024 maximum tokens.
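For reference, the Alpaca-style template looks roughly like this. This is a sketch based on the prompt text printed in the validation logs earlier; alpaca_prompt is a hypothetical helper, and LitGPT applies its own internal version of this formatting:

```python
def alpaca_prompt(instruction: str, user_input: str = "") -> str:
    # Mirrors the prompt format seen in the validation logs above.
    prompt = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
    )
    if user_input:
        # Samples with a non-empty 'input' field get an extra section.
        prompt += f"### Input:\n{user_input}\n\n"
    return prompt + "### Response:\n"

print(alpaca_prompt("Come up with 3 interesting facts about honeybees."))
```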

We give exactly the same prompt as before fine-tuning. Here is a small chat session.

Figure 3. Chat inference using the model fine-tuned on AlpacaGPT4 dataset.

This time, the answer seems much better.

Fine-Tuning a LitGPT Model on a Custom Dataset

Now, we will move on to fine-tuning a model on a custom dataset. All fine-tuning in LitGPT uses the Alpaca dataset format shown below.

[
    {
        "instruction": "Write a limerick about a pelican.",
        "input": "",
        "output": "There once was a pelican so fine,\nHis beak was as colorful as sunshine,\nHe would fish all day,\nIn a very unique way,\nThis pelican was truly divine!\n\n\n"
    },
    {
        "instruction": "Identify the odd one out from the group.",
        "input": "Carrot, Apple, Banana, Grape",
        "output": "Carrot\n\n"
    }
]
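Whichever way such a file is produced, every record needs exactly the instruction, input, and output keys. A small generic check (validate_alpaca_records is a hypothetical helper, not part of LitGPT) can catch formatting mistakes before training:

```python
def validate_alpaca_records(records):
    # Every Alpaca-format record must carry these three keys.
    required = {'instruction', 'input', 'output'}
    for i, record in enumerate(records):
        missing = required - record.keys()
        if missing:
            raise ValueError(f'Record {i} is missing keys: {missing}')
    return True

sample = [
    {'instruction': 'Identify the odd one out from the group.',
     'input': 'Carrot, Apple, Banana, Grape',
     'output': 'Carrot\n\n'}
]
print(validate_alpaca_records(sample))  # True
```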

Now, we will be fine-tuning the SmolLM2-135M Instruct model for German-to-English translation.

Preparing Custom Dataset For LitGPT Fine-Tuning

The first step for us is to prepare the custom dataset in the Alpaca instruction format.

We will use the German to English translation subset of the WMT16 dataset from Hugging Face. It contains 4.55 million training samples, around 2,170 validation samples, and around 3,000 test samples. However, we will use only 50,000 samples for training.

The dataset preparation code is in the prepare_custom_dataset.ipynb Jupyter Notebook. Let’s go through that.

from datasets import load_dataset
from tqdm.auto import tqdm

import json
import os

We load the dataset from the Hugging Face datasets library.

raw_dataset = load_dataset(
    'wmt/wmt16',
    'de-en'
)

Next, isolate the training and validation samples.

train_dataset = raw_dataset['train']
valid_dataset = raw_dataset['validation']

Create a helper function to generate the custom dataset format.

def convert_data(orig_data, num_samples=None):
    json_list = []
    
    for i, data in tqdm(enumerate(orig_data), total=len(orig_data)):
        if num_samples and i == num_samples:
            break
        de = data['translation']['de']
        en = data['translation']['en']
    
        sample = {
            'instruction': f"Translate from German to English: {de}",
            'input': '',
            'output': en
        }
    
        json_list.append(sample)
    return json_list

Finally, create the JSON data and save it to the finetuning_data directory.

train_json_data = convert_data(train_dataset, num_samples=50000)
valid_json_data = convert_data(valid_dataset)

os.makedirs('finetuning_data', exist_ok=True)

with open('finetuning_data/train.json', 'w') as f:
    json.dump(train_json_data, f)

with open('finetuning_data/val.json', 'w') as f:
    json.dump(valid_json_data, f)
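One detail worth knowing for German text: json.dump escapes non-ASCII characters such as umlauts to \uXXXX sequences by default, which is harmless because json.load restores them. A quick standalone round-trip (with a made-up sample) confirms this:

```python
import json

sample = {
    'instruction': 'Translate from German to English: Schönes Wetter heute.',
    'input': '',
    'output': 'Nice weather today.'
}

encoded = json.dumps(sample)   # 'Schönes' is stored as 'Sch\u00f6nes'
decoded = json.loads(encoded)  # ...and restored on load
print(decoded == sample)       # True
```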

This completes the dataset preparation.

Fine-Tuning SmolLM2 on Custom Data

The code for fine-tuning the SmolLM2-135M Instruct model is present in the finetuning_custom_data.ipynb Jupyter Notebook. Let’s go through the code.

Before fine-tuning, let’s check what kind of translation the pretrained model can carry out.

# Check translation quality before fine-tuning.
# From German to English.
from litgpt import LLM

model = LLM.load('HuggingFaceTB/SmolLM2-135M-Instruct')

# The English translation is:
# What are animated videos? Let's talk about them.

text = model.generate(
    'Translate from German to English: Was sind animierte Videos? Lassen Sie uns darüber sprechen.', 
    stream=True, 
    max_new_tokens=1024
)
for resulting_text in text:
    print(resulting_text, end='', flush=True)

The following block shows the result.

German: Were sich animierter Videos? Unterlagen wohin Sie darauf sprechen können.

It clearly is not capable of translating the text at the moment.

We will now fine-tune the model.

# Fine-tune SmolLM2 on the custom WMT16 German-English dataset.
!litgpt finetune_full HuggingFaceTB/SmolLM2-135M-Instruct \
    --data JSON \
    --data.json_path finetuning_data \
    --out_dir smollm2_custom_finetune \
    --precision "bf16-true" \
    --train.save_interval 1000 \
    --train.log_interval 500 \
    --train.global_batch_size 16 \
    --train.micro_batch_size 4 \
    --train.epochs 3 \
    --train.max_seq_length 1024 \
    --eval.interval 500 \
    --eval.evaluate_example "first"

We use almost the same arguments as in our previous training experiment, with a few changes.

  • --data JSON tells the training script that we are using a JSON format dataset.
  • The --data.json_path argument accepts either a single JSON file or a directory containing train.json and val.json. For us, it is the latter. If we provide a path to a single JSON file, we also need to provide a validation split ratio (otherwise a default is used). However, we already have a separate validation set.
  • --eval.evaluate_example "first" tells the training script to use the first sample of the validation set to evaluate the model at regular intervals.
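These arguments also let us predict the training length: with 50,000 training samples, a global batch size of 16, and 3 epochs, we expect 3,125 optimizer steps per epoch and 9,375 in total, which matches the step numbers in the logs below.

```python
num_samples = 50_000    # training samples we prepared
global_batch_size = 16  # --train.global_batch_size
epochs = 3              # --train.epochs

steps_per_epoch = num_samples // global_batch_size
print(steps_per_epoch)           # 3125
print(steps_per_epoch * epochs)  # 9375 total optimizer steps
```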

Following is the truncated output from the training.

Seed set to 1337
Number of trainable parameters: 162,826,560
The longest sequence length in the train data is 1024, the model's maximum sequence length is 1024 and context length is 8192
Verifying settings ...
Epoch 1 | iter 500 step 125 | loss train: 1.908, val: n/a | iter time: 82.11 ms (step)
Epoch 1 | iter 1000 step 250 | loss train: 1.887, val: n/a | iter time: 83.52 ms (step)
Epoch 1 | iter 1500 step 375 | loss train: 1.786, val: n/a | iter time: 83.71 ms (step)
Epoch 1 | iter 2000 step 500 | loss train: 1.773, val: n/a | iter time: 81.81 ms (step)
Validating ...
Translate from German to English: Die Premierminister Indiens und Japans trafen sich in Tokio.
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Translate from German to English: Die Premierminister Indiens und Japans trafen sich in Tokio.

### Response:
Ladies and gentlemen, the Prime Minister, Mr Indiens-Vertredem, and Mr Japens, the Prime Minister, Mr Vertredem, were in Tao at the moment.

iter 2000: val loss 2.3086, val time: 3744.57 ms
Epoch 1 | iter 2500 step 625 | loss train: 1.512, val: 2.309 | iter time: 81.61 ms (step)
Epoch 1 | iter 3000 step 750 | loss train: 1.562, val: 2.309 | iter time: 83.08 ms (step)
Epoch 1 | iter 3500 step 875 | loss train: 1.488, val: 2.309 | iter time: 84.55 ms (step)
Epoch 1 | iter 4000 step 1000 | loss train: 1.780, val: 2.309 | iter time: 84.65 ms (step)
Validating ...
Translate from German to English: Die Premierminister Indiens und Japans trafen sich in Tokio.
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Translate from German to English: Die Premierminister Indiens und Japans trafen sich in Tokio.

### Response:
Young Indieners and Japan are in Tokio.

iter 4000: val loss 2.2462, val time: 2996.63 ms
Saving checkpoint to 'smollm2_custom_finetune/step-001000'
Epoch 1 | iter 4500 step 1125 | loss train: 1.434, val: 2.246 | iter time: 82.46 ms (step)
Epoch 1 | iter 5000 step 1250 | loss train: 1.533, val: 2.246 | iter time: 83.38 ms (step)
Epoch 1 | iter 5500 step 1375 | loss train: 1.478, val: 2.246 | iter time: 84.04 ms (step)
Epoch 1 | iter 6000 step 1500 | loss train: 1.545, val: 2.246 | iter time: 84.01 ms (step)
.
.
.
iter 34000: val loss 2.2546, val time: 3041.75 ms
Epoch 3 | iter 34500 step 8625 | loss train: 0.908, val: 2.255 | iter time: 88.69 ms (step)
Epoch 3 | iter 35000 step 8750 | loss train: 0.932, val: 2.255 | iter time: 86.94 ms (step)
Epoch 3 | iter 35500 step 8875 | loss train: 0.893, val: 2.255 | iter time: 83.31 ms (step)
Epoch 3 | iter 36000 step 9000 | loss train: 0.962, val: 2.255 | iter time: 85.10 ms (step)
Validating ...
Translate from German to English: Die Premierminister Indiens und Japans trafen sich in Tokio.
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Translate from German to English: Die Premierminister Indiens und Japans trafen sich in Tokio.

### Response:
The Prime Ministers of India and Japan are presiding over the meeting in Tokyo.

iter 36000: val loss 2.2539, val time: 3058.45 ms
Saving checkpoint to 'smollm2_custom_finetune/step-009000'
Epoch 3 | iter 36500 step 9125 | loss train: 0.906, val: 2.254 | iter time: 83.75 ms (step)
Epoch 3 | iter 37000 step 9250 | loss train: 0.921, val: 2.254 | iter time: 82.49 ms (step)
Epoch 3 | iter 37500 step 9375 | loss train: 0.934, val: 2.254 | iter time: 83.13 ms (step)

| ------------------------------------------------------
| Token Counts
| - Input Tokens              :  15158385
| - Tokens w/ Prompt          :  19657911
| - Total Tokens (w/ Padding) :  28767596
| -----------------------------------------------------
| Performance
| - Training Time             :  2914.75 s
| - Tok/sec                   :  9869.65 tok/s
| -----------------------------------------------------
| Memory Usage                                                                 
| - Memory Used               :  7.44 GB                                        
-------------------------------------------------------

Validating ...
Final evaluation | val loss: 2.230 | val ppl: 9.296

Running Inference using the Custom Dataset Fine-Tuned Model

We will use the final saved model for inference using the terminal chat command.

litgpt chat smollm2_custom_finetune/final/ --max_new_tokens 1024

Here is that chat session.

Figure 4. Chat inference using LitGPT for machine translation using the fine-tuned SmolLM2 model.

It seems our model needs much more training before it can correctly translate from German to English. The translation is only partially correct now. We will explore more advanced applications for training and inference in future posts.

Summary and Conclusion

We covered the basics of LitGPT in this article. Starting from inference with a pretrained model, through fine-tuning on a predefined dataset, to fine-tuning on a custom dataset, we covered a lot of ground. In future articles, we will cover better fine-tuning strategies.

If you have any questions, thoughts, or suggestions, please leave them in the comment section. I will surely address them.

You can contact me using the Contact section. You can also find me on LinkedIn and X.
