We have seen a flood of LLMs over the past three years. With this shift, organizations are also releasing new libraries to make these LLMs easier to use. Among these, LitGPT is one of the more prominent and user-friendly options. With close to 40 LLMs supported (at the time of writing), it has something for every use case, from mobile-friendly to cloud-based LLMs. In this article, we are going to cover all the features of LitGPT along with examples.
With LitGPT, we get access to high-performance LLMs. The ease of pretraining, finetuning, evaluating, and deploying these LLMs at scale is what makes LitGPT stand out.
What will we cover in this article?
- What are the features provided by LitGPT?
- How to use a pretrained LLM with LitGPT?
- How do we fine-tune an LLM with a supported dataset?
- And how do we fine-tune a LitGPT model using a custom dataset?
Why LitGPT?
Although there are several options for running LLMs, LitGPT makes the end-to-end workflow extremely easy. It supports:
- Easy loading of pretrained LLMs for inference.
- Optimized fine-tuning on predefined and custom datasets.
- Simple evaluation workflows on several benchmark datasets.
- And serving LLMs using LitAPI.
With its host of available models, we can choose from several of the latest LLM families, such as Qwen, Llama 3.1, or even Phi-4.
In this article, after experimenting with pretrained models, we will fine-tune a small language model for German-to-English translation. This will give us a better idea of how LitGPT works on all fronts.
Installing LitGPT
Installing LitGPT is quite straightforward:
pip install 'litgpt[extra]'
The above command installs all the necessary libraries as well, such as those required from Hugging Face.
Directory Structure
Let’s take a look at the entire directory structure and all the notebooks that we will be dealing with:
├── checkpoints
│   ├── HuggingFaceTB
│   └── meta-llama
├── data
│   └── alpacagpt4
├── finetuning_data
│   ├── train.json
│   └── val.json
├── smollm2_custom_finetune
│   └── logs
├── smollm2_finetune
│   ├── logs
│   ├── step-001000
│   ├── step-002000
│   ├── step-003000
│   ├── step-004000
│   └── step-005000
├── smollm2_wmt_eval
│   ├── config.json
│   ├── generation_config.json
│   ├── model_config.yaml
│   ├── pytorch_model.bin
│   ├── results.json
│   ├── tokenizer_config.json
│   └── tokenizer.json
├── evaluate.ipynb
├── finetuning_custom_data.ipynb
├── finetuning.ipynb
├── inference_pretrained.ipynb
└── prepare_custom_dataset.ipynb
- The checkpoints directory contains the pretrained models that get downloaded from LitGPT.
- The data and finetuning_data directories contain the predefined LitGPT dataset and the custom dataset, respectively.
- smollm2_custom_finetune contains the model fine-tuned on the custom dataset, and smollm2_finetune contains the model fine-tuned on one of the predefined LitGPT datasets.
- There are five Jupyter Notebooks directly inside the project directory. We will cover the necessary ones individually.
All the Jupyter Notebooks, custom dataset, and custom fine-tuned models are available via the download section.
Download Code
Inference Using Pretrained Model with LitGPT
We will start with a simple inference experiment using one of the pretrained models.
The code for this is present in the inference_pretrained.ipynb notebook.
Before running inference, let’s check all the models that are available for downloading.
# List all models available to download.
!litgpt download list
This lists all the pretrained models available in the library. Here is the truncated output.
Please specify --repo_id <repo_id>. Available values: allenai/OLMo-1B-hf allenai/OLMo-7B-hf allenai/OLMo-7B-Instruct-hf BSC-LT/salamandra-2b BSC-LT/salamandra-2b-instruct BSC-LT/salamandra-7b BSC-LT/salamandra-7b-instruct codellama/CodeLlama-13b-hf codellama/CodeLlama-13b-Instruct-hf codellama/CodeLlama-13b-Python-hf codellama/CodeLlama-34b-hf . . . togethercomputer/LLaMA-2-7B-32K Trelis/Llama-2-7b-chat-hf-function-calling-v2 unsloth/Mistral-7B-v0.2
To run inference, we just need a single import: the LLM class.
from litgpt import LLM
model = LLM.load('meta-llama/Llama-3.2-1B-Instruct')
text = model.generate(
'Who are you and what can you do?',
max_new_tokens=1024
)
print(text)
Here, we load the Llama 3.2 1B Instruct model and call the model’s generate method for inference. We provide the prompt and the maximum number of tokens to generate.
The following is a sample output.
Nice to meet you! I'm a conversational AI, which means I'm a computer program designed to simulate conversations and answer questions to the best of my ability. My primary function is to assist and communicate effectively with users like you, providing helpful and relevant information, answering questions, and engaging in discussions. Here are some things I can do: 1. **Answer questions**: I can process natural language queries and provide accurate and informative responses...
We can also run generation in streaming mode and output the text as it is generated.
text = model.generate(
'Can we talk about animated videos?',
stream=True,
max_new_tokens=1024
)
for resulting_text in text:
print(resulting_text, end='', flush=True)
Here, we provide an additional stream=True argument and keep printing the text in a streaming manner. Following is a small example of what this looks like.
You can choose any of the models from the list and start experimenting.
Fine-Tuning using LitGPT Predefined Dataset
Now, we will move to fine-tuning a small language model on one of the datasets that comes packaged with the LitGPT library. We will fine-tune the SmolLM2-135M Instruct model.
The code for this resides in the finetuning.ipynb Jupyter Notebook.
The notebook covers inference on a simple question before we start the fine-tuning process. This will help us understand whether the model improved after fine-tuning.
from litgpt import LLM
model = LLM.load('HuggingFaceTB/SmolLM2-135M-Instruct')
text = model.generate(
'Can we talk about animated videos?',
stream=True,
max_new_tokens=1024
)
for resulting_text in text:
print(resulting_text, end='', flush=True)
We are asking the model a simple question about animated videos here. The model gives the following response.
Absolutely! I'd be happy to tailor my answer for you. Let's talk about animated videos. Animated videos often involve animations and animations, which are can be created using different techniques and art styles while adhering to existing templates and algorithms. Scalable animated videos, also known as screen-shot videos or video one thousand hours (VOH), are those created from AI tools. They are created using a variety of techniques believed to mimic the natural motion of an element created during filming. These are known as "manipulation time-lapses." For a scalable animation tool to produce an animated video, these tools are typically used after all compositing is done. Animations are generated from the necessary physics and other physics equations during that time. Then, these are fed into a machine learning algorithm that creates the static animations we often see on screen. Summary: While it's true that animated videos can be created using different techniques and algorithms, the dispute is over how they are created. With the purpose of creating a scalable animatronic, it is usually generated from a scripted AI tool. Dave's Cloud supplies, on behalf of Hugging Face, gets all this.
Because we are using a small language model, although the answer seems plausible on the surface, it contains repetitions and an unnecessary summary at the end. Let’s try to improve that by training it on GPT-4-style prompts.
We will use the Alpaca-GPT4 dataset, which contains instruction samples generated by GPT-4. This can be a good starting point to align our model towards better responses.
Fine-Tuning SmolLM2-135M Instruct on Alpaca-GPT4 using LitGPT
Fine-tuning using LitGPT is just a single command that can also be run via the terminal. Here, we are executing it in the Jupyter Notebook.
# Fine-tune SmolLM2 on Alpaca-GPT4.
!litgpt finetune_full HuggingFaceTB/SmolLM2-135M-Instruct \
--data AlpacaGPT4 \
--out_dir smollm2_finetune \
--precision "bf16-true" \
--train.save_interval 1000 \
--train.log_interval 500 \
--train.micro_batch_size 4 \
--train.epochs 1 \
--train.max_seq_length 1024 \
Here we are using the finetune_full script that fine-tunes the entire model. LitGPT also supports LoRA and adapter training, which you can find here.
Arguments used:
- The very first argument is the model. We use one of the models that we listed earlier via the litgpt download list command.
- Next comes the dataset. As we are using a predefined dataset from the library, we just pass the dataset name to the --data argument.
- The --out_dir argument defines the directory where the resulting model will be saved.
- As the training was done on an RTX GPU, we provide --precision as "bf16-true". You can omit this argument if you are not sure whether your GPU supports BF16.
- --train.save_interval defines after how many optimizer steps the model is saved. For us, it is 1000.
- We log the train and validation loss after every 500 iterations using --train.log_interval.
- --train.micro_batch_size defines the per-forward-pass batch size. For us, that is 4. By default, the global batch size is 16, so there are 4 gradient accumulation steps; backpropagation updates the weights after every 4 such micro-batches.
- We are training for 1 epoch and have set the maximum sequence length to 1024.
Let’s take a look at the outputs.
Seed set to 1337 Number of trainable parameters: 162,826,560 The longest sequence length in the train data is 769, the model's maximum sequence length is 769 and context length is 8192 Verifying settings ... Epoch 1 | iter 500 step 125 | loss train: 1.455, val: n/a | iter time: 123.73 ms (step) Epoch 1 | iter 1000 step 250 | loss train: 1.523, val: n/a | iter time: 85.16 ms (step) Epoch 1 | iter 1500 step 375 | loss train: 1.657, val: n/a | iter time: 95.80 ms (step) Epoch 1 | iter 2000 step 500 | loss train: 1.728, val: n/a | iter time: 90.64 ms (step) Validating ... Come up with 3 interesting facts about honeybees. Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: Come up with 3 interesting facts about honeybees. ### Response: 1. Honey bees are known for their ability to learn the behavior of various food sources and their ability to recognize and distinguish between different varieties. 2. Honey bees can fly up to 12 kilometers (7 miles) in a matter of minutes even while traveling from one flower or plant to another. 3. Honey bees are able to obtain up to 90% of their energy from nectar, which they use to build and forage for themselves. They are also known for their ability to iter 2400: val loss 1.5341, val time: 5309.79 ms Epoch 1 | iter 2500 step 625 | loss train: 1.427, val: 1.534 | iter time: 102.72 ms (step) Epoch 1 | iter 3000 step 750 | loss train: 1.414, val: 1.534 | iter time: 103.74 ms (step) Epoch 1 | iter 3500 step 875 | loss train: 1.442, val: 1.534 | iter time: 94.22 ms (step) Epoch 1 | iter 4000 step 1000 | loss train: 1.511, val: 1.534 | iter time: 83.28 ms (step) Saving checkpoint to 'smollm2_finetune/step-001000' Epoch 1 | iter 4500 step 1125 | loss train: 1.516, val: 1.534 | iter time: 96.33 ms (step) . . . 
iter 9600: val loss 1.3445, val time: 5518.19 ms Epoch 1 | iter 10000 step 2500 | loss train: 1.251, val: 1.344 | iter time: 93.27 ms (step) Epoch 1 | iter 10500 step 2625 | loss train: 1.460, val: 1.344 | iter time: 87.83 ms (step) Epoch 1 | iter 11000 step 2750 | loss train: 1.318, val: 1.344 | iter time: 98.34 ms (step) Epoch 1 | iter 11500 step 2875 | loss train: 1.525, val: 1.344 | iter time: 105.72 ms (step) Epoch 1 | iter 12000 step 3000 | loss train: 1.192, val: 1.344 | iter time: 90.05 ms (step) Validating ... Come up with 3 interesting facts about honeybees. Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: Come up with 3 interesting facts about honeybees. ### Response: 1. Honeybees are born as tiny cone-shaped larvae and grow up, to become swarms of maids or drones. While most honeybufs die shortly after laying eggs, their young also survive, growing into bees who can often serve as caretakers, pollinators, and pollinators, and actually making honey honey. 2. Honeybees are one of the most intelligent organisms in the animal kingdom, exhibiting behaviors such as playing, foraging, and foraging for nectar. Honeybee colonies – colonies iter 12000: val loss 1.3416, val time: 5511.72 ms Saving checkpoint to 'smollm2_finetune/step-003000' Epoch 1 | iter 12500 step 3125 | loss train: 1.430, val: 1.342 | iter time: 127.17 ms (step) | ------------------------------------------------------ | Token Counts | - Input Tokens : 7840613 | - Tokens w/ Prompt : 9921786 | - Total Tokens (w/ Padding) : 17134448 | ----------------------------------------------------- | Performance | - Training Time : 1156.97 s | - Tok/sec : 14809.72 tok/s | ----------------------------------------------------- | Memory Usage | - Memory Used : 5.97 GB ------------------------------------------------------- Validating ... Final evaluation | val loss: 1.322 | val ppl: 3.749
At the end, we have a validation loss of 1.322.
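The reported perplexity is simply the exponential of the cross-entropy loss, which we can verify with a quick calculation:

```python
import math

val_loss = 1.322             # final validation loss from the run above
val_ppl = math.exp(val_loss)
print(round(val_ppl, 2))     # -> 3.75, matching the reported val ppl of 3.749
```

The tiny difference from the reported 3.749 comes from the logged loss being rounded to three decimal places.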
Inference After Fine-Tuning
Let’s run inference using the final saved model. For this, we will use the litgpt chat command and execute it in the terminal. This is necessary because the fine-tuned model adheres to a certain prompt format (Alpaca style) that the chat command applies automatically. Running inference directly through model.generate, without that prompt template, causes the model to produce incorrect output.
litgpt chat smollm2_finetune/final/ --max_new_tokens 1024
We tell the script to generate at most 1024 new tokens.
We give exactly the same prompt as before fine-tuning. Here is a small chat session.
This time, the answer seems much better.
Fine-Tuning a LitGPT Model on a Custom Dataset
Now, we will move on to fine-tuning a model on a custom dataset. All fine-tuning using LitGPT happens using the Alpaca dataset format as shown below.
[
    {
        "instruction": "Write a limerick about a pelican.",
        "input": "",
        "output": "There once was a pelican so fine,\nHis beak was as colorful as sunshine,\nHe would fish all day,\nIn a very unique way,\nThis pelican was truly divine!\n\n\n"
    },
    {
        "instruction": "Identify the odd one out from the group.",
        "input": "Carrot, Apple, Banana, Grape",
        "output": "Carrot\n\n"
    }
]
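Before converting our own data, it helps to check that each record carries exactly these three keys. Here is a minimal validation sketch (the helper name is ours, not part of LitGPT):

```python
def is_alpaca_sample(sample: dict) -> bool:
    """Check that a record has the instruction/input/output keys the Alpaca format expects."""
    required = {'instruction', 'input', 'output'}
    return set(sample.keys()) == required and all(
        isinstance(sample[k], str) for k in required
    )

sample = {
    'instruction': 'Identify the odd one out from the group.',
    'input': 'Carrot, Apple, Banana, Grape',
    'output': 'Carrot',
}
print(is_alpaca_sample(sample))  # -> True
```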
Now, we will be fine-tuning the SmolLM2-135M Instruct model for German-to-English translation.
Preparing Custom Dataset For LitGPT Fine-Tuning
The first step for us is to prepare the custom dataset in the Alpaca instruction format.
We will use the German-to-English translation subset of the WMT16 dataset from Hugging Face. It contains 4.55 million training, 2,170 validation, and around 3,000 test samples. However, we will only use 50,000 samples for training.
The dataset preparation code is in the prepare_custom_dataset.ipynb Jupyter Notebook. Let’s go through that.
from datasets import load_dataset
from tqdm.auto import tqdm
import json
import os
We load the dataset from the Hugging Face datasets library.
raw_dataset = load_dataset(
'wmt/wmt16',
'de-en'
)
Next, isolate the training and validation samples.
train_dataset = raw_dataset['train']
valid_dataset = raw_dataset['validation']
Create a helper function to generate the custom dataset format.
def convert_data(orig_data, num_samples=None):
json_list = []
for i, data in tqdm(enumerate(orig_data), total=len(orig_data)):
if num_samples and i == num_samples:
break
de = data['translation']['de']
en = data['translation']['en']
sample = {
'instruction': f"Translate from German to English: {de}",
'input': '',
'output': en
}
json_list.append(sample)
return json_list
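To see what the helper produces, we can apply the same per-record mapping to a tiny hand-written record that mimics the WMT16 layout:

```python
# A single record in the WMT16 translation layout.
record = {'translation': {'de': 'Guten Morgen.', 'en': 'Good morning.'}}

# The same mapping convert_data applies to each record.
sample = {
    'instruction': f"Translate from German to English: {record['translation']['de']}",
    'input': '',
    'output': record['translation']['en'],
}
print(sample['instruction'])  # -> Translate from German to English: Guten Morgen.
print(sample['output'])       # -> Good morning.
```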
Finally, create the JSON data and save it to the finetuning_data directory.
train_json_data = convert_data(train_dataset, num_samples=50000)
valid_json_data = convert_data(valid_dataset)
os.makedirs('finetuning_data', exist_ok=True)
with open('finetuning_data/train.json', 'w') as f:
json.dump(train_json_data, f)
with open('finetuning_data/val.json', 'w') as f:
json.dump(valid_json_data, f)
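As a quick sanity check, we can confirm the files parse back cleanly. The sketch below uses a temporary directory and a one-record sample so it stands alone; in the notebook, you would point it at finetuning_data/train.json instead:

```python
import json
import os
import tempfile

samples = [{'instruction': 'Translate from German to English: Hallo.',
            'input': '', 'output': 'Hello.'}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'train.json')
    with open(path, 'w') as f:
        json.dump(samples, f)
    with open(path) as f:
        loaded = json.load(f)

print(loaded == samples)  # -> True: the round trip preserves the data
```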
This completes the dataset preparation.
Fine-Tuning SmolLM2 on Custom Data
The code for fine-tuning the SmolLM2-135M Instruct model is present in the finetuning_custom_data.ipynb Jupyter Notebook. Let’s go through the code.
Before fine-tuning, let’s check what kind of translation the pretrained model can carry out.
# Check translation quality before fine-tuning.
# From German to English.
from litgpt import LLM
model = LLM.load('HuggingFaceTB/SmolLM2-135M-Instruct')
# The English translation is:
# What are animated videos? Let's talk about them.
text = model.generate(
'Translate from German to English: Was sind animierte Videos? Lassen Sie uns darüber sprechen.',
stream=True,
max_new_tokens=1024
)
for resulting_text in text:
print(resulting_text, end='', flush=True)
The following block shows the result.
German: Were sich animierter Videos? Unterlagen wohin Sie darauf sprechen können.
It clearly is not capable of translating the text at the moment.
We will now fine-tune the model.
# Fine-tune SmolLM2 on the custom WMT16 German-to-English data.
!litgpt finetune_full HuggingFaceTB/SmolLM2-135M-Instruct \
--data JSON \
--data.json_path finetuning_data \
--out_dir smollm2_custom_finetune \
--precision "bf16-true" \
--train.save_interval 1000 \
--train.log_interval 500 \
--train.global_batch_size 16 \
--train.micro_batch_size 4 \
--train.epochs 3 \
--train.max_seq_length 1024 \
--eval.interval 500 \
--eval.evaluate_example "first"
We use almost the same arguments as in our previous training experiment, with a few changes.
- --data JSON tells the training script that we are using a JSON format dataset.
- The --data.json_path argument accepts either a single JSON file or a directory containing train.json and val.json. For us, it is the latter. If we provide a path to a single JSON file, then we have to provide a validation split, or a default split ratio will be used. However, we already have a validation set.
- --eval.evaluate_example "first" tells the training script to use the first sample from the validation set to evaluate the model at certain intervals.
Following is the truncated output from the training.
Seed set to 1337 Number of trainable parameters: 162,826,560 The longest sequence length in the train data is 1024, the model's maximum sequence length is 1024 and context length is 8192 Verifying settings ... Epoch 1 | iter 500 step 125 | loss train: 1.908, val: n/a | iter time: 82.11 ms (step) Epoch 1 | iter 1000 step 250 | loss train: 1.887, val: n/a | iter time: 83.52 ms (step) Epoch 1 | iter 1500 step 375 | loss train: 1.786, val: n/a | iter time: 83.71 ms (step) Epoch 1 | iter 2000 step 500 | loss train: 1.773, val: n/a | iter time: 81.81 ms (step) Validating ... Translate from German to English: Die Premierminister Indiens und Japans trafen sich in Tokio. Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: Translate from German to English: Die Premierminister Indiens und Japans trafen sich in Tokio. ### Response: Ladies and gentlemen, the Prime Minister, Mr Indiens-Vertredem, and Mr Japens, the Prime Minister, Mr Vertredem, were in Tao at the moment. iter 2000: val loss 2.3086, val time: 3744.57 ms Epoch 1 | iter 2500 step 625 | loss train: 1.512, val: 2.309 | iter time: 81.61 ms (step) Epoch 1 | iter 3000 step 750 | loss train: 1.562, val: 2.309 | iter time: 83.08 ms (step) Epoch 1 | iter 3500 step 875 | loss train: 1.488, val: 2.309 | iter time: 84.55 ms (step) Epoch 1 | iter 4000 step 1000 | loss train: 1.780, val: 2.309 | iter time: 84.65 ms (step) Validating ... Translate from German to English: Die Premierminister Indiens und Japans trafen sich in Tokio. Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: Translate from German to English: Die Premierminister Indiens und Japans trafen sich in Tokio. ### Response: Young Indieners and Japan are in Tokio. 
iter 4000: val loss 2.2462, val time: 2996.63 ms Saving checkpoint to 'smollm2_custom_finetune/step-001000' Epoch 1 | iter 4500 step 1125 | loss train: 1.434, val: 2.246 | iter time: 82.46 ms (step) Epoch 1 | iter 5000 step 1250 | loss train: 1.533, val: 2.246 | iter time: 83.38 ms (step) Epoch 1 | iter 5500 step 1375 | loss train: 1.478, val: 2.246 | iter time: 84.04 ms (step) Epoch 1 | iter 6000 step 1500 | loss train: 1.545, val: 2.246 | iter time: 84.01 ms (step) . . . iter 34000: val loss 2.2546, val time: 3041.75 ms Epoch 3 | iter 34500 step 8625 | loss train: 0.908, val: 2.255 | iter time: 88.69 ms (step) Epoch 3 | iter 35000 step 8750 | loss train: 0.932, val: 2.255 | iter time: 86.94 ms (step) Epoch 3 | iter 35500 step 8875 | loss train: 0.893, val: 2.255 | iter time: 83.31 ms (step) Epoch 3 | iter 36000 step 9000 | loss train: 0.962, val: 2.255 | iter time: 85.10 ms (step) Validating ... Translate from German to English: Die Premierminister Indiens und Japans trafen sich in Tokio. Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: Translate from German to English: Die Premierminister Indiens und Japans trafen sich in Tokio. ### Response: The Prime Ministers of India and Japan are presiding over the meeting in Tokyo. 
iter 36000: val loss 2.2539, val time: 3058.45 ms Saving checkpoint to 'smollm2_custom_finetune/step-009000' Epoch 3 | iter 36500 step 9125 | loss train: 0.906, val: 2.254 | iter time: 83.75 ms (step) Epoch 3 | iter 37000 step 9250 | loss train: 0.921, val: 2.254 | iter time: 82.49 ms (step) Epoch 3 | iter 37500 step 9375 | loss train: 0.934, val: 2.254 | iter time: 83.13 ms (step) | ------------------------------------------------------ | Token Counts | - Input Tokens : 15158385 | - Tokens w/ Prompt : 19657911 | - Total Tokens (w/ Padding) : 28767596 | ----------------------------------------------------- | Performance | - Training Time : 2914.75 s | - Tok/sec : 9869.65 tok/s | ----------------------------------------------------- | Memory Usage | - Memory Used : 7.44 GB ------------------------------------------------------- Validating ... Final evaluation | val loss: 2.230 | val ppl: 9.296
Running Inference using the Custom Dataset Fine-Tuned Model
We will use the final saved model for inference using the terminal chat command.
litgpt chat smollm2_custom_finetune/final/ --max_new_tokens 1024
Here is that chat session.
It seems our model needs much more training before it can correctly translate from German to English. The translation is only partially correct now. We will explore more advanced applications for training and inference in future posts.
Summary and Conclusion
We covered the basics of LitGPT in this article. Starting from inference with a pretrained model, to fine-tuning on a predefined dataset, to fine-tuning with a custom dataset, we covered a lot of ground. In future articles, we will cover better fine-tuning strategies.
If you have any questions, thoughts, or suggestions, please leave them in the comment section. I will surely address them.
You can contact me using the Contact section. You can also find me on LinkedIn, and X.