
Fine-Tuning Large Language Models (LLMs) in Python

Learn how to fine-tune large language models (LLMs) in Python using various methods and apply them to an open-source LLM for sentiment analysis.

With the popularity of transformer models and platforms like HuggingFace, pre-trained large language models have become accessible to anyone with an internet connection. We can download any open-source large language model (LLM) from platforms like HuggingFace Hub and use it to build custom LLM applications. However, an open-source model is rarely a perfect fit out of the box, so we often need to fine-tune it on a custom dataset to make it suitable for our business use case.

In this article, we will explore how to fine-tune an open-source LLM using supervised fine-tuning for sentiment analysis. We will discuss the basics of LLM fine-tuning, the different ways to fine-tune LLMs, and how to save the fine-tuned LLMs locally and on the HuggingFace Hub. Finally, we will discuss how to access the fine-tuned model after pushing it to HuggingFace Hub.

What is LLM fine-tuning?

LLM fine-tuning is the process of taking a pre-trained LLM and training it on a specific dataset to improve its performance for a particular use case. This approach is cost-effective, as it adapts an existing model to a new domain without the need for extensive computational resources. Instead of building and training a model from scratch, an open-source model can be fine-tuned with a custom dataset to achieve the desired results.

We can understand the LLM fine-tuning process using the following diagram:

LLM fine-tuning process

Here, you can see that we take a pre-trained LLM with a dataset and fine-tune the model. After fine-tuning, we get a fine-tuned LLM for our use case, specifically trained on our custom dataset.


Why do we fine-tune LLMs?

Pre-trained LLMs like GPT, GPT-2, GPT-4, LLaMA, or BERT are trained on massive datasets and excel at general tasks. However, training these models from scratch requires enormous computational resources and financial investment, often beyond the reach of individuals and small organizations. These LLMs are also not domain- or task-specific. Fine-tuning pre-trained LLMs helps us in the following ways:

  • Save costs: Instead of training an LLM from scratch, fine-tuning allows leveraging pre-trained models for specialized tasks at a fraction of the cost. Fine-tuning requires significantly fewer computational resources, making it a more accessible option.
  • Improve model performance for specific domains: Fine-tuning also allows us to improve an LLM’s accuracy, efficiency, and reliability by training it on specialized data. For example, if we want to build an application for the legal domain, we can train an LLM like BERT on legal documents, historical case documents, and laws. After this, the fine-tuned model will understand legal jargon and context. We can then use the fine-tuned model to assist the lawyers or judges.
  • Improve model performance for specific use cases: In addition to domain-specific fine-tuning, we can also fine-tune LLMs for specific use cases like sentiment analysis, text summarization, or question answering. For instance, if we want to create an LLM application for a chatbot, we can fine-tune the model for question-answering tasks.
  • Adapt LLMs for different languages: LLMs are primarily trained on datasets that are predominantly in English, so they often struggle with multilingual tasks. We can fine-tune a pre-trained LLM to support languages that have relatively little training data available. Fine-tuning LLMs for different languages makes them usable in specific geographic regions, since the fine-tuned model can handle the user's language accurately.
  • Align models to human values and preferences: Due to the inherent bias in the training data, LLMs can generate biased, harmful, or irrelevant responses. We can fine-tune the LLMs to improve the response quality and align the model’s behavior with human values to enhance user experience.

Now that we understand the reasons for fine-tuning an LLM, let’s discuss the different ways to fine-tune a large language model.

Different ways to fine-tune LLMs

LLM fine-tuning involves adjusting the weights of a pre-trained LLM by training it on a labeled dataset. Depending on the available computational resources, the level of customization needed, and the task requirements, we can fine-tune LLMs using techniques like supervised fine-tuning, reinforcement learning with human feedback, low-rank adaptation, etc. Let’s discuss these methods individually.

Supervised fine-tuning (SFT)

Supervised Fine-Tuning (SFT) is the most standard technique for fine-tuning LLMs. In SFT, we take a pre-trained LLM and train it further, using supervised learning, on a labeled dataset specific to our use case, such as sentiment analysis, document classification, or question answering. During fine-tuning, all of the LLM's parameters are updated for the specific use case.

SFT helps us build fine-tuned LLMs with high accuracy for domain-specific tasks. However, it requires large, labeled datasets and has a high computational cost, as all the model parameters are updated. To reduce the training costs, we can use parameter-efficient fine-tuning methods.

Parameter efficient fine-tuning (PEFT)

Parameter Efficient Fine-Tuning (PEFT) includes LLM fine-tuning methods that fine-tune only a small subset of parameters instead of the entire model. In PEFT, we can fine-tune LLMs using one of the following ways:

  • Add new trainable components to the model: We can freeze the parameters of the pre-trained model and fine-tune it by attaching new trainable components. Only the newly added weights are updated during fine-tuning, giving us a model adapted to our use case. Low-Rank Adaptation (LoRA) works this way: it freezes the original weights and injects small trainable low-rank matrices into existing layers (see the sketch after this list).
  • Prefix tuning: In prefix tuning, we prepend trainable vectors to each attention layer of the LLM instead of modifying the model weights. During fine-tuning, only these prefix vectors are trained to adapt to the training data, while the original model parameters stay frozen. Prefix tuning thus encodes task-specific information in a small number of added vectors.
  • Prompt tuning: In prompt tuning, we fine-tune the LLM by introducing trainable embeddings, called soft prompts, to the input before passing it to the model. The LLM learns a small set of task-specific prompt embeddings and uses them for the task.
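
To make this concrete, here is a minimal LoRA sketch using the peft library (a separate install: pip install peft). The target_modules names are the attention projections specific to DistilBERT (q_lin and v_lin); other architectures use different module names, so treat this as an illustrative configuration rather than a universal recipe.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
lora_config = LoraConfig(
    r=8,                                # rank of the low-rank update matrices
    lora_alpha=16,                      # scaling factor applied to the update
    target_modules=["q_lin", "v_lin"],  # DistilBERT's query/value projections
    lora_dropout=0.05,
    task_type="SEQ_CLS",                # sequence classification
)
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # only a small fraction of weights train

The wrapped model can then be passed to a trainer in place of the original model.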

Reinforcement learning with human feedback (RLHF)

Reinforcement learning with human feedback (RLHF) is an LLM fine-tuning technique that we use to align LLMs with human preferences. RLHF is used in ChatGPT, Gemini, and other conversational AI models to make responses more helpful and aligned with human values.

RLHF consists of three main steps:

  1. Fine-tune a pre-trained model using SFT on human-annotated responses.
  2. Train a reward model that ranks the responses of the fine-tuned LLM (a sketch of this step follows the list).
  3. Use reinforcement learning to optimize the model based on the reward model’s rankings.
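
As a rough illustration of step 2, here is a hedged sketch of training a reward model with trl's RewardTrainer. The one-pair preference dataset is made up for illustration, and the implicit "chosen"/"rejected" column format assumes a recent trl version; check the trl documentation for the version you have installed.

from datasets import Dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardConfig, RewardTrainer

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# A reward model outputs a single scalar score, hence num_labels=1
reward_model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)

# Toy preference data: in practice this would be thousands of human-ranked pairs
preference_data = Dataset.from_dict({
    "chosen": ["The movie was engaging, with strong performances throughout."],
    "rejected": ["movie good."],
})

trainer = RewardTrainer(
    model=reward_model,
    args=RewardConfig(output_dir="reward-model"),
    processing_class=tokenizer,
    train_dataset=preference_data,
)
trainer.train()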

You can use any of these methods to fine-tune LLMs based on the use case and available resources. Now, let’s fine-tune an LLM for a sentiment analysis task using SFT.

Fine-tuning an LLM model for the sentiment analysis task

We will fine-tune the DistilBERT base model on the IMDB reviews dataset for a sentiment analysis task, using the supervised fine-tuning technique. Let's begin by installing the necessary modules.

Step 1 - Set up the dev environment

To fine-tune the LLM, we need the following modules:

  • The datasets module to download the dataset.
  • The evaluate module for model evaluation.
  • The transformers module (with PyTorch support) to download and fine-tune the pre-trained DistilBERT model.
  • The huggingface-hub module to authenticate with HuggingFace Hub and push models to it; it also provides the huggingface-cli command for signing in from the command-line terminal.
  • The trl module for using SFTTrainer for supervised fine-tuning.

You can install all these modules by executing the following command in the command-line terminal:

pip install datasets evaluate "transformers[torch]" huggingface-hub trl

Next, we need to generate a HuggingFace access token from the HuggingFace tokens page. Since we will later push our fine-tuned model to the Hub, the token needs write access.

After generating the token, we can log in to HuggingFace Hub using the login() function. The login() function, defined in the huggingface_hub module, takes the HuggingFace token as input to its token parameter.

import huggingface_hub
huggingface_hub.login(token='your_HuggingFace_token')

Alternatively, we can execute the huggingface-cli login command in the command-line terminal and enter the token when prompted to initialize a session. After logging in, we can access and download pre-trained LLMs and datasets from HuggingFace Hub.
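
The command looks like this (it will prompt you for the token):

huggingface-cli login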

Step 2 - Download the pre-trained LLM

Next, we will download and fine-tune the DistilBERT base model for the sentiment analysis task. To do this, we will use the AutoModelForSequenceClassification wrapper defined in the transformers module.

  • AutoModelForSequenceClassification is a wrapper model from the transformers library specially designed for sequence classification tasks like sentiment analysis, spam detection, and topic classification. This wrapper automatically loads the appropriate model architecture (e.g., BERT, DistilBERT, RoBERTa) based on the specified model.
  • To download the model, we will pass the pre-trained model name distilbert-base-uncased to the from_pretrained() method by invoking it on the AutoModelForSequenceClassification wrapper.

After execution, the from_pretrained() method returns the specified pre-trained model. We will assign this model to the pretrained_model variable.

from transformers import AutoModelForSequenceClassification
pretrained_model_name = "distilbert-base-uncased"
pretrained_model = AutoModelForSequenceClassification.from_pretrained(pretrained_model_name)

After executing the from_pretrained() method, we also see a warning indicating that the classification head is newly initialized, and that we should train the model on a downstream task before using it for predictions and inference:

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a downstream task to be able to use it for predictions and inference.

Now that we have the pre-trained model, let’s download the dataset.

Step 3 - Download the IMDB dataset

To download the dataset, we will use the load_dataset() function defined in the datasets module. The load_dataset() function takes the dataset name as its first input argument. To load the training set, we can pass the string “train” as input to the split parameter. To load the test dataset, we can pass the string “test” as input to the split parameter, as shown in the following example:

from datasets import load_dataset
training_data = load_dataset("imdb", split="train")
test_data = load_dataset("imdb", split="test")
print("Training data:", training_data)
print("Test data:", test_data)

Output:

Training data: Dataset({
features: ['text', 'label'],
num_rows: 25000
})
Test data: Dataset({
features: ['text', 'label'],
num_rows: 25000
})

In the output, observe that the training and the test data contain two columns, namely text and label. The text column contains reviews from IMDB. The label column contains the values 0 and 1, which specify the sentiment for the reviews, as shown in the following example:

# Get the first review in the training data
training_text = training_data['text'][0]
# Get the first label in the training data
training_label = training_data['label'][0]
# Get the first review in the test data
test_text = test_data['text'][0]
# Get the first label in the test data
test_label = test_data['label'][0]
print("The first review in the training data:", training_text)
print("The sentiment label for the first review in the training data:", training_label)
print("The first review in the test data:", test_text)
print("The sentiment label for the first review in the test data:", test_label)

Output:

The first review in the training data: I rented I AM CURIOUS-YELLOW from my video store because of all the controversy that surrounded it when it was first released in 1967. I also heard that at first, it was seized by U.S. customs if it ever tried to enter this......
The sentiment label for the first review in the training data: 0
The first review in the test data: I love sci-fi and am willing to put up with a lot......
The sentiment label for the first review in the test data: 0

Having downloaded the pre-trained model and the dataset, let’s create a tokenizer and a data collator to pre-process the dataset and make it suitable for the pre-trained model.

Step 4 - Create a tokenizer and a data collator

We need a tokenizer that will tokenize the data in the same format as the dataset used to train the pre-trained model. To create the tokenizer, we will use the AutoTokenizer class. The from_pretrained() method in the AutoTokenizer class takes the pre-trained model name as input and automatically loads the correct tokenizer for the pre-trained model.

from transformers import AutoTokenizer
pretrained_model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name)

After getting the tokenizer, let’s write a function that takes a row from the dataset as its input and tokenizes the text in the input row.

def tokenize(record):
    outputs = tokenizer(record['text'], truncation=True, padding="max_length", max_length=512)
    return outputs

In this code,

  • The record parameter represents a row in the dataset, and the text field of the record is the input that we want to tokenize.
  • Setting the truncation parameter to True ensures that if the text is longer than the maximum length allowed by the tokenizer (max_length = 512 in this case), it will be truncated to fit within that limit.
  • The padding parameter is set to max_length to pad the tokenized output to exactly max_length tokens.
  • We have set the max_length parameter to 512 because transformer models like BERT and DistilBERT have a maximum context length of 512 tokens. A quick check after this list shows what the tokenizer produces.
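
Here is that quick check on a short sample sentence (the sentence is our own example):

# Tokenize a short sample sentence the same way the dataset will be tokenized
sample = tokenizer("What a fantastic film!", truncation=True, padding="max_length", max_length=512)
print(sample["input_ids"][:8])       # token ids; 101 is the [CLS] token
print(sample["attention_mask"][:8])  # 1s for real tokens, then 0s for padding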

After implementing the tokenizer, let's also implement a data collator that dynamically pads the tokenized inputs to the longest sequence in a batch. This ensures that all sequences in a batch have the same length, enabling efficient batch processing during fine-tuning. (Since our tokenize() function already pads every example to max_length, dynamic padding mainly pays off if you drop the fixed-length padding, which is a common optimization.)

To implement the data collator, we will use the DataCollatorWithPadding class, which takes the tokenizer as its input and returns a data collator.

from transformers import DataCollatorWithPadding
data_collator = DataCollatorWithPadding(tokenizer)
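
As a quick illustration of the dynamic padding (the sample strings are our own, tokenized here without fixed-length padding so the effect is visible):

# Two reviews of different lengths, tokenized without fixed-length padding
samples = [tokenizer("Great movie!"), tokenizer("A long, rambling disappointment of a film.")]
batch = data_collator(samples)
print(batch["input_ids"].shape)  # both rows are padded to the longer sequence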

Now that we have created the tokenizer and the data collator, let’s pre-process the data.

Step 5 - Pre-process the dataset

To pre-process the data, we will apply the tokenize() function to the dataset using the map() method. Setting batched=True makes map() pass batches of records to tokenize() at once, which speeds up tokenization.

tokenized_training_data = training_data.map(tokenize, batched=True)
tokenized_test_data = test_data.map(tokenize, batched=True)
print("Tokenized training data:", tokenized_training_data)
print("Tokenized test data:", tokenized_test_data)

After tokenization, we get a dataset with two additional columns, input_ids and attention_mask, as shown in the output:

Tokenized training data: Dataset({
features: ['text', 'label', 'input_ids', 'attention_mask'],
num_rows: 25000
})
Tokenized test data: Dataset({
features: ['text', 'label', 'input_ids', 'attention_mask'],
num_rows: 25000
})

The input_ids feature contains a vector representing the tokenized format, i.e., the numerical representation of the text from the text column of the dataset. The attention_mask feature contains a binary mask for each token in input_ids that tells the model which tokens to pay attention to, preventing it from processing the meaningless padding tokens.

# Get the tokenized version of the first row in the training data
training_input_id = tokenized_training_data['input_ids'][0]
# Get the attention mask of the first row in the training data
training_attention_mask = tokenized_training_data['attention_mask'][0]
# Get the tokenized version of the first row in the test data
test_input_id = tokenized_test_data['input_ids'][0]
# Get the attention mask of the first row in the test data
test_attention_mask = tokenized_test_data['attention_mask'][0]
print("The tokenized version of the first review in the training data:", training_input_id)
print("The attention mask for the first review in the training data:", training_attention_mask)
print("The tokenized version of the first review in the test data:", test_input_id)
print("The attention mask for the first review in the test data:", test_attention_mask)

Output:

The tokenized version of the first review in the training data: [101, 1045, 12524, 1045, 2572, 8025, 1011, 3756, 2013, 2026, 2678, 3573, 2138,5129, 2009, 2043, 2009,...... 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
The attention mask for the first review in the training data: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,.....0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
The tokenized version of the first review in the test data: [101, 1045, 2293, 16596, 1011, 10882, 1998, 2572, 5627, 2000, 2404,......, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
The attention mask for the first review in the test data: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,......, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
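
To map token ids back to text, you can decode them with the same tokenizer, which is a handy sanity check:

# Decode the first few token ids of the first training review back to text
print(tokenizer.decode(tokenized_training_data["input_ids"][0][:10]))
# Prints something like: [CLS] i rented i am curious - yellow from my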

Next, we will add a function to evaluate the model’s performance while training.

Step 6 - Create a function to evaluate the model

To evaluate the model, we'll write a function that computes accuracy by comparing the model's predictions with the true labels.

import numpy as np
import evaluate

def compute_metrics(predictions_and_labels):
    metric = evaluate.load("accuracy")
    logits, labels = predictions_and_labels
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
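
As a quick sanity check (with toy logits of our own), the function should report perfect accuracy when every argmax prediction matches its label:

toy_logits = np.array([[0.1, 0.9], [0.8, 0.2]])  # argmax gives predictions [1, 0]
toy_labels = np.array([1, 0])
print(compute_metrics((toy_logits, toy_labels)))  # {'accuracy': 1.0}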

Having implemented the function to evaluate the model performance, let’s create a supervised model trainer to fine-tune the pre-trained LLM using the SFTTrainer() function.

Step 7 - Create a supervised model trainer to fine-tune the pre-trained LLM

To create the model trainer, we first need to define the training parameters, such as the number of training epochs, the output directory for the fine-tuned model, the batch sizes for training and evaluation, and the evaluation strategy. Then, we create the model trainer using the tokenizer, data collator, pre-trained model, training dataset, test dataset, and these training parameters.

To define the training arguments, we will use the TrainingArguments() function defined in the transformers module.

  • The TrainingArguments() class takes the number of epochs (num_train_epochs), the directory for saving the fine-tuned model (output_dir), the batch sizes for training and evaluating the model, and the evaluation strategy as inputs. We will set the batch size for both training and evaluation to 16, and set the eval_strategy parameter to "epoch" so that the model is evaluated after every epoch.
  • To create the model trainer, we will use the SFTTrainer class defined in the trl module. SFTTrainer takes the pre-trained model, the tokenizer, the data collator, the training dataset, the test dataset, the evaluation function, and the training arguments as inputs to its model, processing_class, data_collator, train_dataset, eval_dataset, compute_metrics, and args parameters, respectively.

After executing this code, we get a trainer object ready for fine-tuning.

from transformers import TrainingArguments
from trl import SFTTrainer

# Create training arguments
training_args = TrainingArguments(
    num_train_epochs=10,
    output_dir="text-classifier-supervised-codecademy",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    eval_strategy="epoch",
)

# Create a supervised model trainer for fine-tuning the LLM
trainer = SFTTrainer(
    model=pretrained_model,
    processing_class=tokenizer,
    data_collator=data_collator,
    args=training_args,
    train_dataset=tokenized_training_data,
    eval_dataset=tokenized_test_data,
    compute_metrics=compute_metrics,
)

After creating the trainer, let’s fine-tune the LLM.

Step 8 - Train and save the fine-tuned model

To fine-tune the LLM, we invoke the train() method on the trainer object we just created.

trainer.train()

After training the model, we can save it to the local machine using the save_model() method. The save_model() method, when invoked on the trainer object, takes the file path for the model as its input and saves the model to the local storage.

file_path_to_save_the_model = '/home/aditya1117/codes/text-classifier-supervised-codecademy'
trainer.save_model(file_path_to_save_the_model)

We can also push the fine-tuned LLM model to the HuggingFace models repository. For this, we need to invoke the push_to_hub() method on the trainer and pass a model name for the fine-tuned model:

model_name = "text-classifier-supervised-codecademy"
trainer.push_to_hub(model_name)

After successful execution, the push_to_hub() method returns a CommitInfo object that contains the link to the fine-tuned model, in the format https://huggingface.co/<user_name>/<model_name>, as shown in the following output:

CommitInfo(commit_url='https://huggingface.co/raditya1117/text-classifier-supervised-codecademy/commit/5a04ca039661b209c0c4cedaf7f680f10e1945d1', commit_message='text-classifier-supervised-codecademy', commit_description='', oid='5a04ca039661b209c0c4cedaf7f680f10e1945d1', pr_url=None, repo_url=RepoUrl('https://huggingface.co/raditya1117/text-classifier-supervised-codecademy', endpoint='https://huggingface.co', repo_type='model', repo_id='raditya1117/text-classifier-supervised-codecademy'), pr_revision=None, pr_num=None) 

In the output, you can see that our fine-tuned LLM is stored at the address https://huggingface.co/raditya1117/text-classifier-supervised-codecademy. Here, raditya1117 is the username of the HuggingFace account, and text-classifier-supervised-codecademy is the model name.

If you go to the above link, you will see all the details of the model in the model card. You can also update the details to provide more information about the training dataset and other metrics.

Model card for fine-tuned model

Step 9 - Load the fine-tuned model

After saving the fine-tuned model, you can use the AutoModelForSequenceClassification.from_pretrained() function to load and use it. To load the fine-tuned model from the HuggingFace repository, you need to pass the model name in the format ‘your_huggingface_username/model_name’, as shown below:

model_name = "raditya1117/text-classifier-supervised-codecademy"
model = AutoModelForSequenceClassification.from_pretrained(model_name)

To load the fine-tuned model from the local storage, you can pass the model’s file path to the from_pretrained() method.

model_filepath = "/home/aditya1117/codes/text-classifier-supervised-codecademy"
model = AutoModelForSequenceClassification.from_pretrained(model_filepath)

After loading the model, you can use it for inference or fine-tune it further to improve its accuracy and performance.
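
For example, here is a minimal inference sketch using the model loaded above (the sample review is our own; for the IMDB labels, 0 is negative and 1 is positive):

import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_name)
inputs = tokenizer("This movie was a delight from start to finish.",
                   return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
prediction = logits.argmax(dim=-1).item()
print("positive" if prediction == 1 else "negative")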

Conclusion

Fine-tuning Large Language Models (LLMs) is a game-changer for customizing AI models to specific tasks like sentiment analysis, chatbots, and text summarization. Instead of training from scratch, fine-tuning adapts powerful pre-trained models for higher accuracy, better efficiency, and domain-specific performance while saving computational resources.

To learn more about using LLMs, you can go through this course on fine-tuning transformer models. You might also like this course on building your own LLM using PyTorch.

Happy learning!

Author

Codecademy Team
