Fine-Tuning Large Language Models (LLMs) in Python
With the popularity of transformer models and platforms like HuggingFace, pre-trained large language models have become accessible to anyone with an internet connection. We can download any open-source large language model (LLM) from platforms like HuggingFace Hub and use it to build custom LLM applications. However, the open-source models might not be the most suitable for our use case. We need to fine-tune the LLMs using a custom dataset to make them suitable for our business use case.
In this article, we will explore how to fine-tune an open-source LLM using supervised fine-tuning for sentiment analysis. We will discuss the basics of LLM fine-tuning, the different ways to fine-tune LLMs, and how to save the fine-tuned LLMs locally and on the HuggingFace Hub. Finally, we will discuss how to access the fine-tuned model after pushing it to HuggingFace Hub.
What is LLM fine-tuning?
LLM fine-tuning is the process of taking a pre-trained LLM and training it on a specific dataset to improve its performance for a particular use case. This approach is cost-effective, as it adapts an existing model to a new domain without the need for extensive computational resources. Instead of building and training a model from scratch, an open-source model can be fine-tuned with a custom dataset to achieve the desired results.
We can understand the LLM fine-tuning process using the following diagram:
Here, you can see that we take a pre-trained LLM with a dataset and fine-tune the model. After fine-tuning, we get a fine-tuned LLM for our use case, specifically trained on our custom dataset.
Why do we fine-tune LLMs?
Pre-trained LLMs like GPT, GPT-2, GPT-4, LLaMA, or BERT are trained on massive datasets and excel at general tasks. However, training these models from scratch requires huge computational resources and monetary expenses, often beyond the capabilities of individuals and small organizations. Also, these LLMs aren’t domain or task-specific. Fine-tuning these pre-trained LLMs helps us in the following ways:
- Save costs: Instead of training an LLM from scratch, fine-tuning lets us leverage pre-trained models for specialized tasks at a fraction of the cost. Fine-tuning requires significantly fewer computational resources, making it a more accessible option.
- Improve model performance for specific domains: Fine-tuning also allows us to improve an LLM’s accuracy, efficiency, and reliability by training it on specialized data. For example, if we want to build an application for the legal domain, we can train an LLM like BERT on legal documents, historical case documents, and laws. After this, the fine-tuned model will understand legal jargon and context. We can then use the fine-tuned model to assist lawyers or judges.
- Improve model performance for specific use cases: In addition to domain-specific fine-tuning, we can also fine-tune LLMs for specific use cases like sentiment analysis, text summarization, or question answering. For instance, if we want to create an LLM application for a chatbot, we can fine-tune the model for question-answering tasks.
- Adapt LLMs for different languages: LLMs are primarily trained on datasets that are predominantly in English, so they often struggle with multilingual tasks. We can fine-tune pre-trained LLMs to support other languages, even when only a small amount of training data is available in those languages. Fine-tuning LLMs for different languages helps deploy them in specific geographical areas, as the fine-tuned model can handle the user’s language accurately.
- Align models to human values and preferences: Due to the inherent bias in the training data, LLMs can generate biased, harmful, or irrelevant responses. We can fine-tune the LLMs to improve the response quality and align the model’s behavior with human values to enhance user experience.
Now that we understand the reasons for fine-tuning an LLM, let’s discuss the different ways to fine-tune a large language model.
Different ways to fine-tune LLMs
LLM fine-tuning involves adjusting the weights of a pre-trained LLM by training it on a labeled dataset. Depending on the available computational resources, the level of customization needed, and the task requirements, we can fine-tune LLMs using techniques like supervised fine-tuning, reinforcement learning with human feedback, low-rank adaptation, etc. Let’s discuss these methods individually.
Supervised fine-tuning (SFT)
Supervised Fine-Tuning (SFT) is the most standard technique for fine-tuning LLMs. In SFT, we take a pre-trained LLM and train it further, using supervised learning, on a labeled dataset specific to our use case, such as sentiment analysis, document classification, or question answering. After fine-tuning, all of the LLM’s parameters have been updated for the specific use case.
SFT helps us build fine-tuned LLMs with high accuracy for domain-specific tasks. However, it requires large, labeled datasets and has a high computational cost, as all the model parameters are updated. To reduce the training costs, we can use parameter-efficient fine-tuning methods.
Parameter efficient fine-tuning (PEFT)
Parameter Efficient Fine-Tuning (PEFT) includes LLM fine-tuning methods that fine-tune only a small subset of parameters instead of the entire model. In PEFT, we can fine-tune LLMs using one of the following ways:
- Add additional trainable components to the model: We can freeze the parameters of the pre-trained model and fine-tune it by adding small trainable components. In this case, only the newly added weights are updated during fine-tuning, and we get a fine-tuned model specific to our use case. Low-Rank Adaptation (LoRA) follows this idea: it freezes the original weights and injects small, trainable low-rank matrices alongside them (see the sketch after this list).
- Prefix tuning: In prefix tuning, we append special trainable tokens to each layer of the LLM instead of modifying the model weights. While fine-tuning, the newly added tokens are trained to adapt to the training data instead of changing the model parameters. Thus, prefix-tuning allows us to fine-tune LLMs by encoding task-specific information in a minimal number of tokens and updating only the attention layers in the model.
- Prompt tuning: In prompt tuning, we fine-tune the LLMs by introducing trainable embeddings called soft prompts to the input before passing it to the model. The LLM learns a small set of task-specific trainable prompt embeddings and uses them for the tasks.
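To make the LoRA approach concrete, here is a minimal sketch that wraps the DistilBERT classifier used later in this article with LoRA adapters, using the peft library. Note the assumptions: peft is not installed or used elsewhere in this tutorial (install it separately with pip install peft), and the target_modules names are DistilBERT's attention projections.

import torch
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

# Load the base model with a fresh classification head
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

# LoRA configuration: small trainable low-rank matrices are injected into the
# attention projection layers, while the original weights stay frozen
lora_config = LoraConfig(
    r=8,                                # rank of the low-rank update matrices
    lora_alpha=16,                      # scaling factor for the adapter updates
    target_modules=["q_lin", "v_lin"],  # DistilBERT's query and value projections
    lora_dropout=0.1,
    task_type="SEQ_CLS",                # sequence classification task
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # only a small fraction of weights is trainable

The printed summary shows why PEFT is cheap: the frozen base weights dominate the parameter count, and only the adapter weights receive gradients.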
Reinforcement learning with human feedback (RLHF)
Reinforcement learning with human feedback (RLHF) is an LLM fine-tuning technique that we use to align LLMs with human preferences. RLHF is used in ChatGPT, Gemini, and other conversational AI models to make responses more helpful and aligned with human values.
RLHF consists of three main steps:
- Fine-tune a pre-trained model using SFT on human-annotated responses.
- Train a reward model that ranks the responses of the fine-tuned LLM.
- Use reinforcement learning to optimize the model based on the reward model’s rankings.
You can use any of these methods to fine-tune LLMs based on the use case and available resources. Now, let’s fine-tune an LLM for a sentiment analysis task using SFT.
Fine-tuning an LLM for the sentiment analysis task
We will fine-tune the DistilBERT base model on the IMDB reviews dataset for a sentiment analysis task, using the supervised fine-tuning technique. To begin, let’s first install the necessary modules for fine-tuning.
Step 1- Setting up the dev environment
To fine-tune the LLM, we need the following modules:
- The `datasets` module to download the dataset.
- The `evaluate` module for model evaluation.
- The `transformers` module (with its PyTorch extras) for the pre-trained DistilBERT model, the tokenizer, and the training utilities.
- The `huggingface-hub` module to log in to HuggingFace Hub and download the pre-trained DistilBERT model. Installing it also provides the `huggingface-cli` command for signing in to HuggingFace Hub from the command-line terminal.
- The `trl` module for using `SFTTrainer` for supervised fine-tuning.
You can install all these modules by executing the following command in the command-line terminal:

pip install datasets evaluate transformers[torch] huggingface-hub trl
Next, we need to generate a HuggingFace token with write access from the HuggingFace tokens page, so that we can download pre-trained models and later push our fine-tuned model to the Hub.
After generating the token, we can log in to HuggingFace Hub using the `login()` function. The `login()` function, defined in the `huggingface_hub` module, takes the HuggingFace token as input to its `token` parameter.
import huggingface_hub

huggingface_hub.login(token='your_HuggingFace_token')
Alternatively, we can log in by executing the `huggingface-cli login` command in the command-line terminal and pasting the token when prompted. After logging in, we can access and download pre-trained LLMs and datasets from the HuggingFace Hub.
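This is the whole command; it takes no arguments and prompts for the token interactively:

huggingface-cli login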
Step 2- Download the pre-trained LLM
Next, we will download and fine-tune the DistilBERT base model for the sentiment analysis task. To do this, we will use the `AutoModelForSequenceClassification` wrapper defined in the `transformers` module.
- `AutoModelForSequenceClassification` is a wrapper class from the transformers library, specially designed for sequence classification tasks like sentiment analysis, spam detection, and topic classification. This wrapper automatically loads the appropriate model architecture (e.g., BERT, DistilBERT, RoBERTa) based on the specified model name.
- To download the model, we will pass the pre-trained model name `distilbert-base-uncased` to the `from_pretrained()` method, invoking it on the `AutoModelForSequenceClassification` wrapper.
After execution, the `from_pretrained()` method returns the specified pre-trained model. We will assign this model to the `pretrained_model` variable.
from transformers import AutoModelForSequenceClassification

pretrained_model_name = "distilbert-base-uncased"
pretrained_model = AutoModelForSequenceClassification.from_pretrained(pretrained_model_name)
After executing the `from_pretrained()` method, we also get a warning telling us that the classification head is newly initialized, and that we should train the model on a downstream task before using it for predictions and inference.
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a downstream task to be able to use it for predictions and inference.
Now that we have the pre-trained model, let’s download the dataset.
Step 3- Download the IMDB dataset
To download the dataset, we will use the `load_dataset()` function defined in the `datasets` module. The `load_dataset()` function takes the dataset name as its first input argument. To load the training set, we pass the string "train" to the `split` parameter; to load the test set, we pass the string "test", as shown in the following example:
from datasets import load_dataset

training_data = load_dataset("imdb", split="train")
test_data = load_dataset("imdb", split="test")

print("Training data:", training_data)
print("Test data:", test_data)
Output:
Training data: Dataset({
    features: ['text', 'label'],
    num_rows: 25000
})
Test data: Dataset({
    features: ['text', 'label'],
    num_rows: 25000
})
In the output, observe that the training and test data contain two columns, namely `text` and `label`. The `text` column contains reviews from IMDB. The `label` column contains the values 0 and 1, which specify the sentiment of the reviews, as shown in the following example:
# Get the first review in the training data
training_text = training_data['text'][0]
# Get the first label in the training data
training_label = training_data['label'][0]
# Get the first review in the test data
test_text = test_data['text'][0]
# Get the first label in the test data
test_label = test_data['label'][0]

print("The first review in the training data:", training_text)
print("The sentiment label for the first review in the training data:", training_label)
print("The first review in the test data:", test_text)
print("The sentiment label for the first review in the test data:", test_label)
Output:
The first review in the training data: I rented I AM CURIOUS-YELLOW from my video store because of all the controversy that surrounded it when it was first released in 1967. I also heard that at first, it was seized by U.S. customs if it ever tried to enter this......
The sentiment label for the first review in the training data: 0
The first review in the test data: I love sci-fi and am willing to put......
The sentiment label for the first review in the test data: 0
Having downloaded the pre-trained model and the dataset, let’s create a tokenizer and a data collator to pre-process the dataset and make it suitable for the pre-trained model.
Step 4- Create a tokenizer and a data collator
We need a tokenizer that tokenizes the data in the same format as the dataset used to train the pre-trained model. To create the tokenizer, we will use the `AutoTokenizer` class. The `from_pretrained()` method in the `AutoTokenizer` class takes the pre-trained model name as input and automatically loads the correct tokenizer for that model.
from transformers import AutoTokenizer

pretrained_model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name)
After getting the tokenizer, let’s write a function that takes a row from the dataset as its input and tokenizes the text in the input row.
def tokenize(record):
    outputs = tokenizer(record['text'], truncation=True, padding="max_length", max_length=512)
    return outputs
In this code,
- The parameter `record` represents a row in the dataset, and `text` from the record is the input that we want to tokenize.
- Setting the `truncation` parameter to `True` ensures that if the text is longer than the maximum length allowed by the tokenizer (`max_length` = 512 in this case), it will be truncated to fit within that limit.
- The `padding` parameter is set to `max_length` to pad the tokenized output to exactly `max_length` tokens.
- We set the `max_length` parameter to 512, as transformer models like BERT and DistilBERT have a maximum context length of 512 tokens.
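As a quick sanity check, we can call the function on a single toy record (the sample text is a hypothetical input, not from the dataset):

sample = tokenize({"text": "What a film!"})
print(len(sample["input_ids"]))  # 512, because every sequence is padded to max_length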
After implementing the tokenizer, let’s also implement a data collator that dynamically pads the tokenized inputs to the longest sequence in a batch, ensuring that all sequences in a batch have the same length for efficient batch processing during fine-tuning. Note that because our `tokenize()` function already pads every sequence to `max_length`, the collator has little extra padding to do here; dynamic padding pays off when you tokenize without fixed-length padding.
To implement the data collator, we will use the `DataCollatorWithPadding` class, which takes the tokenizer as its input and returns a data collator.
from transformers import DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer)
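To see the dynamic padding in action, here is a small sketch that batches two toy strings (hypothetical inputs); we tokenize them without fixed-length padding so that the collator has visible work to do:

# Tokenize two strings of different lengths, without padding
samples = [tokenizer(text, truncation=True) for text in ("Great movie!", "Terrible plot and even worse acting.")]

# The collator pads both sequences to the length of the longer one
batch = data_collator(samples)
print(batch["input_ids"].shape)    # e.g., torch.Size([2, 9])
print(batch["attention_mask"][0])  # trailing zeros mark the padded positions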
Now that we have created the tokenizer and the data collator, let’s pre-process the data.
Step 5- Pre-process the dataset
For pre-processing the data, we will apply the `tokenize()` function to the `text` column of the dataset using the `map()` method.
tokenized_training_data = training_data.map(tokenize, batched=True)
tokenized_test_data = test_data.map(tokenize, batched=True)

print("Tokenized training data:", tokenized_training_data)
print("Tokenized test data:", tokenized_test_data)
After tokenization, we get a dataset with two additional columns, `input_ids` and `attention_mask`, as shown in the output:
Tokenized training data: Dataset({
    features: ['text', 'label', 'input_ids', 'attention_mask'],
    num_rows: 25000
})
Tokenized test data: Dataset({
    features: ['text', 'label', 'input_ids', 'attention_mask'],
    num_rows: 25000
})
The `input_ids` feature contains a vector representing the tokenized text, i.e., the numerical representation of the text from the `text` column of the dataset. The `attention_mask` feature contains a binary mask for each token in `input_ids` that tells the model which tokens to pay attention to, preventing it from processing the meaningless padding tokens.
# Get the tokenized version of the first row in the training data
training_input_id = tokenized_training_data['input_ids'][0]
# Get the attention mask of the first row in the training data
training_attention_mask = tokenized_training_data['attention_mask'][0]
# Get the tokenized version of the first row in the test data
test_input_id = tokenized_test_data['input_ids'][0]
# Get the attention mask of the first row in the test data
test_attention_mask = tokenized_test_data['attention_mask'][0]

print("The tokenized version of the first review in the training data:", training_input_id)
print("The attention mask for the first review in the training data:", training_attention_mask)
print("The tokenized version of the first review in the test data:", test_input_id)
print("The attention mask for the first review in the test data:", test_attention_mask)
Output:
The tokenized version of the first review in the training data: [101, 1045, 12524, 1045, 2572, 8025, 1011, 3756, 2013, 2026, 2678, 3573, 2138, 5129, 2009, 2043, 2009, ...... 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
The attention mask for the first review in the training data: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ..... 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
The tokenized version of the first review in the test data: [101, 1045, 2293, 16596, 1011, 10882, 1998, 2572, 5627, 2000, 2404, ......, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
The attention mask for the first review in the test data: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ......, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
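To double-check what these ids encode, you can map the first few back to readable tokens with the tokenizer (an optional sanity check):

# Convert the first ten ids of the first training review back to tokens
print(tokenizer.convert_ids_to_tokens(training_input_id[:10]))
# e.g., ['[CLS]', 'i', 'rented', 'i', 'am', 'curious', '-', 'yellow', 'from', 'my']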
Next, we will add a function to evaluate the model’s performance while training.
Step 6- Create a function to evaluate the model
To evaluate the model, we’ll write a function that computes its accuracy from the predictions and labels in the training data.
import numpy as np
import evaluate

def compute_metrics(predictions_and_labels):
    metric = evaluate.load("accuracy")
    logits, labels = predictions_and_labels
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
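To confirm that the function behaves as expected, we can call it with a pair of toy logits and labels (hypothetical values, purely for illustration):

# Two samples, two classes; the argmax of each row is the predicted class
toy_logits = np.array([[0.1, 0.9], [2.0, -1.0]])
toy_labels = np.array([1, 0])
print(compute_metrics((toy_logits, toy_labels)))  # {'accuracy': 1.0}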
Having implemented the function to evaluate the model’s performance, let’s create a supervised model trainer to fine-tune the pre-trained LLM using the `SFTTrainer` class.
Step 7- Create a supervised model trainer to fine-tune the pre-trained LLM
To create the model trainer, we first need to define the training parameters, such as the number of training epochs, the output directory for the fine-tuned model, the batch sizes for training and evaluation, and the evaluation strategy. Then, we need to create a model trainer using the tokenizer, data collator, pre-trained model, training dataset, test dataset, and these training parameters.
To define the training arguments, we will use the `TrainingArguments` class defined in the `transformers` module.
- The `TrainingArguments()` constructor takes the number of epochs (`num_train_epochs`), the directory for saving the fine-tuned model (`output_dir`), the batch sizes for training and evaluating the model, and the evaluation strategy as its inputs. We will set the batch size for training and evaluation to 16. Also, we will set the `eval_strategy` parameter to 'epoch' so that the model is evaluated after every epoch.
- To create the model trainer, we will use the `SFTTrainer` class defined in the `trl` module. `SFTTrainer()` takes the pre-trained model, the tokenizer, the data collator, the training dataset, the test dataset, the function to evaluate the model, and the training arguments as inputs to its `model`, `processing_class`, `data_collator`, `train_dataset`, `eval_dataset`, `compute_metrics`, and `args` parameters, respectively.
After executing `SFTTrainer()`, we get a trainer object ready for fine-tuning.
from transformers import TrainingArguments
from trl import SFTTrainer

# Create training arguments
training_args = TrainingArguments(
    num_train_epochs=10,
    output_dir="text-classifier-supervised-codecademy",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    eval_strategy="epoch"
)

# Create a supervised model trainer for fine-tuning the LLM
trainer = SFTTrainer(
    model=pretrained_model,
    processing_class=tokenizer,
    data_collator=data_collator,
    args=training_args,
    train_dataset=tokenized_training_data,
    eval_dataset=tokenized_test_data,
    compute_metrics=compute_metrics
)
After creating the trainer, let’s fine-tune the LLM.
Step 8- Train and save the fine-tuned model
To fine-tune the LLM using the `SFTTrainer`, we invoke the `train()` method on the trainer object returned by `SFTTrainer()`.
trainer.train()
After training the model, we can save it to the local machine using the `save_model()` method. The `save_model()` method, when invoked on the trainer object, takes the file path for the model as its input and saves the model to local storage.
file_path_to_save_the_model = '/home/aditya1117/codes/text-classifier-supervised-codecademy'
trainer.save_model(file_path_to_save_the_model)
We can also push the fine-tuned model to the HuggingFace models repository. For this, we invoke the `push_to_hub()` method on the trainer and pass a name for the fine-tuned model:
model_name = "text-classifier-supervised-codecademy"trainer.push_to_hub(model_name)
After successful execution, the `push_to_hub()` method returns a commit message that contains the link to the fine-tuned model in the format `https://huggingface.co/<user_name>/<model_name>`, as shown in the following output:
CommitInfo(commit_url='https://huggingface.co/raditya1117/text-classifier-supervised-codecademy/commit/5a04ca039661b209c0c4cedaf7f680f10e1945d1', commit_message='text-classifier-supervised-codecademy', commit_description='', oid='5a04ca039661b209c0c4cedaf7f680f10e1945d1', pr_url=None, repo_url=RepoUrl('https://huggingface.co/raditya1117/text-classifier-supervised-codecademy', endpoint='https://huggingface.co', repo_type='model', repo_id='raditya1117/text-classifier-supervised-codecademy'), pr_revision=None, pr_num=None)
In the output, you can see that our fine-tuned LLM is stored at the address `https://huggingface.co/raditya1117/text-classifier-supervised-codecademy`. Here, `raditya1117` is the username of the HuggingFace account, and `text-classifier-supervised-codecademy` is the model name.
If you go to the above link, you will see all the details of the model in the model card. You can also update the details to provide more information about the training dataset and other metrics.
Step 9- Load the fine-tuned model
After saving the fine-tuned model, you can use the `AutoModelForSequenceClassification.from_pretrained()` method to load and use it. To load the fine-tuned model from the HuggingFace repository, pass the model name in the format `your_huggingface_username/model_name`, as shown below:
model_name = "raditya1117/text-classifier-supervised-codecademy"model = AutoModelForSequenceClassification.from_pretrained(model_name)
To load the fine-tuned model from local storage, you can pass the model’s file path to the `from_pretrained()` method.
model_filepath = "/home/aditya1117/codes/text-classifier-supervised-codecademy"
model = AutoModelForSequenceClassification.from_pretrained(model_filepath)
After loading the model, you can use it for inference or fine-tune it further to improve its accuracy and performance.
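For example, here is a minimal sketch of classifying a new review with the loaded model. The review text is made up, we assume the tokenizer was pushed to the Hub along with the model (the trainer does this when `processing_class` is set), and we follow the IMDB convention that label 1 means positive:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "raditya1117/text-classifier-supervised-codecademy"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # switch off dropout for inference

review = "A beautifully shot film with a story that stays with you."
inputs = tokenizer(review, truncation=True, max_length=512, return_tensors="pt")

# Forward pass without gradient tracking; the highest logit is the predicted class
with torch.no_grad():
    logits = model(**inputs).logits
predicted_label = logits.argmax(dim=-1).item()

print("positive" if predicted_label == 1 else "negative")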
Conclusion
Fine-tuning Large Language Models (LLMs) is a game-changer for customizing AI models to specific tasks like sentiment analysis, chatbots, and text summarization. Instead of training from scratch, fine-tuning adapts powerful pre-trained models for higher accuracy, better efficiency, and domain-specific performance while saving computational resources.
To learn more about using LLMs, you can go through this course on fine-tuning transformer models. You might also like this course on building your own LLM using PyTorch.
Happy learning!