
Finetuning with Hugging Face

Padding and Truncating Training Examples

When finetuning, examples need to be passed to the model in uniform length so they can be processed in parallel. Hugging Face tokenizers accept padding and truncation arguments to handle this. Padding is often determined by the longest sequence in the batch, while truncation automatically cuts a sequence off at the model’s maximum input length.

tokenized_text = tokenizer(text, padding="longest", truncation=True)
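For instance, a minimal sketch of padding a small batch (the checkpoint and sentences below are just examples):

from transformers import AutoTokenizer

# Example checkpoint; any Hugging Face tokenizer behaves the same way
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

batch = ["A short sentence.", "A noticeably longer sentence that needs more tokens."]
tokenized_text = tokenizer(batch, padding="longest", truncation=True, return_tensors="pt")

# Both sequences are padded to the length of the longest one in the batch
print(tokenized_text["input_ids"].shape)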

Tokenizing Finetuning Data

To tokenize a Hugging Face dataset, use the dataset instance’s .map method, passing in a function that receives examples and returns their tokenized form. A second, named parameter of batched=True ensures the data is tokenized in batches.

def tokenize_function(examples):
    # With batched=True, examples is a dictionary of lists; this assumes a "text" column
    return tokenizer(examples["text"], padding="longest", truncation=True)

tokenized_dataset = dataset.map(tokenize_function, batched=True)


Moving Models and Data to the GPU

Moving models and data to the GPU in PyTorch requires calling the .to() method and passing in torch.device("cuda").

# device-agnostic code:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
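A minimal, device-agnostic sketch putting this together (the checkpoint and input text are illustrative):

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Example checkpoint and input
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Move the model's parameters to the chosen device
model = model.to(device)

# Move every input tensor in the tokenized batch to the same device
batch = tokenizer(["Example input text."], padding="longest", truncation=True, return_tensors="pt")
batch = {key: value.to(device) for key, value in batch.items()}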

Hugging Face’s Trainer API

When your hyperparameters, model, and data are configured, you can finetune with the Hugging Face Trainer API by calling trainer.train(). Afterward, you can call trainer.evaluate() to gauge the finetuned model’s performance against test data.
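A hedged sketch of that workflow (the model, dataset splits, and output directory are assumptions for illustration):

from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

# Example model; in practice, use the model you intend to finetune
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

training_args = TrainingArguments(output_dir="finetune_output")  # example output directory

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],  # assumes the tokenized dataset has train/test splits
    eval_dataset=tokenized_dataset["test"],
)

trainer.train()      # run the finetuning loop
trainer.evaluate()   # gauge performance on the test split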

LoRA in Hugging Face

Finetuning runs can be performed with low-rank adaptation (LoRA) via Hugging Face’s peft library. Pass hyperparameters to LoraConfig(), then pass that config to get_peft_model() along with the base model. A good starting point for the alpha hyperparameter is double the value of the rank.
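A sketch of that workflow (the base model, rank, and dropout values are illustrative):

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification

# Example base model to adapt
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

lora_config = LoraConfig(
    task_type="SEQ_CLS",  # matches the example classification model
    r=8,                  # rank of the low-rank update matrices
    lora_alpha=16,        # a common starting point: double the rank
    lora_dropout=0.1,
)

# Wrap the base model so only the LoRA adapter weights are trained
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()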

Quantization with Bitsandbytes

One popular choice of library for quantizing large language models is bitsandbytes, which can be used to quantize models to a variety of bit sizes, shrinking them for use on consumer hardware.
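With the transformers integration, quantization is typically requested through a BitsAndBytesConfig at load time; a sketch assuming 4-bit loading of an example checkpoint:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Request 4-bit weights (8-bit loading is also available via load_in_8bit=True)
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

# Example checkpoint; the weights are quantized as they are loaded
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b",
    quantization_config=quantization_config,
    device_map="auto",
)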

Perplexity in Language Model Evaluation

Perplexity (PPL) is a popular evaluation metric for generative language models, defined as the exponentiated cross-entropy of a sequence’s probability. It’s a good way to gauge how effective a model is at predicting some target text.
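Because the Trainer reports mean cross-entropy loss, a quick sketch of turning an evaluation loss into perplexity (the loss value is just an example):

import math

# Mean cross-entropy loss, e.g. the "eval_loss" returned by trainer.evaluate()
eval_loss = 2.1972  # example value

# Perplexity is the exponentiated cross-entropy
perplexity = math.exp(eval_loss)
print(f"Perplexity: {perplexity:.2f}")  # about 9.00 for this example loss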

TrainingArguments in the Trainer API

Finetuning hyperparameters are configured via the TrainingArguments class in the transformers library, where epochs, learning rate, and other common training hyperparameters can be set.

# Example output directory and hyperparameter values
training_args = TrainingArguments(output_dir="finetune_output", num_train_epochs=3, learning_rate=2e-5)
