How to Use Hugging Face: Beginner's Guide to AI Models
What is Hugging Face?
Hugging Face is a leading open-source platform for building and deploying machine learning (ML) models, especially in natural language processing (NLP). It provides powerful tools like the Transformers library, a Model Hub with thousands of pre-trained models (e.g., GPT-2, BERT), and access to over 100,000 datasets for tasks in NLP, computer vision, and audio.
We can quickly fine-tune models on custom data, tokenize text automatically, and even evaluate performance, all with minimal setup. The Hugging Face Hub lets us store, share, and reuse models, making collaboration and deployment seamless.

Now that we understand what Hugging Face offers, let’s walk through the steps to set up your environment.
How to set up Hugging Face for model training
Hugging Face is free to use, and creating an account only requires an email address. In many ways, the platform is analogous to GitHub in both function and approach: all the main features are free and open to the public without limits, and anyone can create and upload as many models as they want at no additional cost.
The workflow shown in this tutorial saves the trained model to the Hub repo. The only additional account configuration necessary is the creation of an access token that lets the notebook environment authenticate with a user profile.
Tokens can be managed under the profile's settings page.
Note: The token must be "write" enabled; otherwise, an error will be thrown.

Setting up Hugging Face in Google Colab
We’ll complete this project entirely within a Google Colab notebook, an online coding environment similar to Jupyter. Colab makes it easy to run Python code in the browser with access to cloud-based resources like GPUs. To get started, open a new Colab notebook, install the required libraries, and begin building your Hugging Face project without needing any local setup.
We’ll start by importing all the required libraries. Colab notebooks have many popular libraries preloaded, but often you may need to begin by installing packages via pip. The first cell will make the following call:

In this cell, we call four different Hugging Face libraries:
- transformers: Methods for preparing models and data, as well as accessing APIs.
- datasets: Provides tools for creating and accessing datasets.
- evaluate: Provides a range of metrics for monitoring and assessing the training process.
- accelerate: Supports efficient processing in the model training phase.
With these libraries installed, we can move on to loading the data, an archive of over 8k training examples from the Rotten Tomatoes film and TV review site.

The rt variable in this example is a dictionary that contains the predefined splits as seen in the following cell:

We can then review a given instance by calling a key and index:

Based on this excerpt, we can see that the dataset is composed of objects that have the text of the review, and a label (0 for negative and 1 for positive).
The only additional prerequisite we’ll address here is connecting to Hugging Face with the access token setup for “writing” to the account. The notebook cell will be set as follows:
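The login cell is a single method call from the huggingface_hub library (installed as a dependency of transformers):

```python
from huggingface_hub import notebook_login

# Renders a token submission field inside the notebook; paste in the
# "write"-enabled access token created under profile settings
notebook_login()
```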

The notebook_login() method call renders a token submission field. Once the token is submitted, the notebook is connected so that the model repo can be created and populated when model training starts. In general, connecting this way is not required for accessing public datasets or models from Hugging Face. But to commit to the Hub repo or access private models within your profile, a login is required.
How to tokenize text for model training
Working with text means employing tokenizers. Think of tokenization as breaking down a sentence into individual puzzle pieces—each word or part of a word becomes a ‘token’ that we can work with separately.
Tokenizers process text by segmenting sequences into these tokens: atomic parts that are converted to numerical IDs, and eventually a representative tensor. This process is essential because machine learning models can only work with numbers, not raw text.
Models can use any number of different tokenization systems: some are character-based, while others separate text into larger elements such as whole words. Fine-tuning a model requires the same tokenization system that was applied in the original training. This is where the transformers library provides high-level, concise abstractions that save us from having to retrieve and specify details about the selected model.
All the models in the repository include implementation information in a “model card”, which includes details regarding how to load the model, the data that was used in training, potential biases, etc.

The card shown here is for the deberta-v3-base model, the model of choice for this example, but many others can be substituted by changing the model identifier. Models with similar features can be found in the Models repo by selecting the tags shown across the top of the card, such as the model's language (e.g., English) or the applicable framework.
The transformers library provides us with the AutoTokenizer method that automatically selects the appropriate tokenizer given our model, as seen in the following cell:
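The cell might look like the following sketch, assuming the model's full Hub identifier is microsoft/deberta-v3-base:

```python
from transformers import AutoTokenizer

# AutoTokenizer reads the model's configuration and returns the
# tokenizer used in the original training (here, SentencePiece-based)
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
```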

Note: To fine-tune the deberta-v3-base model, one additional installation is required: pip install sentencepiece. This is the tokenizer library that was used in the original training. Other models may not require an additional installation; if a required library is not present, an error message will be raised.
Next, we define the tokenization function and apply it to the dataset to create a new "tokenized" set. The function supplies the access pattern that matches how the dataset is constructed, and it can be configured with several options; in this case, we specify that the elements should be truncated. Truncation ensures that input sequences do not exceed the maximum length of the model.

The function is then applied with the map() method from the datasets library.

If we call up the same sample on the processed dataset, we can see the changes made as a result of the tokenization. Now, each instance contains three additional fields that hold the token IDs, as well as values for the token type and attention mask attributes respectively.

Defining the model and evaluation metrics
For this basic training effort, we will employ a generic metric. Some evaluation metrics are tailored to specific datasets, others to specific tasks (e.g., named entity recognition), while others are more general. In this case, we're using accuracy, which is the fraction of correct results relative to the total examples evaluated.

Next, we must define a method for returning our metrics. The definition has just three lines:
- The eval_pred parameter is destructured as the predictions and labels variables.
- Then the predictions variable is reassigned to the index of the max value on the given axis.
- The last line returns the accuracy.compute() call with the relevant values for each parameter.

In addition to the metrics, the labels must be specified, as in the following cell:
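For this binary sentiment task, the label mappings are two small dictionaries (the label names are illustrative):

```python
# Human-readable names for the two classes, and the reverse mapping
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
label2id = {"NEGATIVE": 0, "POSITIVE": 1}
```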

The last import we have brings in our model and a pair of methods for our training run.
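A sketch of this cell: AutoModelForSequenceClassification attaches a fresh two-class classification head to the pre-trained backbone (the label mappings repeat the previous cell so the block runs on its own):

```python
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer

id2label = {0: "NEGATIVE", 1: "POSITIVE"}
label2id = {"NEGATIVE": 0, "POSITIVE": 1}

# Download the pre-trained weights and add an untrained classification head
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-base",
    num_labels=2,
    id2label=id2label,
    label2id=label2id,
)
```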

All the inputs are now present. The last definition will configure the training. It will consist of a basic set of parameters (e.g., batch size, etc.) as well as the data and metrics we’ve specified.
How to train a transformer model with Hugging Face
The model training will use the Hugging Face Trainer class. We instantiate a new instance and pass a minimal set of parameters before making the call to execute trainer.train(). Most of the values passed here represent common defaults. The parameters for the training arguments include:
- output_dir: The name of the directory (Hub repo).
- learning_rate: Represents the initial learning rate used by the optimization function.
- per_device_train_batch_size (and per_device_eval_batch_size): The batch size for each device (GPU/CPU core, etc.).
- num_train_epochs: The overall training period.
- weight_decay: The weight decay or regularization applied to layers.
- evaluation_strategy (and save_strategy): How frequently the model will be evaluated and saved (these arguments must match).
- load_best_model_at_end: A boolean that can be set to ensure that the best-performing variation of the model is uploaded instead of the default last iteration.
- push_to_hub: A boolean that determines if the model is pushed to the Hub repository on every save.
The Trainer object that is called to perform the training is just the aggregation of all the elements configured to this point. The parameters include the downloaded model, the training arguments, the prepped training and testing datasets, and the evaluation metrics.

It is important to note that the training may take considerable time to process. All the cells before the training will require minimal processing time; however, the training itself can easily take an hour or more. There are alternative runtimes that can be selected within Colab, such as a T4 GPU or TPU. But availability is not guaranteed.
Once the training is underway, there will be an output that continuously updates its status. It will show the progress of the training as shown below:

Note: Colab notebooks will disconnect if there is inactivity for 90 minutes. Therefore, it’s important to check in on the status of the training periodically or set up a script to keep the notebook active.
Once the training is initiated, the new model repo will be available to view. Under the profile view, the new repo will appear under the Model heading with the given directory name. By selecting the model, we can bring up the repo as seen below:

Run predictions using your trained model
Once the model training is completed, we have a new model that can be used to label or predict values for new data. We can write an original text akin to the reviews used in training and ask the model for a prediction, also known as running inference.

We can use the pipeline method to easily submit (process) the example and return a prediction.
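A sketch of the inference cell. The model identifier would normally be your own Hub repo (e.g., your-username/rt_sentiment_model, a hypothetical name); a public sentiment model is substituted here so the cell runs on its own:

```python
from transformers import pipeline

# Load a sentiment-analysis pipeline; swap in your own Hub repo ID here
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

text = "An utterly charming film with sharp writing and warm performances."
print(classifier(text))
```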

At the bottom of the output, we can see the label returned and the associated prediction score.
Now, moving forward, this model can be accessed at any time to run inference by calling the model directly (as in the cell above) without any additional prep work.
Conclusion
Hugging Face offers an accessible way to explore, fine-tune, and deploy powerful machine learning models. In this tutorial, we explored how to use tools like the Transformers library and Model Hub to train a text classification model entirely within a Google Colab notebook.
To continue building your skills in AI and NLP, check out Codecademy’s Build Chatbots with Python course. It’s a hands-on way to practice concepts like text processing, model training, and deploying intelligent applications.
Frequently asked questions
1. What is Hugging Face used for?
Hugging Face is used for building, training, and deploying machine learning models. Specifically, it specializes in natural language processing (NLP) tasks like text classification, sentiment analysis, translation, and chatbot development. The platform provides pre-trained models, datasets, and tools that make AI development accessible to developers of all skill levels.
2. Is Hugging Face AI free?
Yes, Hugging Face offers a free tier that gives users access to pre-trained models, datasets, and libraries like Transformers and Datasets.
3. Is Hugging Face better than OpenAI?
Hugging Face focuses on open-source models and community collaboration, while OpenAI provides commercial AI tools like ChatGPT. Hugging Face is ideal for those who want flexibility, transparency, and the ability to fine-tune models, whereas OpenAI is better suited for plug-and-play solutions with strong performance out of the box.
4. Is Hugging Face API free?
The Hugging Face Inference API has a free tier with limited usage, allowing developers to run models without hosting them. For higher throughput, private models, or commercial use, paid plans are available with expanded API access and support.
5. What is ZeroGPU Hugging Face?
ZeroGPU is a Hugging Face Spaces feature that provides free, shared GPU compute: a Space is dynamically allocated a GPU only while it is processing a request, and the hardware is released afterward. It lets developers host GPU-powered demos without paying for a dedicated GPU.