
RAG Chatbot With HuggingFace And Streamlit: Complete Tutorial

Customer support teams handle thousands of repetitive questions daily. AI chatbots can answer these questions instantly, 24/7. In this tutorial, we’ll build a Hugging Face RAG chatbot using Streamlit to create intelligent customer support that understands context and retrieves accurate information from your knowledge base.

This tutorial focuses on e-commerce because rich datasets and common use cases are readily available for it. However, these techniques work for any industry—healthcare, finance, education, or retail.

What is RAG? RAG stands for Retrieval-Augmented Generation. Instead of training a model from scratch, RAG systems store knowledge in a searchable database. When a user asks a question, the system finds the most relevant information and uses it to return an accurate answer.

You’ll work with three core Hugging Face libraries throughout this tutorial: Datasets for loading data, Sentence Transformers for creating embeddings, and the Transformers ecosystem for model management.

Now that you understand what we’re building, let’s get started.


Video: Building RAG Chatbot with Hugging Face

Before you start, you can watch the entire process in this video tutorial, which shows how to build a Hugging Face RAG chatbot with Streamlit from beginning to end.

If you prefer following the written guide, continue with the detailed steps below.

Step 1. Setting up the project

To set up the project, download the zip file and extract it. Then, open the extracted folder in a code editor like VS Code. The file structure will look like this:

chatbot-project/
├── create_knowledge_base.py
├── app.py

We have two files:

  • create_knowledge_base.py - Template for building your knowledge base
  • app.py - Basic Streamlit app structure

Let’s create a virtual environment to keep your project’s dependencies isolated. We’ll use venv:

python -m venv chatbot-env

Activate the virtual environment:

# On Windows:
chatbot-env\Scripts\activate
# On Mac/Linux:
source chatbot-env/bin/activate

Next, we’ll install the required dependencies. Execute the following command in the terminal of your VS Code to install them all:

pip install streamlit pandas numpy sentence-transformers faiss-cpu datasets

Most of our dependencies come from the Hugging Face ecosystem. Note that pickle, which we’ll use later to save the knowledge base, ships with Python’s standard library, so it needs no separate install.

With your environment set up, we’re ready to start the real work.

Step 2. Load e-commerce datasets using Hugging Face

Instead of scraping data, we can access thousands of high-quality datasets from Hugging Face Hub.

Let’s start building our create_knowledge_base.py file. All import statements are already included in our file. Let’s now define the create_ecommerce_knowledge_base() function, where we will include all our code:

def create_ecommerce_knowledge_base():
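
If you are writing create_knowledge_base.py from scratch instead of using the starter zip, its imports would look something like this (a minimal sketch based on the libraries this tutorial uses):

import pickle

import faiss
from datasets import load_dataset
from sentence_transformers import SentenceTransformer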

Loading datasets with Hugging Face

Add the following code to your create_ecommerce_knowledge_base() function:

print("Loading Bitext e-commerce dataset...")
# Load the dataset
dataset = load_dataset("bitext/Bitext-customer-support-llm-chatbot-training-dataset")
print(f"Dataset loaded: {len(dataset['train'])} examples")

This code loads the Bitext e-commerce customer support dataset using the Hugging Face datasets library and prints the number of training examples.

Exploring the dataset structure

Hugging Face datasets come with standardized formats, so now we need to understand what we actually downloaded. Our dataset has four main pieces for each example (you can confirm them with the snippet after this list):

  • instruction - what the customer asked
  • response - what the support agent answered
  • intent - what the customer really wanted to do
  • category - the type of problem this is
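
To see these fields for yourself, you can print a single example straight from the loaded dataset (a quick sanity check, not part of the final script):

# Inspect one example to confirm the field names
sample = dataset['train'][0]
print(sample.keys())          # should include 'instruction', 'response', 'intent', 'category'
print(sample['instruction'])  # the raw customer question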

Data preprocessing and cleaning

Even with high-quality Hugging Face datasets, some preprocessing is needed. The data contains placeholder text that we need to clean:

# Prepare knowledge base
knowledge_base = []
for example in dataset['train']:
    # Clean the response
    response = example['response']
    response = response.replace("{{Order Number}}", "your order")
    response = response.replace("{{Online Company Portal Info}}", "our website")
    response = response.replace("{{Online Order Interaction}}", "Order History")
    response = response.replace("{{Customer Support Hours}}", "business hours")
    response = response.replace("{{Customer Support Phone Number}}", "our support line")
    response = response.replace("{{Website URL}}", "our website")
    knowledge_base.append({
        'question': example['instruction'],
        'answer': response,
        'intent': example['intent'],
        'category': example['category']
    })
print(f"Knowledge base created with {len(knowledge_base)} entries")

In this code, we clean up our data to make it more user-friendly. The dataset we downloaded contains template responses with placeholders like {{Order Number}}. These placeholders work great for training, but they’d confuse real users. Imagine if your chatbot said “Check {{Order Number}}” instead of “Check your order”. That would look broken. To fix that, we do the following:

  • We loop through each example in the dataset and use Python’s replace() method to clean the template responses, removing placeholders like {{Order Number}} so the responses read naturally for real users.
  • After cleaning, we organize each example into a dictionary with four key parts:
    • user’s message as a question
    • cleaned response as answer
    • intent
    • category to help with classification or future filtering
  • Each of these dictionaries is then added to a list using the append() method, which acts as our chatbot’s knowledge base—like building a well-organized library of Q&A pairs.
  • Finally, we print the total number of entries we’ve added to the knowledge base, which helps us verify that the data processing step is complete.

Perfect! Our data is now clean and organized. Next, we’ll build the intelligent part - the system that can understand questions and find relevant answers.

Step 3. Building the RAG system with Hugging Face Transformers

Now we need to convert our text into numbers, because computers compare numbers, not words. Hugging Face Sentence Transformers are designed for exactly this.

Loading Hugging Face Sentence Transformers

# Load sentence transformer for embeddings
print("Loading sentence transformer model...")
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

Here, we load the brain of our system. This is the model that understands language. Let us understand why this is so important and how it works.

  • We load our model using SentenceTransformer()
  • The model name sentence-transformers/all-MiniLM-L6-v2 might look complicated, but it’s just the specific address of this model on Hugging Face Hub
  • This model is a good choice because it’s small (it loads quickly) and fast (it encodes text rapidly), yet still very good at understanding language. It was trained on millions of sentence pairs, so it understands that “track my order” and “where is my package” mean the same thing, even though they use different words.
  • When we create SentenceTransformer(model_name), Hugging Face automatically downloads the model—but only if it’s not already on your computer. Then, it loads it into memory, ready to use.

Creating embeddings

Now let’s convert all our questions into vectors, which are lists of numbers that represent information in a form computers can process:

# Create embeddings for all questions
print("Creating embeddings...")
questions = [item['question'] for item in knowledge_base]
embeddings = model.encode(questions, show_progress_bar=True)

This is where the model “understands” the text: we’re converting human language into numbers that computers can work with mathematically.

Let us understand exactly what’s happening:

  • We extract all questions from the knowledge base using a list comprehension: [item['question'] for item in knowledge_base].
  • This creates a new list of only the questions, separating them from the full question-answer data.
  • We pass this list to the model.encode(questions, show_progress_bar=True) to convert each question into a vector of numbers.
  • These vectors represent the meaning of each question, and similar questions produce similar vectors (see the short demo after this list).
  • The show_progress_bar=True parameter shows a progress bar, which helps when encoding many questions.
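
To build intuition for what these vectors capture, here’s a small standalone sketch (not part of the tutorial script) that compares differently worded questions. The exact scores will vary, but semantically similar sentences should score noticeably higher than unrelated ones:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
vec_a = model.encode("track my order")
vec_b = model.encode("where is my package")
vec_c = model.encode("what payment methods do you accept")

# Cosine similarity: values closer to 1.0 mean closer in meaning
print(util.cos_sim(vec_a, vec_b))  # similar intent -> higher score
print(util.cos_sim(vec_a, vec_c))  # different intent -> lower score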

Set up vector search for intelligent responses

Now, we need to build a search engine for our vectors. Imagine you have thousands of books. You want to find books on similar topics quickly. You’d need a good indexing system. That’s exactly what we’re building here.

# Create FAISS index
print("Creating FAISS index...")
dimension = embeddings.shape[1]
index = faiss.IndexFlatIP(dimension) # Inner Product for similarity
# Normalize embeddings for cosine similarity
faiss.normalize_L2(embeddings)
index.add(embeddings.astype('float32'))

We are using FAISS, which is Meta’s library for searching through vectors super quickly, even with millions of them. In this code snippet:

  • We get the number of dimensions in our vector embeddings using embeddings.shape[1], which tells us how many features each question vector has (e.g., 384).
  • We create a FAISS index using faiss.IndexFlatIP(dimension), which uses inner product similarity to compare vectors of the specified dimension.
  • We normalize the vectors with faiss.normalize_L2(embeddings) so that every vector has unit length; for unit-length vectors, the inner product is equivalent to cosine similarity, which gives accurate comparisons (see the quick check after this list).
  • We add the normalized vectors to the index with index.add(embeddings.astype('float32')), converting them to float32 format for compatibility and performance with FAISS.
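
If you want to convince yourself that the inner product of L2-normalized vectors equals cosine similarity, here is a tiny self-contained check with made-up 2-D vectors (illustrative only):

import numpy as np

a = np.array([3.0, 4.0])
b = np.array([1.0, 2.0])

# Cosine similarity computed directly
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Inner product after L2 normalization
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
inner = np.dot(a_unit, b_unit)

print(cosine, inner)  # both print the same value (~0.9839)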

Saving the system

Let’s save everything so we don’t have to recreate it every time:

# Save everything
print("Saving knowledge base and index...")

# Save knowledge base
with open('knowledge_base.pkl', 'wb') as f:
    pickle.dump(knowledge_base, f)

# Save FAISS index
faiss.write_index(index, 'ecommerce_index.faiss')

# Save model name for later loading
with open('model_name.txt', 'w') as f:
    f.write('sentence-transformers/all-MiniLM-L6-v2')

print("Knowledge base created successfully!")
print("Files created:")
print("- knowledge_base.pkl")
print("- ecommerce_index.faiss")
print("- model_name.txt")

We need to save our work because creating embeddings takes time. We don’t want to do it every time we run our chatbot. We’ll use pickle to save our knowledge base:

  • with open('knowledge_base.pkl', 'wb') as f opens the file safely, and pickle.dump(knowledge_base, f) writes all the data to it.
  • The with block ensures the file is closed properly, even if an error occurs during the save process.
  • We save the FAISS index using faiss.write_index(index, 'ecommerce_index.faiss'), which stores it in FAISS’s optimized format.
  • Lastly, we save the model name to a text file to ensure we reload the exact same model later, keeping the embeddings consistent (a quick round-trip check follows this list).
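
As a quick check that the save worked, you can load everything back and compare the entry counts (an optional verification snippet, not part of the tutorial files):

import pickle
import faiss

with open('knowledge_base.pkl', 'rb') as f:
    kb = pickle.load(f)
index = faiss.read_index('ecommerce_index.faiss')

# One vector per Q&A pair, so these counts should match
print(len(kb), index.ntotal)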

Step 4. Create a Streamlit interface for your AI chatbot

Now let’s build the Streamlit chat interface that lets users interact with our Hugging Face-powered system.

Loading the system

The following function will load our entire saved system back into memory. Add this to your app.py:

@st.cache_resource
def load_system():
    try:
        required_files = ['knowledge_base.pkl', 'ecommerce_index.faiss', 'model_name.txt']
        for file in required_files:
            if not os.path.exists(file):
                st.error(f"Missing: {file}")
                return None, None, None
        with open('model_name.txt', 'r') as f:
            model_name = f.read().strip()
        model = SentenceTransformer(model_name)
        with open('knowledge_base.pkl', 'rb') as f:
            knowledge_base = pickle.load(f)
        index = faiss.read_index('ecommerce_index.faiss')
        return model, knowledge_base, index
    except Exception as e:
        st.error(f"Error: {str(e)}")
        return None, None, None

  • In this code, the @st.cache_resource decorator ensures that the model and data are loaded only once, boosting performance by preventing repeated loading on every user interaction.
  • We check for the existence of all required files using os.path.exists() in a loop; if any are missing, we display an error and return None to avoid crashing later.
  • The model name is read from a text file using f.read().strip(), giving us the exact name we need to reload the same Hugging Face model with SentenceTransformer(model_name).
  • We reload the saved knowledge base using pickle.load(f) and restore the FAISS search index with faiss.read_index('ecommerce_index.faiss'), recreating our search setup.

Implement RAG answer retrieval

This is where we’ll find answers using our Hugging Face system:

def get_answer(query, model, knowledge_base, index):
    try:
        query_embedding = model.encode([query])
        faiss.normalize_L2(query_embedding)
        scores, indices = index.search(query_embedding.astype('float32'), 3)
        best_idx = indices[0][0]
        best_score = scores[0][0]
        if best_score < 0.3:
            return get_fallback(query)
        best_match = knowledge_base[best_idx]
        return {
            'answer': best_match['answer'],
            'confidence': "High" if best_score > 0.7 else "Medium"
        }
    except Exception as e:
        return {'answer': f"Sorry, error occurred: {str(e)}", 'confidence': 'Low'}

The core of our RAG system is designed to find the best answer from our knowledge base when a user poses a question. Let us understand exactly how this works.

  • We convert the user’s question into a vector using model.encode([query]); the query is wrapped in a list because the model expects a list of texts.
  • The query vector is normalized with faiss.normalize_L2(query_embedding), matching how we normalized the training data.
  • We perform the search using index.search(query_embedding.astype('float32'), 3), which returns the top 3 most similar vectors with their scores and indices.
  • We extract the best match using [0][0] indexing and check if the score is below 0.3; if so, we call a fallback response instead of returning a low-confidence answer.
  • If the match is good, we retrieve the corresponding question-answer from the knowledge base, assign a confidence label (“High” or “Medium”), and return both in a dictionary for display.

This structure makes it easy for our interface to display the information correctly.
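The 0.3 and 0.7 thresholds are starting points rather than fixed rules. To tune them for your own data, you can print the raw top-3 scores for a few test queries. Here is a small debugging sketch, assuming model and index are already loaded the same way as in app.py:

# Debugging sketch: inspect top-3 similarity scores for sample queries
test_queries = ["where is my package", "do you ship internationally", "what's the weather today"]

for q in test_queries:
    emb = model.encode([q])
    faiss.normalize_L2(emb)
    scores, _ = index.search(emb.astype('float32'), 3)
    # In-domain queries should score well above 0.3; off-topic ones lower
    print(q, "->", scores[0])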

Fallback response system

We will create a backup response system for times when we cannot find a good match, so we can still provide help. Add the following function to the app.py file:

def get_fallback(query):
    query_lower = query.lower()
    responses = {
        'track': "Track your order in 'My Account' > 'Order History'",
        'return': "We offer 30-day returns. Start in your account.",
        'refund': "Refunds process in 5-7 business days",
        'cancel': "Cancel orders within 1 hour in your account",
        'shipping': "Standard: 3-5 days, Express: 1-2 days",
        'payment': "We accept cards, PayPal, Apple Pay, Google Pay"
    }
    for keyword, response in responses.items():
        if keyword in query_lower:
            return {'answer': response, 'confidence': 'Medium'}
    return {
        'answer': "I'm here to help! Ask about orders, shipping, returns, or payments.",
        'confidence': 'Low'
    }

Sometimes our Hugging Face embeddings can’t find a good match. Maybe the user asked about something that was not in our training data, or they used very unusual phrasing. Instead of saying “I don’t know,” the system will provide helpful fallback responses.

In this code:

  • We convert the user’s query to lowercase with query_lower = query.lower() to make keyword matching case-insensitive.
  • We define a dictionary of common keywords paired with helpful, predefined responses for frequent topics.
  • We loop through each keyword and check if it appears in the lowercase query using if keyword in query_lower:. If it does, we return the associated response immediately with “Medium” confidence.
  • If no keywords match, we return a generic, friendly reply to guide the user and ensure they always get a helpful answer (see the quick test below).
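
A quick way to sanity-check the fallback logic is to call the function directly (an illustrative test, with the expected output shown as a comment):

print(get_fallback("How do I get a refund?"))
# Expected: {'answer': 'Refunds process in 5-7 business days', 'confidence': 'Medium'}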

Chat interface layout

Let’s create the main interface:

def main():
    st.title("💬 Customer Support")
    model, knowledge_base, index = load_system()
    if not all([model, knowledge_base, index]):
        st.stop()
    if "messages" not in st.session_state:
        st.session_state.messages = []

Now we build the actual chat interface that users will see and interact with.

Here:

  • st.title() displays a large heading at the top of the app, indicating it’s a customer support interface.
  • model, knowledge_base, index = load_system() calls a function to load the model, data, and search index, unpacking them into variables.
  • if not all([model, knowledge_base, index]): checks whether any of these failed to load. If so, st.stop() halts the app (load_system() has already displayed the relevant error messages).
  • We initialize chat history using st.session_state to store messages across interactions, preserving conversation continuity even when the page refreshes.

Displaying chat messages

Add the following code in the same main() method:

# Display chat
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])

This simple loop displays our entire conversation history. Every time the user interacts with the app, this code runs and redraws all the messages.

  • We loop through each message stored in st.session_state, where every message is a dictionary containing a role (“user” or “assistant”) and the message content.
  • Using with st.chat_message(message["role"]):, we create a chat bubble styled with a role-appropriate avatar, so user and assistant messages are visually distinct.
  • Inside that chat bubble, st.write(message["content"]) displays the message text, handling formatting like plain text, markdown, or simple HTML automatically.

Handling user input

# Chat input
if prompt := st.chat_input("Ask me anything..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)
    with st.chat_message("assistant"):
        response = get_answer(prompt, model, knowledge_base, index)
        st.write(response['answer'])
        st.session_state.messages.append({
            "role": "assistant",
            "content": response['answer']
        })

if __name__ == "__main__":
    main()

Let us break down this interaction flow step by step.

  • st.chat_input("Ask me anything...") creates a text box with placeholder text and uses the walrus operator := to assign the user’s input to the prompt only if they typed something.
  • When the user submits a message, we append it to st.session_state.messages as a dictionary with “role”: “user” and the typed content.
  • We immediately display the user’s message using st.chat_message.
  • We call get_answer(prompt, model, knowledge_base, index) to generate the AI’s reply by encoding the question, searching the knowledge base, and retrieving the best answer.
  • The AI’s response is shown in an assistant chat bubble and also added to the session history to keep the conversation intact.
  • The if __name__ == "__main__": block ensures the main function runs only when this script is executed directly, not when imported.

Testing your Hugging Face chatbot

Let’s test our Hugging Face-powered system and see how well it performs.

Running the system

First, create the knowledge base using Hugging Face tools. Run the following command in the terminal:

python create_knowledge_base.py

Hugging Face will download the dataset and model, then create embeddings. This will take a few minutes the first time.

It will display a message in the terminal that says:

Saving knowledge base and index...
Knowledge base created successfully!
Files created:
- knowledge_base.pkl
- ecommerce_index.faiss
- model_name.txt

Then you can run the chat interface using the following command:

streamlit run app.py

This is how the interface of the app will look:

[Image: Hugging Face RAG chatbot Streamlit interface output]

Great! Our chatbot is now complete and ready to chat. You can check out the whole code by downloading this zip file.


Testing with customer queries

Try these questions and see how Hugging Face embeddings handle different phrasings:

Order Questions:

  • “How can I track my order?”
  • “Where is my package?”
  • “I want to cancel my order”

Try variations:

  • “Find my shipment” (should match tracking questions)
  • “Stop my purchase” (should match cancellation)
  • “Get my money back” (should match refund questions)

Here’s a look at the chat interface:

[Image: AI chatbot with Hugging Face chat interface output]

The Hugging Face sentence transformer should recognize these as similar even though the words are different. This reflects how the model was trained: it learned from huge amounts of text from across the internet, so it generalizes across different phrasings of the same intent.

You’ve successfully built and tested your chatbot! Now, let’s go over your accomplishments and explore what you should do next.

Conclusion

You’ve successfully built a Hugging Face RAG chatbot with a Streamlit interface. We:

  • Used Hugging Face Datasets to load training data with one line of code
  • Applied Hugging Face Sentence Transformers for semantic understanding
  • Built a RAG system that finds relevant answers automatically
  • Created a Streamlit chat interface that works in real-time
  • Handled edge cases with fallback responses

If you want to learn more about LLMs and RAG systems, check out the courses in the “Learn more on Codecademy” section below. They will take your AI skills to the next level and help you build even more sophisticated applications.

Frequently asked questions

1. What is a custom chatbot, and how is it different from regular FAQ pages?

A custom chatbot is an AI-powered conversational interface that provides interactive, real-time responses to user questions, while FAQ pages are static lists that users must browse through manually.

Key Differences:

  • Chatbots provide interactive experiences vs. reading static content
  • Instant answers without navigating website menus
  • Use natural language processing to understand various question formats
  • Real-time updates vs. potentially outdated static content

2. What is the correct way to format a conversational dataset in a .jsonl or text file for fine-tuning a model on Hugging Face?

Use JSONL format with a “messages” structure containing system, user, and assistant roles.

{
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "User's question"},
    {"role": "assistant", "content": "Assistant's response"}
  ]
}

Each line should be a separate JSON object representing one conversation.
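
For example, a .jsonl file with two training conversations would look like this (illustrative sample content, one JSON object per line):

{"messages": [{"role": "system", "content": "You are a helpful support agent"}, {"role": "user", "content": "Where is my order?"}, {"role": "assistant", "content": "You can track it under Order History in your account."}]}
{"messages": [{"role": "system", "content": "You are a helpful support agent"}, {"role": "user", "content": "Can I return an item?"}, {"role": "assistant", "content": "Yes, we offer 30-day returns on most items."}]}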

3. What’s the difference between rule-based chatbots and AI chatbots for customer service?

Rule-based chatbots follow predefined rules and flowcharts. They’re quick to build, easy to maintain, and perfect for handling structured queries and FAQs. AI chatbots use machine learning to understand context and intent. They handle complex conversations, learn from interactions, and improve over time. Choose rule-based for simple, predictable queries. Choose AI for dynamic, complex customer service needs.

4. What type of conversational AI chatbot is best for my business needs?

The best choice depends on your business size and needs. Small businesses should use rule-based chatbots for basic support, FAQs, and routine tasks with minimal cost. Large enterprises benefit from AI chatbots that handle complex scenarios and provide comprehensive support with better scalability. Consider your budget, technical capabilities, and team resources when choosing.

Codecademy Team

The Codecademy Team, composed of experienced educators and tech experts, is dedicated to making tech skills accessible to all. We empower learners worldwide with expert-reviewed content that develops and enhances the technical skills needed to advance and succeed in their careers.


Learn more on Codecademy

  • Learn about the Hugging Face AI and machine learning platform, and how their tools can streamline ML and AI development.
    • Beginner Friendly
    • < 1 hour
  • Learn Streamlit to build and deploy interactive AI applications with Python in this hands-on course.
    • With Certificate
    • Intermediate
    • 1 hour
  • Learn about what transformers are (the T of GPT) and how to work with them using Hugging Face libraries.
    • Intermediate
    • 3 hours