Articles

How to Build a Smart Q&A Bot with Haystack: RAG Made Easy

Learn how to build a smart Q&A bot with Haystack using Retrieval-Augmented Generation (RAG). Step-by-step guide to set up and query your own knowledge base.

In the age of generative AI, building intelligent question-answering systems is no longer a task exclusive to large research teams or tech giants. Thanks to open-source frameworks like Haystack, we can now build smart Q&A bots powered by Retrieval-Augmented Generation (RAG) with ease and flexibility.

In this guide, we’ll explore what Haystack is and its standout features, and walk step-by-step through the process of building a robust, production-ready Q&A bot. Whether we’re exploring AI out of curiosity, building smart tools as developers, or unlocking insights from internal documents for our business, this guide will walk us through it all.

What is Haystack?

Haystack is an open-source natural language processing (NLP) framework developed by Deepset that is designed to help developers create powerful search systems, Q&A bots, and intelligent agents. At its core, Haystack makes it incredibly easy to combine large language models (LLMs) with our own data sources using the RAG approach.

With Haystack, we can build end-to-end pipelines that retrieve relevant information and generate precise answers—whether from PDFs, websites, databases, or APIs. This means we can turn raw, unstructured content into actionable insights using the power of large language models and intelligent retrieval.

With its modular design and support for powerful retrieval and generation models, Haystack stands out as a go-to framework for building modern, intelligent search systems. But to truly appreciate its capabilities, we need to understand the core components that make it all work.


Key components of Haystack

Before we dive into building our smart Q&A bot, it’s important to get a solid understanding of the fundamental components that make Haystack such a flexible and powerful framework.

At its core, Haystack follows a modular architecture. Each task—whether it’s reading documents, retrieving relevant content, or generating an answer—is handled by a distinct component. This modularity means we can build custom pipelines tailored to our use case by simply mixing and matching these building blocks.

Let’s explore each of these components in more detail to understand how they work together in an RAG system.

DocumentStore

The DocumentStore is where all our content and metadata live. It can be an in-memory store for quick testing or a scalable backend like Elasticsearch, FAISS, or Weaviate. Every document added here becomes searchable and retrievable by other components.
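To make this concrete, here is a minimal sketch using the same haystack-ai API the tutorial installs later; the document text and metadata are made up purely for illustration:

from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Create an in-memory store and write a single hand-made document into it
document_store = InMemoryDocumentStore()
document_store.write_documents([
    Document(content="The Great Pyramid of Giza is the oldest of the Seven Wonders.", meta={"source": "example"})
])

print(document_store.count_documents())  # 1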

Retriever

The Retriever filters the most relevant documents based on a user’s query, significantly reducing the number of documents passed to the generator or reader. It supports traditional keyword-based methods like BM25, as well as vector search and semantic search. Let’s have a brief introduction to these methods.

BM25

BM25 (Best Matching 25) is a traditional keyword-based ranking algorithm used to estimate the relevance of documents to a given search query. It scores documents based on three factors (a toy example follows the list):

  • Term frequency (TF): how often the term appears in a document.
  • Inverse document frequency (IDF): how rare the term is across all documents.
  • Document length normalization: an adjustment that keeps longer documents from being favored simply because they contain more words.
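Here is a toy BM25 example. It uses the rank_bm25 package, which is not one of this tutorial’s dependencies and is shown only to illustrate keyword-based scoring:

# Illustration only: rank_bm25 is not installed as part of this tutorial
from rank_bm25 import BM25Okapi

corpus = [
    "the great pyramid of giza",
    "the hanging gardens of babylon",
    "the lighthouse of alexandria",
]
tokenized_corpus = [doc.split() for doc in corpus]

bm25 = BM25Okapi(tokenized_corpus)
query = "lighthouse of alexandria".split()
print(bm25.get_scores(query))  # one relevance score per document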

Vector search

Vector search represents text (queries and documents) as vectors in a high-dimensional space using embeddings (typically from models like BERT, OpenAI, or Cohere).

Then, it retrieves documents by measuring vector similarity (e.g., cosine similarity) between the query and document vectors.
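As a quick illustration, here is a sketch using sentence-transformers (installed later in this tutorial); the example sentences are invented:

from sentence_transformers import SentenceTransformer, util

# Embed a query and two documents into the same vector space
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
query_emb = model.encode("Where is the Great Pyramid?")
doc_embs = model.encode([
    "The Great Pyramid of Giza is located in Egypt.",
    "The Hanging Gardens were reportedly built in Babylon.",
])

# A higher cosine similarity means the document is semantically closer to the query
print(util.cos_sim(query_emb, doc_embs))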

Semantic search

Semantic search aims to understand the meaning or intent behind the query, not just the literal words. It often uses:

  • Transformer models (e.g., BERT, OpenAI Embeddings)
  • Vector representations
  • A combination of keyword and vector scores (hybrid search); a toy example of this follows the list
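To illustrate the hybrid idea, here is a toy sketch that blends a keyword score with a vector-similarity score. The weight and the example scores are made up, and real systems, including Haystack’s own retrievers, offer more sophisticated strategies:

# Toy hybrid scoring: combine a keyword (BM25) score with a vector (cosine) score
# alpha and the example scores below are invented for illustration
def hybrid_score(bm25_score, cosine_score, alpha=0.5):
    return alpha * bm25_score + (1 - alpha) * cosine_score

# Assume both scores have already been normalized to the 0-1 range
print(hybrid_score(bm25_score=0.82, cosine_score=0.67))  # 0.745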

Reader / Generator

The Reader extracts exact answers from the retrieved documents, ideal for extractive Q&A. The Generator, often a large language model, synthesizes complete, context-aware answers. In RAG systems, we typically rely on the Generator for natural, conversational outputs.

Pipelines

A Pipeline defines how components are connected and how data flows through them. Whether it’s a simple two-step retrieval and generation setup or a complex multi-branch workflow, the pipeline makes everything modular, flexible, and production-ready.

Nodes

Nodes are the functional units within a pipeline. Each Node performs a specific task—retrieving, generating, converting, or classifying. By combining different Nodes, we can build tailored workflows for tasks like Q&A, summarization, or document indexing.

Now that we’ve explored the essential building blocks of Haystack, it’s time to put them into action. In the next section, we’ll bring these components together to build a smart Q&A bot using Haystack.

Building a smart Q&A bot with Haystack

This section will walk us through creating a simple RAG pipeline using Haystack. We’ll process a small set of documents, index them, and query them with a generative LLM that references retrieved content for factual grounding. A prompt-builder template will glue it all together.

Let’s get started!

Prerequisites

The prerequisites for this include:

  • Python 3 installed on the system, along with pip
  • An OpenAI API key (used by the generator in Step 5)
  • Basic familiarity with Python and the terminal

Once these prerequisites are satisfied, we move on to the actual process.

Step 1: Create a virtual environment

Open the terminal and run these commands to create a virtual environment and activate it:

python3 -m venv haystack_chatbot
source haystack_chatbot/bin/activate

Then, install the required packages:

pip install haystack-ai
pip install "datasets>=2.6.1"
pip install "sentence-transformers>=4.1.0"
pip install streamlit

Next, create a Python script named chatbot.py:

touch chatbot.py

After creating the script, open it in a code editor and import the necessary modules:

import os
import streamlit as st
from getpass import getpass
from datasets import load_dataset
from haystack import Document
from haystack import Pipeline
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage

Step 2: Load and index documents

Next, we need to initialize an in-memory document store. This acts as a simple, fast database to store and retrieve documents without needing any external infrastructure:

document_store = InMemoryDocumentStore()

Then, load a dataset and convert each entry into a Document object containing both content and metadata. In this case, we’ll be using the bilgeyucel/seven-wonders dataset, which includes information about the seven wonders of the ancient world:

dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
docs = [Document(content=doc["content"], meta=doc["meta"]) for doc in dataset]

Initialize a document embedder using a specific model and warm it up for efficient embedding computation. In this case, we’ll be using the sentence-transformers/all-MiniLM-L6-v2 model, which will convert each document into a dense vector:

doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
doc_embedder.warm_up()

Use the embedder to generate vector embeddings for the list of documents. Then, write these enriched documents (now containing embeddings) into the document store for future retrieval:

docs_with_embeddings = doc_embedder.run(docs)
document_store.write_documents(docs_with_embeddings["documents"])

Step 3: Set up embedder and retriever

Initialize a text embedder, which will transform the user’s question into a vector using the same model as the document embedder to ensure compatibility:

text_embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")

Set up the retriever, which will compare the query vector with stored document vectors to find the most relevant content. It uses the in-memory document store and the embeddings that we’ve just added:

retriever = InMemoryEmbeddingRetriever(document_store)
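If we want to control how many documents reach the prompt, the retriever also accepts a top_k argument. This is an optional variation, not something the rest of the tutorial depends on:

# Optional: only pass the 3 most similar documents to the prompt
retriever = InMemoryEmbeddingRetriever(document_store=document_store, top_k=3)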

Step 4: Create a prompt template

Design a template that structures how retrieved information and the user’s question will be presented to the language model:

template = [
    ChatMessage.from_user(
        """
You're a helpful assistant who looks up answers for a user in a dataset and returns the answer to the user's question.
Context:
{% for document in documents %}
{{ document.content }}
{% endfor %}
Question: {{question}}
Answer:
"""
    )
]
prompt_builder = ChatPromptBuilder(template=template)

This format enables us to dynamically insert multiple document contents and the user query into the prompt. It also ensures that the language model has the most relevant context before generating an answer, resulting in more accurate and grounded responses.
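To see what the model will actually receive, we can render the template outside the pipeline. This is just a sanity check; the sample document and question are invented, and it assumes the builder exposes its rendered messages under the prompt key, consistent with the prompt_builder.prompt connection we make later:

# Illustration only: render the template with a hand-made document and question
sample_docs = [Document(content="The Colossus of Rhodes was a statue of the sun god Helios.")]
rendered = prompt_builder.run(documents=sample_docs, question="What was the Colossus of Rhodes?")
print(rendered["prompt"][0].text)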

Step 5: Add the LLM generator

Set up access to OpenAI’s API for the language model. If the API key isn’t already in the environment, prompt the user to enter it. Then, initialize the OpenAI chat generator using a particular model for generating responses. In this case, we’re using the gpt-4o-mini model:

if "OPENAI_API_KEY" not in os.environ: os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API key: ")
chat_generator = OpenAIChatGenerator(model="gpt-4o-mini")

This component is responsible for generating a final answer using the constructed prompt and the underlying language model.

Step 6: Build the RAG pipeline

Here, we put all the components together into a Haystack pipeline. Each component is added in a logical order: embedding the query, retrieving documents, building a prompt, and generating the answer.

basic_rag_pipeline = Pipeline()
basic_rag_pipeline.add_component("text_embedder", text_embedder)
basic_rag_pipeline.add_component("retriever", retriever)
basic_rag_pipeline.add_component("prompt_builder", prompt_builder)
basic_rag_pipeline.add_component("llm", chat_generator)

Next, connect the components so that data flows seamlessly through the pipeline. The text embedder outputs a query vector, which is passed to the retriever. The retrieved documents are used to build a prompt, and the prompt is sent to the LLM for response generation:

basic_rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
basic_rag_pipeline.connect("retriever", "prompt_builder")
basic_rag_pipeline.connect("prompt_builder.prompt", "llm.messages")

Step 7: Create the user interface

Use Streamlit to build a lightweight, interactive web interface. This lets us enter questions in a text field and view generated answers directly in the browser:

st.title("Smart Q&A Chatbot")
question = st.text_input("Ask a question:")
submit_button = st.button("Submit")

When the submit button is clicked, the pipeline is executed with the user’s question as input. The response is then displayed on the screen:

if submit_button:
    response = basic_rag_pipeline.run({"text_embedder": {"text": question}, "prompt_builder": {"question": question}})
    st.write("Answer:")
    st.write(response["llm"]["replies"][0].text)

Step 8: Query the chatbot

With everything in place, it’s now time to run the app.

So, let’s open the terminal again and run this command:

streamlit run chatbot.py

Upon running the command, the app will launch in the default browser. Here, we can enter questions regarding the seven wonders of the ancient world, and the chatbot will provide factually grounded answers based on the retrieved context.

Here is an example:

Haystack chatbot's response to the user query

Congratulations! You have successfully built a smart Q&A bot using Haystack that understands context and retrieves factual answers using your own documents.

Real-world use cases of Haystack

Haystack is already powering production-grade AI systems across various industries. Here are a few real-world use cases:

  • Healthcare: Knowledge assistants for medical documentation and drug information retrieval.
  • Legal: Document summarization and intelligent search over case law.
  • Enterprise: Internal chatbots for HR, IT, and knowledge base automation.
  • Publishing: Personalized content curation and FAQ automation.
  • Research: Semantic search across research papers and academic literature.

These use cases demonstrate how versatile and powerful Haystack is across different industries and applications.

Conclusion

In this tutorial, we explored what Haystack is and why it stands out as a leading RAG framework. We broke down its modular components to understand how they work together. Then, we built a simple yet smart Q&A bot using our own data, demonstrating Haystack’s practical power. Finally, we highlighted real-world applications of Haystack across various industries, showcasing its versatility and impact.

Haystack is democratizing the way we build and deploy RAG-based applications. With its intuitive design, flexible architecture, and active community, we can confidently develop smart bots that deliver real value—without reinventing the wheel.

If you want to learn more about creating AI applications using RAG, check out the Creating AI Applications using Retrieval-Augmented Generation (RAG) course on Codecademy.

Frequently asked questions

1. Is Haystack free to use?

Yes! Haystack is completely open-source under the Apache 2.0 license. You can use, modify, and even contribute to it without any licensing fees.

2. Is Haystack better than LangChain?

It depends on your use case. Haystack shines when building search-centric, RAG-based systems with production pipelines and multiple backend integrations. LangChain, on the other hand, is more focused on agentic workflows and chaining prompts. Both are excellent, and many developers use them together.

3. What is the difference between Haystack and LlamaIndex?

Haystack is a pipeline-based framework ideal for search and retrieval tasks, while LlamaIndex focuses on data indexing and querying with LLMs. LlamaIndex excels in structured data abstraction, whereas Haystack provides end-to-end Q&A and search capabilities with flexible pipeline control.

Codecademy Team

The Codecademy Team, composed of experienced educators and tech experts, is dedicated to making tech skills accessible to all. We empower learners worldwide with expert-reviewed content that develops and enhances the technical skills needed to advance and succeed in their careers.
