How to Build a Smart Q&A Bot with Haystack: RAG Made Easy
In the age of generative AI, building intelligent question-answering systems is no longer a task exclusive to large research teams or tech giants. Thanks to open-source frameworks like Haystack, we can now build smart Q&A bots powered by Retrieval-Augmented Generation (RAG) with ease and flexibility.
In this guide, we’ll explore what Haystack is and its standout features, then walk step by step through building a robust, production-ready Q&A bot. Whether we’re exploring AI out of curiosity, building smart tools as developers, or unlocking insights from internal documents for our business, this guide has us covered.
What is Haystack?
Haystack is an open-source natural language processing (NLP) framework developed by deepset that is designed to help developers create powerful search systems, Q&A bots, and intelligent agents. At its core, Haystack makes it incredibly easy to combine large language models (LLMs) with our own data sources using the RAG approach.
With Haystack, we can build end-to-end pipelines that retrieve relevant information and generate precise answers—whether from PDFs, websites, databases, or APIs. This means we can turn raw, unstructured content into actionable insights using the power of large language models and intelligent retrieval.
With its modular design and support for powerful retrieval and generation models, Haystack stands out as a go-to framework for building modern, intelligent search systems. But to truly appreciate its capabilities, we need to understand the core components that make it all work.
Key components of Haystack
Before we dive into building our smart Q&A bot, it’s important to get a solid understanding of the fundamental components that make Haystack such a flexible and powerful framework.
At its core, Haystack follows a modular architecture. Each task—whether it’s reading documents, retrieving relevant content, or generating an answer—is handled by a distinct component. This modularity means we can build custom pipelines tailored to our use case by simply mixing and matching these building blocks.
Let’s explore each of these components in more detail to understand how they work together in a RAG system.
DocumentStore
The DocumentStore is where all our content and metadata live. It can be an in-memory store for quick testing or a scalable backend like Elasticsearch, FAISS, or Weaviate. Every document added here becomes searchable and retrievable by other components.
Retriever
The Retriever selects the most relevant documents for a user’s query, significantly reducing the number of documents passed on to the generator or reader. It supports traditional keyword-based methods like BM25, as well as vector search and semantic search. Let’s take a brief look at each of these methods.
BM25
BM25 (Best Matching 25) is a traditional keyword-based ranking algorithm that estimates how relevant each document is to a given search query. It scores documents based on three factors (a minimal sketch follows this list):
- Term frequency (TF): how often the query term appears in a document.
- Inverse document frequency (IDF): how rare the term is across all documents.
- Document length normalization: an adjustment so that long documents aren’t favored simply for containing more words.
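To make these three factors concrete, here’s a minimal, from-scratch sketch of BM25 scoring in Python. It’s illustrative only; in practice, Haystack’s BM25 retriever computes this for us, and the toy corpus and parameter values below are just examples:

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.5, b=0.75):
    """Score one tokenized document against a tokenized query."""
    avg_len = sum(len(d) for d in corpus) / len(corpus)
    score = 0.0
    for term in query_terms:
        tf = doc.count(term)                      # term frequency in this document
        df = sum(1 for d in corpus if term in d)  # how many documents contain the term
        idf = math.log((len(corpus) - df + 0.5) / (df + 0.5) + 1)  # rarer terms weigh more
        norm = tf + k1 * (1 - b + b * len(doc) / avg_len)          # length normalization
        score += idf * tf * (k1 + 1) / norm
    return score

corpus = [
    ["the", "great", "pyramid", "of", "giza"],
    ["the", "hanging", "gardens", "of", "babylon"],
]
print(bm25_score(["pyramid"], corpus[0], corpus))  # higher score = more relevant
```

The `k1` and `b` parameters control how quickly repeated terms saturate and how strongly length normalization applies; 1.5 and 0.75 are common defaults.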
Vector search
Vector search represents text (both queries and documents) as vectors in a high-dimensional space using embeddings, typically from models like BERT, OpenAI, or Cohere. It then retrieves documents by measuring vector similarity (e.g., cosine similarity) between the query vector and the document vectors.
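As a small illustration, here’s what vector retrieval looks like with the sentence-transformers library (the same one this tutorial uses later); the documents and query are made-up examples:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

docs = [
    "The Great Pyramid of Giza is located in Egypt.",
    "The Hanging Gardens were said to be in Babylon.",
]
doc_vectors = model.encode(docs)                   # one embedding per document
query_vector = model.encode("Where is the Great Pyramid?")

scores = util.cos_sim(query_vector, doc_vectors)   # cosine similarity to each document
print(docs[int(scores.argmax())])                  # prints the best-matching document
```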
Semantic search
Semantic search aims to understand the meaning or intent behind the query, not just the literal words. It often uses:
- Transformer models (e.g., BERT, OpenAI Embeddings)
- Vector representations
- A combination of keyword and vector scores (hybrid search), as sketched below
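As a rough sketch of the hybrid idea, the function below blends two score lists with a weighted sum after min-max normalization. The score values are hypothetical, and real hybrid retrievers often use more sophisticated fusion strategies (such as reciprocal rank fusion):

```python
def hybrid_scores(bm25_scores, vector_scores, alpha=0.5):
    """Blend keyword and vector scores; alpha=1.0 means pure keyword."""
    def normalize(scores):
        # Min-max normalization so the two score scales are comparable.
        lo, hi = min(scores), max(scores)
        return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]
    bm25_n = normalize(bm25_scores)
    vec_n = normalize(vector_scores)
    return [alpha * b + (1 - alpha) * v for b, v in zip(bm25_n, vec_n)]

# Hypothetical scores for three documents from each retrieval method:
print(hybrid_scores([2.1, 0.4, 1.3], [0.82, 0.35, 0.67]))
```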
Reader / Generator
The Reader extracts exact answer spans from the retrieved documents, which makes it ideal for extractive Q&A. The Generator, often a large language model, synthesizes complete, context-aware answers. In RAG systems, we typically rely on the Generator for natural, conversational outputs.
Pipelines
A Pipeline defines how components are connected and how data flows through them. Whether it’s a simple two-step retrieval and generation setup or a complex multi-branch workflow, the pipeline makes everything modular, flexible, and production-ready.
Nodes
Nodes are the functional units within a pipeline. Each Node performs a specific task—retrieving, generating, converting, or classifying. By combining different Nodes, we can build tailored workflows for tasks like Q&A, summarization, or document indexing.
Now that we’ve explored the essential building blocks of Haystack, it’s time to put them into action. In the next section, we’ll bring these components together to build a smart Q&A bot using Haystack.
Building a smart Q&A bot with Haystack
This section will walk us through creating a simple RAG pipeline using Haystack. We’ll process a small set of documents, index them, and query them with a generative LLM that references retrieved content for factual grounding. A prompt-builder template will glue it all together.
Let’s get started!
Prerequisites
The prerequisites for this include:
- Python 3.x (Download from the official website)
- OpenAI API key (Generate from the official website)
Once these prerequisites are satisfied, we move on to the actual process.
Step 1: Create a virtual environment
Open the terminal and run these commands to create a virtual environment and activate it:
```bash
python3 -m venv haystack_chatbot
source haystack_chatbot/bin/activate
```
Then, install the required packages:
```bash
pip install haystack-ai
pip install "datasets>=2.6.1"
pip install "sentence-transformers>=4.1.0"
pip install streamlit
```
Next, create a Python script named `chatbot.py`:

```bash
touch chatbot.py
```
After creating the script, open it in a code editor and import the necessary modules:
```python
import os
from getpass import getpass

import streamlit as st
from datasets import load_dataset

from haystack import Document, Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.dataclasses import ChatMessage
```
Step 2: Load and index documents
Next, we need to initialize an in-memory document store. This acts as a simple, fast database to store and retrieve documents without needing any external infrastructure:
```python
document_store = InMemoryDocumentStore()
```
Then, load a dataset and convert each entry into a `Document` object containing both content and metadata. In this case, we’ll be using the `bilgeyucel/seven-wonders` dataset, which includes information about the seven wonders of the ancient world:

```python
dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
docs = [Document(content=doc["content"], meta=doc["meta"]) for doc in dataset]
```
Initialize a document embedder using a specific model and warm it up for efficient embedding computation. In this case, we’ll be using the `sentence-transformers/all-MiniLM-L6-v2` model, which will convert each document into a dense vector:

```python
doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
doc_embedder.warm_up()
```
Use the embedder to generate vector embeddings for the list of documents. Then, write these enriched documents (now containing embeddings) into the document store for future retrieval:
```python
docs_with_embeddings = doc_embedder.run(docs)
document_store.write_documents(docs_with_embeddings["documents"])
```
Step 3: Set up embedder and retriever
Initialize a text embedder, which will transform the user’s question into a vector using the same model as the document embedder to ensure compatibility:
```python
text_embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
```
Set up the retriever, which will compare the query vector with stored document vectors to find the most relevant content. It uses the in-memory document store and the embeddings that we’ve just added:
```python
retriever = InMemoryEmbeddingRetriever(document_store)
```
Step 4: Create a prompt template
Design a template that structures how retrieved information and the user’s question will be presented to the language model:
```python
template = [
    ChatMessage.from_user(
        """
You're a helpful assistant who looks up answers for a user in a dataset and returns the answer to the user's question.

Context:
{% for document in documents %}
{{ document.content }}
{% endfor %}

Question: {{question}}

Answer:
"""
    )
]

prompt_builder = ChatPromptBuilder(template=template)
```
This format enables us to dynamically insert multiple document contents and the user query into the prompt. It also ensures that the language model has the most relevant context before generating an answer, resulting in more accurate and grounded responses.
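To see what the builder produces, we can optionally call it by hand before wiring it into the pipeline. The document and question below are made up, and we assume the variables inferred from the template can be passed directly to `run()`:

```python
# Illustrative preview only; in the finished app, the pipeline feeds the
# retrieved documents into the prompt builder for us.
preview = prompt_builder.run(
    question="Where were the Hanging Gardens?",
    documents=[Document(content="The Hanging Gardens were said to be in Babylon.")],
)
print(preview["prompt"][0].text)  # the fully rendered user message
```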
Step 5: Add the LLM generator
Set up access to OpenAI’s API for the language model. If the API key isn’t already in the environment, prompt the user to enter it. Then, initialize the OpenAI chat generator using a particular model for generating responses. In this case, we’re using the `gpt-4o-mini` model:

```python
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API key: ")

chat_generator = OpenAIChatGenerator(model="gpt-4o-mini")
```
This component is responsible for generating a final answer using the constructed prompt and the underlying language model.
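As an optional sanity check before adding the generator to the pipeline, we can call it directly; note that this makes a real OpenAI API call:

```python
# Standalone test of the generator; safe to remove afterwards.
test_reply = chat_generator.run([ChatMessage.from_user("Say hello in one short sentence.")])
print(test_reply["replies"][0].text)
```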
Step 6: Build the RAG pipeline
Here, we put all the components together into a Haystack pipeline. Each component is added in a logical order: embedding the query, retrieving documents, building a prompt, and generating the answer.
```python
basic_rag_pipeline = Pipeline()

basic_rag_pipeline.add_component("text_embedder", text_embedder)
basic_rag_pipeline.add_component("retriever", retriever)
basic_rag_pipeline.add_component("prompt_builder", prompt_builder)
basic_rag_pipeline.add_component("llm", chat_generator)
```
Next, connect the components so that data flows seamlessly through the pipeline. The text embedder outputs a query vector, which is passed to the retriever. The retrieved documents are used to build a prompt, and the prompt is sent to the LLM for response generation:
```python
basic_rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
basic_rag_pipeline.connect("retriever", "prompt_builder")
basic_rag_pipeline.connect("prompt_builder.prompt", "llm.messages")
```
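To double-check the wiring, recent Haystack releases can render the pipeline graph to an image; this relies on an external Mermaid rendering service by default, so treat it as an optional, internet-dependent step:

```python
from pathlib import Path

# Optional: write a visual diagram of the pipeline to disk.
basic_rag_pipeline.draw(path=Path("rag_pipeline.png"))
```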
Step 7: Create the user interface
Use Streamlit to build a lightweight, interactive web interface. This lets us enter questions in a text field and view generated answers directly in the browser:
```python
st.title("Smart Q&A Chatbot")

question = st.text_input("Ask a question:")
submit_button = st.button("Submit")
```
When the submit button is clicked, the pipeline is executed with the user’s question as input. The response is then displayed on the screen:
```python
if submit_button:
    response = basic_rag_pipeline.run(
        {"text_embedder": {"text": question}, "prompt_builder": {"question": question}}
    )
    st.write("Answer:")
    st.write(response["llm"]["replies"][0].text)
```
Step 8: Query the chatbot
With everything in place, it’s now time to run the app.
So, let’s open the terminal again and run this command:
```bash
streamlit run chatbot.py
```
Upon running the command, the app will launch in the default browser. Here, we can enter questions regarding the seven wonders of the ancient world, and the chatbot will provide factually grounded answers based on the retrieved context.
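If we’d rather test the pipeline without the browser UI, the same `run()` call works in a plain Python script; the question below is just an example:

```python
# Script-level check that bypasses Streamlit (example question only).
question = "Why did people visit the Temple of Artemis?"
response = basic_rag_pipeline.run(
    {"text_embedder": {"text": question}, "prompt_builder": {"question": question}}
)
print(response["llm"]["replies"][0].text)
```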
Congratulations! You have successfully built a smart Q&A bot using Haystack that understands context and retrieves factual answers using your own documents.
Real-world use cases of Haystack
Haystack is already powering production-grade AI systems across various industries. Here are a few real-world use cases:
- Healthcare: Knowledge assistants for medical documentation and drug information retrieval.
- Legal: Document summarization and intelligent search over case law.
- Enterprise: Internal chatbots for HR, IT, and knowledge base automation.
- Publishing: Personalized content curation and FAQ automation.
- Research: Semantic search across research papers and academic literature.
These use cases demonstrate how versatile and powerful Haystack is across different industries and applications.
Conclusion
In this tutorial, we explored what Haystack is and why it stands out as a leading RAG framework. We broke down its modular components to understand how they work together. Then, we built a simple yet smart Q&A bot using our own data, demonstrating Haystack’s practical power. Finally, we highlighted real-world applications of Haystack across various industries, showcasing its versatility and impact.
Haystack is democratizing the way we build and deploy RAG-based applications. With its intuitive design, flexible architecture, and active community, we can confidently develop smart bots that deliver real value—without reinventing the wheel.
If you want to learn more about creating AI applications using RAG, check out the Creating AI Applications using Retrieval-Augmented Generation (RAG) course on Codecademy.
Frequently asked questions
1. Is Haystack free to use?
Yes! Haystack is completely open-source under the Apache 2.0 license. You can use, modify, and even contribute to it without any licensing fees.
2. Is Haystack better than LangChain?
It depends on your use case. Haystack shines when building search-centric, RAG-based systems with production pipelines and multiple backend integrations. LangChain, on the other hand, is more focused on agentic workflows and chaining prompts. Both are excellent, and many developers use them together.
3. What is the difference between Haystack and LlamaIndex?
Haystack is a pipeline-based framework ideal for search and retrieval tasks, while LlamaIndex focuses on data indexing and querying with LLMs. LlamaIndex excels in structured data abstraction, whereas Haystack provides end-to-end Q&A and search capabilities with flexible pipeline control.