Build a Custom LLM-Powered Chat App Using Chainlit

Learn how to create powerful conversational AI applications using Chainlit, a framework designed specifically for building LLM chat apps with Python.

Building conversational AI applications has become increasingly accessible with the rise of large language models (LLMs). However, creating a user-friendly chat interface that can effectively interact with these models often requires significant frontend development skills. Chainlit changes this by providing a Python-first framework specifically designed for building LLM chat apps without the complexity of traditional web development.

Whether you’re a data scientist looking to prototype AI applications or a developer wanting to create production-ready chat interfaces, Chainlit offers an intuitive solution that lets you focus on the AI logic rather than the user interface complexities.

Let’s explore what makes Chainlit unique and why it’s the ideal framework for creating LLM-powered chat applications.

What is Chainlit?

Chainlit is an open-source Python framework designed specifically for building conversational AI applications and LLM chat apps. Unlike general-purpose web frameworks, Chainlit is purpose-built for creating chat interfaces that work seamlessly with large language models, making it easier to develop, test, and deploy AI-powered conversational applications.

Chainlit bridges the gap between AI model development and user-facing applications. While frameworks like LangChain help you build the AI logic, Chainlit focuses on providing the interface layer that users interact with. It automatically handles common chat application requirements like message history, real-time streaming, file uploads, and user session management.
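To give a sense of how lightweight this is, here is a minimal sketch (not part of the project we build below) of a complete Chainlit app that simply echoes the user's message back:

import chainlit as cl

@cl.on_message
async def main(message: cl.Message):
    # A real app would call an LLM here; this one just echoes the input back
    await cl.Message(content=f"You said: {message.content}").send()

Everything else in this tutorial builds on this same pattern: decorated handlers that receive and send messages.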

Now that we understand its strengths, let’s get hands-on and build an LLM-powered chat app from scratch.

Building an LLM app using Chainlit

In this step-by-step tutorial, you’ll build a fully functional LLM-powered chat application using Chainlit and Ollama, all running locally on your machine.

Prerequisites

Before we start building, make sure you have the following:

  • Python 3.8 or higher

  • Basic familiarity with Python programming

  • Sufficient disk space for AI model downloads (several GB)

Step 1: Install Ollama

First, we need to install Ollama to run LLMs locally. Ollama provides an easy way to run large language models on your own machine.

On Windows:

Download the installer from ollama.ai and follow the installation instructions.

On macOS:

# Using Homebrew
brew install ollama
# Or download from https://ollama.ai

On Linux:

curl -fsSL https://ollama.ai/install.sh | sh

Step 2: Download a language model

Once Ollama is installed, download a language model. We’ll use Llama 2, which provides good performance for chat applications:

# Pull the Llama 2 model (this may take several minutes)
ollama pull llama2
# Verify the model is available
ollama list

You should see llama2:latest in the list of available models.
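If you want to confirm the model responds before writing any code, you can also chat with it directly in the terminal using ollama run llama2 (type /bye to exit).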

Step 3: Set up your development environment

Create a new project directory and set up a virtual environment:

# Create project directory
mkdir chainlit-chat-app
cd chainlit-chat-app

Now create a virtual environment. The command varies depending on your system:

On Windows:

# Try python first
python -m venv venv

Note: If python doesn’t work, try using python3 or py in this command.

On macOS/Linux:

# Try python3 first (most common on macOS)
python3 -m venv venv
# If python3 doesn't work, try python
python -m venv venv

Activate your virtual environment:

# Activate virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate

You should see (venv) appear at the beginning of your terminal prompt, indicating the virtual environment is active.

Step 4: Install required dependencies

Install Chainlit and the necessary packages for Ollama integration:

# Install the latest version of Chainlit
pip install "chainlit>=1.0.0"
# Install requests for API communication
pip install requests
# Create requirements file
pip freeze > requirements.txt
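Optionally, you can sanity-check that the local Ollama API is reachable before writing any Chainlit code. The short throwaway script below (the file name is arbitrary) assumes the Ollama server is running; on most installs it starts automatically, otherwise start it with ollama serve as shown in Step 10:

import requests

# One-off sanity check: ask the local Ollama API for a short completion
response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Reply with one short sentence.", "stream": False},
    timeout=120,
)
response.raise_for_status()
print(response.json()["response"])

If this prints a sentence from the model, your environment is ready.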

With our environment set up, we’ll now create the core functionality of our chat app, starting with basic message handling.

Step 5: Create the basic chat module

Let’s start by creating the foundational chat functionality. Create a file called basic_chat.py in your chainlit-chat-app directory:

import chainlit as cl
import requests
import json

# Ollama API configuration
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_NAME = "llama2"


def call_ollama(message: str) -> str:
    """
    Send a message to Ollama and return the response.
    """
    payload = {
        "model": MODEL_NAME,
        "prompt": message,
        "stream": False
    }
    try:
        response = requests.post(OLLAMA_URL, json=payload)
        response.raise_for_status()
        result = response.json()
        return result.get("response", "Sorry, I couldn't generate a response.")
    except requests.exceptions.RequestException as e:
        return f"Error connecting to Ollama: {str(e)}"
    except json.JSONDecodeError:
        return "Error: Invalid response from Ollama"


@cl.on_chat_start
async def start():
    """
    Initialize the chat session.
    """
    await cl.Message(
        content="Hello! I'm your basic AI assistant powered by Llama 2. How can I help you today?"
    ).send()


@cl.on_message
async def main(message: cl.Message):
    """
    Handle incoming user messages.
    """
    # Show that we're processing the message
    msg = cl.Message(content="")
    await msg.send()

    # Get response from Ollama
    response = call_ollama(message.content)

    # Update the message with the response
    msg.content = response
    await msg.update()

Code Explanation:

  • API Configuration: Sets up constants for connecting to Ollama running locally on port 11434

  • call_ollama() function: Handles the HTTP request to Ollama with proper error handling for network issues and JSON parsing

  • @cl.on_chat_start decorator: Chainlit decorator that runs when a new chat session begins, sending a welcome message

  • @cl.on_message decorator: Chainlit decorator that handles every user message, processes it through Ollama, and displays the response

  • Message handling: Creates an empty message first, then updates it with the AI response for a smooth user experience
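You can already try this module on its own: with Ollama running, start it using chainlit run basic_chat.py -w (the same command we'll use for the full app in Step 10) and send a message from the chat interface that opens in your browser.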

Step 6: Create the streaming module

Now let’s create a separate module for streaming functionality. Create another file named streaming.py:

import requests
import json

# Ollama API configuration
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_NAME = "llama2"


def call_ollama_stream(prompt: str):
    """
    Send a message to Ollama and yield streaming responses.
    """
    payload = {
        "model": MODEL_NAME,
        "prompt": prompt,
        "stream": True,
        "options": {
            "temperature": 0.7,
            "top_p": 0.9
        }
    }
    try:
        response = requests.post(OLLAMA_URL, json=payload, stream=True)
        response.raise_for_status()
        for line in response.iter_lines():
            if line:
                try:
                    data = json.loads(line.decode('utf-8'))
                    if 'response' in data:
                        yield data['response']
                except json.JSONDecodeError:
                    continue
    except requests.exceptions.RequestException as e:
        yield f"Error connecting to Ollama: {str(e)}"


async def stream_response(message_content: str, chainlit_message):
    """
    Stream a response to a Chainlit message.
    """
    full_response = ""
    for chunk in call_ollama_stream(message_content):
        full_response += chunk
        chainlit_message.content = full_response
        await chainlit_message.update()
    return full_response

Code Explanation:

  • Streaming API call: Uses stream=True parameter to enable real-time response streaming from Ollama

  • Generator function: call_ollama_stream() uses yield to return response chunks as they arrive, creating a more responsive experience

  • Response options: Includes temperature (creativity) and top_p (response diversity) parameters for better conversation quality

  • Line-by-line processing: Processes each line of the streaming response, handling JSON parsing errors gracefully

  • Helper function: stream_response() provides a convenient way to stream responses to Chainlit messages
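If you want to try the streaming module on its own before wiring everything together, a minimal handler could reuse the same update pattern the final app uses. This is a sketch in a separate throwaway file (the name streaming_test.py is arbitrary):

import chainlit as cl
from streaming import call_ollama_stream

@cl.on_message
async def main(message: cl.Message):
    # Start with an empty message, then grow it as chunks arrive
    msg = cl.Message(content="")
    await msg.send()
    for chunk in call_ollama_stream(message.content):
        msg.content += chunk
        await msg.update()

Run it with chainlit run streaming_test.py -w and you should see the reply appear token by token.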

Step 7: Create the history module

Create a module to handle conversation history in a file named history.py:

import chainlit as cl


def initialize_history():
    """
    Initialize conversation history in user session.
    """
    cl.user_session.set("message_history", [])


def add_message_to_history(message_type: str, content: str):
    """
    Add a message to the conversation history.
    """
    message_history = cl.user_session.get("message_history", [])
    message_history.append({
        "type": message_type,
        "content": content
    })
    cl.user_session.set("message_history", message_history)


def build_conversation_context():
    """
    Build conversation context from message history.
    """
    message_history = cl.user_session.get("message_history", [])
    if not message_history:
        return "You are a helpful AI assistant."

    context = "You are a helpful AI assistant. Here's our conversation so far:\n\n"
    # Keep last 6 messages for context
    for msg in message_history[-6:]:
        role = "Human" if msg['type'] == 'user_message' else "Assistant"
        context += f"{role}: {msg['content']}\n"
    return context


def get_contextual_prompt(user_message: str):
    """
    Build a prompt with conversation context.
    """
    context = build_conversation_context()
    return f"{context}\nHuman: {user_message}\nAssistant:"

Code Explanation:

  • Session storage: Uses Chainlit’s user_session to store conversation history per user, maintaining privacy and context

  • Message tracking: Stores both user messages and AI responses with type labels for proper context reconstruction

  • Context building: Creates a formatted conversation history that helps the AI understand previous interactions

  • Memory management: Limits context to the last 6 messages to prevent overwhelming the AI model and stay within token limits

  • Prompt formatting: Combines conversation history with the current message in a format that AI models understand well
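To make the format concrete, here is roughly what get_contextual_prompt("Can it stream responses?") returns once one earlier exchange has been stored in the history (the example messages are invented for illustration):

You are a helpful AI assistant. Here's our conversation so far:

Human: What is Chainlit?
Assistant: Chainlit is an open-source Python framework for building LLM chat apps.

Human: Can it stream responses?
Assistant: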

Our chat app now has memory, but we can enhance the user experience further by adding file upload capabilities.

Step 8: Create the file handler module

Create a module for file upload functionality in a file named file_handler.py:

Note: File upload functionality requires Chainlit 1.0.0+. If you have an older version, this module will provide fallback functionality.

import chainlit as cl


async def handle_file_upload(files):
    """
    Handle uploaded files and store them in the session.
    Compatible with Chainlit 1.0.0+
    """
    for file in files:
        if file.type == "text/plain":
            # Read the file content
            try:
                content = file.content.decode('utf-8')
            except AttributeError:
                # Fallback for older Chainlit versions
                with open(file.path, 'r', encoding='utf-8') as f:
                    content = f.read()

            # Store file content in user session
            cl.user_session.set("uploaded_file", {
                "name": file.name,
                "content": content
            })
            await cl.Message(
                content=f"📄 **File uploaded successfully!**\n\nFile: `{file.name}`\nSize: {len(content)} characters\n\nYou can now ask me questions about this file's content!"
            ).send()
        else:
            await cl.Message(
                content=f"❌ Sorry, I can only process text files (.txt). The file `{file.name}` is not supported."
            ).send()


def get_file_context():
    """
    Get uploaded file context for the prompt.
    """
    uploaded_file = cl.user_session.get("uploaded_file")
    if uploaded_file:
        return f"\n\nAdditionally, the user has uploaded a file named '{uploaded_file['name']}' with the following content:\n\n{uploaded_file['content'][:3000]}..."
    return ""


def has_uploaded_file():
    """
    Check if user has uploaded a file.
    """
    return cl.user_session.get("uploaded_file") is not None


# Version compatibility check
def supports_file_upload():
    """
    Check if the current Chainlit version supports file uploads.
    """
    try:
        import chainlit
        version = chainlit.__version__
        major, minor = map(int, version.split('.')[:2])
        return major >= 1 and minor >= 0
    except:
        return False

Code Explanation:

  • File validation: Checks file type to ensure only text files are processed, preventing errors with binary files

  • Version compatibility: Handles different Chainlit versions gracefully with fallback methods for file reading

  • Content storage: Stores file content in the user session, making it available throughout the conversation

  • Context integration: Provides functions to check for uploaded files and include their content in AI prompts

  • Content truncation: Limits file content to 3000 characters to prevent exceeding AI model context limits

  • User feedback: Provides clear success and error messages to guide users on supported file types
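Note: the upload hooks above follow this tutorial's approach, but Chainlit's upload API has changed across releases; in some versions uploaded files arrive attached to the incoming message rather than through a dedicated upload callback. If uploads don't trigger as expected, check the Chainlit documentation for your installed version.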

We’ve built all the individual components. Now it’s time to bring everything together into a cohesive application.

Step 9: Create the main application

Now let’s combine all modules into our main application. Create a new file called app.py in your chainlit-chat-app directory:

Make sure you have created all the previous module files (streaming.py, history.py, file_handler.py) before creating this main application file.
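At this point your project directory should look roughly like this (the contents of venv/ are omitted):

chainlit-chat-app/
├── venv/
├── requirements.txt
├── basic_chat.py
├── streaming.py
├── history.py
├── file_handler.py
└── app.py

With those files in place, here is the full app.py: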

import chainlit as cl
from streaming import call_ollama_stream
from history import initialize_history, add_message_to_history, get_contextual_prompt
from file_handler import handle_file_upload, get_file_context, has_uploaded_file, supports_file_upload


@cl.on_chat_start
async def start():
    """
    Initialize the chat session.
    """
    # Initialize conversation history
    initialize_history()

    # Check if file upload is supported
    file_upload_info = ""
    if supports_file_upload():
        file_upload_info = "• **File upload support** for text files\n"
    else:
        file_upload_info = "• File upload not available (requires Chainlit 1.0.0+)\n"

    welcome_msg = f"""🤖 **Welcome to your Complete AI Chat Assistant!**
I'm powered by Llama 2 running locally through Ollama. Here's what I can do:
**Core Features:**
• **Real-time streaming** responses
• **Conversation memory** and context
{file_upload_info}• **Complete privacy** - everything runs locally
**Capabilities:**
• Answer questions on any topic
• Help with creative writing and brainstorming
• Analyze and explain complex concepts
• Process uploaded text files (if supported)
• Provide coding help and technical guidance
Feel free to ask me anything!"""

    await cl.Message(content=welcome_msg).send()


# Only add file upload handler if supported
if supports_file_upload():
    @cl.on_file_upload
    async def on_file_upload(files):
        """
        Handle file uploads using the file_handler module.
        Only available in Chainlit 1.0.0+
        """
        await handle_file_upload(files)


@cl.on_message
async def main(message: cl.Message):
    """
    Handle incoming user messages with all features combined.
    """
    # Add user message to history
    add_message_to_history("user_message", message.content)

    # Build prompt with conversation context
    prompt = get_contextual_prompt(message.content)

    # Add file context if available
    if has_uploaded_file():
        file_context = get_file_context()
        uploaded_file = cl.user_session.get("uploaded_file")
        prompt = prompt.replace("Human:", f"Human (with uploaded file '{uploaded_file['name']}'):")
        prompt = prompt.replace("\nAssistant:", f"{file_context}\n\nAssistant:")

    # Create message for streaming response
    msg = cl.Message(content="")
    await msg.send()

    # Stream the response
    full_response = ""
    for chunk in call_ollama_stream(prompt):
        full_response += chunk
        msg.content = full_response
        await msg.update()

    # Add assistant response to history
    add_message_to_history("assistant_message", full_response)

Code Explanation:

  • Module imports: Imports functionality from our custom modules, demonstrating modular architecture

  • Dynamic feature detection: Checks Chainlit version compatibility and adjusts features accordingly

  • Conditional decorators: Uses if supports_file_upload() to conditionally add file upload handling

  • Orchestration logic: Combines conversation history, file context, and streaming responses seamlessly

  • Context management: Intelligently merges file content with conversation history for comprehensive AI context

  • Error resilience: Gracefully handles missing features and version incompatibilities

Step 10: Run your application

Now let’s test our chat application.

  1. Start Ollama (if not already running):

ollama serve

  2. Run the application:

chainlit run app.py -w

[Screenshot: Chainlit chat app welcome screen listing the assistant's features (real-time streaming, conversation memory, local privacy), with a text input and file upload controls at the bottom.]
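The -w flag tells Chainlit to watch your source files and reload the app automatically when they change. Once the server starts, open the chat interface in your browser; by default it is served at http://localhost:8000.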

Congratulations! You’ve built a fully functional chat app. But how does Chainlit compare to other options you might consider?

Chainlit vs other frameworks

Here’s a quick comparison to help you understand when to choose Chainlit for your projects.

| Framework | Best for | Chat features | Learning curve |
| --- | --- | --- | --- |
| Chainlit | Conversational AI apps | Native streaming, history, file uploads | Low (for chat apps) |
| Streamlit | Data dashboards | Basic (custom required) | Low (for data apps) |
| Gradio | ML model demos | Simple input/output | Very low |
| Flask/FastAPI | Custom web apps | Manual implementation | High |

When to choose Chainlit

Choose Chainlit when you need:

  • Real-time streaming chat responses

  • Built-in conversation history

  • File uploads in chat context

  • Quick development of conversational AI apps

Choose alternatives when:

  • Streamlit: Building data dashboards or analytics tools

  • Gradio: Creating quick ML model demonstrations

  • Flask/FastAPI: Building custom web apps where you need complete control over the application architecture

Now let’s dive deeper into Chainlit’s key components to help you build more sophisticated applications.

Key components of Chainlit

Each component serves a specific purpose in creating conversational AI experiences.

Messages and responses

The foundation of any chat application is message handling. Chainlit provides several message types to create rich conversational experiences:

Text messages: The most basic form of communication, supporting markdown formatting for rich text display.

Streaming messages: Allow real-time display of AI responses as they’re generated, creating a more natural conversational flow.

System messages: Special messages that provide context or instructions without appearing as regular conversation.

User input handling

Chainlit offers various ways to capture user input beyond simple text messages:

Text input: Standard chat input with support for multiline messages and markdown.

File uploads: Built-in support for handling document uploads, images, and other file types that can be processed by your AI model.

Actions and buttons: Interactive elements that allow users to trigger specific functions or workflows.

Session management

Effective session management is crucial for maintaining conversation context:

User sessions: Automatic handling of individual user sessions with persistent storage options.

Conversation history: Built-in message history that maintains context across the conversation.

State management: Tools for maintaining application state between messages and user interactions.
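As an illustration using only the user_session, on_chat_start, and on_message APIs already shown in this tutorial, per-user state can be stored when the session starts and read back between messages like this:

import chainlit as cl

@cl.on_chat_start
async def start():
    # Each user session gets its own counter
    cl.user_session.set("message_count", 0)

@cl.on_message
async def main(message: cl.Message):
    # Read the stored value, update it, and write it back
    count = cl.user_session.get("message_count", 0) + 1
    cl.user_session.set("message_count", count)
    await cl.Message(content=f"That's message #{count} in this session.").send()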

Decorators and hooks

Chainlit uses Python decorators to define application behavior:

@cl.on_message: Handles incoming user messages and defines response logic.

@cl.on_chat_start: Executes when a new chat session begins, useful for initialization.

@cl.on_file_upload: Processes uploaded files and integrates them into the conversation.

Integration components

Chainlit provides seamless integration with popular AI frameworks:

LangChain Integration: Native support for LangChain chains, agents, and tools.

Custom LLM Integration: Flexible architecture for integrating any LLM or AI service.

Tool integration: Support for connecting external tools and APIs to your chat application.

Conclusion

Chainlit provides an excellent foundation for building LLM-powered chat applications with minimal complexity. By combining Chainlit’s intuitive framework with Ollama’s local model execution, you can create powerful conversational AI applications that prioritize privacy and control.

Ready to build more advanced applications? Explore our Build Chatbots with Python course to deepen your programming skills and create even more powerful AI solutions.

Frequently asked questions

1. What is Chainlit used for?

Chainlit is used for building conversational AI applications and LLM chat interfaces. Common use cases include creating AI-powered chatbots, building document analysis tools, prototyping conversational AI features, and developing customer support applications.

2. Is Chainlit better than Streamlit?

Chainlit and Streamlit serve different purposes. Chainlit is ideal for building chat applications, real-time message streaming, and conversational AI interfaces. Streamlit excels at creating data dashboards, building ML model demos, and developing analytical applications. Choose Chainlit for chat interfaces and Streamlit for data visualization and general web apps.

3. What is function calling in Chainlit?

Function calling allows your chat app to integrate external tools and APIs. You can define custom functions for tasks like web searches, database queries, or API calls. Using Chainlit’s tool integration with LangChain agents, your AI can perform actions like checking the weather, sending emails, or querying databases based on user requests.

4. What are the limitations of Chainlit?

Chainlit has limited UI customization compared to building custom React applications. It’s better suited for chat-based interactions than complex multi-step workflows, and designed more for prototyping and medium-scale applications than large enterprise deployments. There’s also a learning curve that requires understanding both Chainlit and LLM integration concepts. Many of these limitations can be addressed through custom components and integrations.

5. How do I turn off telemetry in Chainlit?

To disable telemetry in Chainlit, set CHAINLIT_TELEMETRY=false in your environment or .env file, set enable_telemetry = false in your project's .chainlit/config.toml file, or use the --no-telemetry flag when running your application. You can verify telemetry is disabled by checking your application logs.

Codecademy Team

The Codecademy Team, composed of experienced educators and tech experts, is dedicated to making tech skills accessible to all. We empower learners worldwide with expert-reviewed content that develops and enhances the technical skills needed to advance and succeed in their careers.
