What is LiteLLM and How to Use it
What is LiteLLM?
LiteLLM is an open-source Python library that acts as a unified interface for Large Language Models (LLMs). It allows us to connect with multiple AI providers such as OpenAI, Anthropic, Google Gemini, Mistral, Cohere, and even local models through Ollama using a single, standardized API.
Working with multiple LLMs usually means juggling different API formats, authentication methods, and SDKs, which in turn leads to code rewrites, new dependencies, and manual adjustments. LiteLLM resolves this by acting as a bridge between the application and the major LLM providers, letting you manage requests, responses, and errors consistently.
At its core, LiteLLM does two main things:
Normalizes APIs across providers: It takes a standard input from our code and adapts it automatically so that it matches the requirements of the target model.
Acts as a gateway or proxy: It can route and manage requests, track usage, handle errors, and centralize configuration, making multi-model workflows easier to maintain.
In short, LiteLLM consolidates the complexity of multiple LLMs into a single, manageable interface.
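To make this concrete, here is a minimal sketch of that unified interface. It assumes LiteLLM is installed (covered in the next section), the relevant API keys are set as environment variables, and the model names are current examples that may change over time:

from litellm import completion

messages = [{"role": "user", "content": "Summarize what LiteLLM does in one sentence."}]

# The call shape stays the same across providers -- only the model string changes
openai_reply = completion(model="openai/gpt-4o-mini", messages=messages)
anthropic_reply = completion(model="anthropic/claude-3-haiku-20240307", messages=messages)

print(openai_reply.choices[0].message.content)
print(anthropic_reply.choices[0].message.content)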
Now that we understand what LiteLLM does, let’s get it up and running on our machine.
How to install LiteLLM?
LiteLLM works on any machine that supports Python. Follow these steps to install LiteLLM on your device:
Step 1: Install via pip
We’ll start by installing LiteLLM using Python’s package manager (pip). Open the terminal and use this command:
pip install litellm
Note: Make sure your Python version is 3.8 or higher to avoid compatibility issues.
Step 2: Set up environment variables
LLM providers require API keys. To keep them secure, we need to set them as environment variables. Run the following commands in your terminal before running the Python script, or add them permanently to your system’s environment variables:
export OPENAI_API_KEY="your_openai_key_here"
export ANTHROPIC_API_KEY="your_anthropic_key_here"
Note: Windows users need to use the set command instead of export.
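For reference, the equivalent commands in the Windows Command Prompt look like this (the values are placeholders; use setx instead of set if you want the keys to persist across new sessions):

set OPENAI_API_KEY=your_openai_key_here
set ANTHROPIC_API_KEY=your_anthropic_key_here

Alternatively, you can place the same key=value pairs in a .env file in your project folder; the verification script in the next step loads it with the python-dotenv package.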
Step 3: Verify LiteLLM installation
After completing the installation and setting up the API keys, let’s check if everything is working. Run the following sample Python script in your IDE:
import os
import sys

from dotenv import load_dotenv

# Load environment variables from .env file (if using one)
load_dotenv()

# Check Python version
print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}")

# Check LiteLLM installation
try:
    import litellm
    print("✓ LiteLLM installed")
except ImportError:
    print("✗ LiteLLM not installed")

# Check API keys
openai_key = os.getenv("OPENAI_API_KEY")
anthropic_key = os.getenv("ANTHROPIC_API_KEY")

if openai_key:
    print("✓ OpenAI API key found")
else:
    print("✗ OpenAI API key not found")

if anthropic_key:
    print("✓ Anthropic API key found")
else:
    print("✗ Anthropic API key not found")
If everything is set up correctly, running the script should produce an output like:
Python version: 3.9
✓ LiteLLM installed
✓ OpenAI API key found
✓ Anthropic API key found
If an API key is not set, you’ll see:
✗ OpenAI API key not found
This verification ensures that your environment is ready to start using LiteLLM for multi-LLM workflows.
Generating code with LiteLLM
Let’s create a Python code generator using LiteLLM. The goal is to send a prompt describing a coding task to an LLM and receive working Python code as a response. Let’s go step-by-step:
Sending requests via LiteLLM
The first step in building our Code Generator is sending prompts to an LLM using LiteLLM’s unified API. A request to an LLM typically includes three key components:
Model: This specifies which LLM to use. For example, we can use models like “openai/gpt-4o-mini” or “anthropic/claude-3-haiku-20240307”.
Messages: This is a structured list that contains system and user prompts. The system message sets the context, and the user message contains the specific task or query.
Response: LiteLLM returns a response object from which you can extract the generated output.
Here’s an example of sending a request to generate a Python function that calculates the factorial of a number:
import litellm

# Prepare the prompt
messages = [
    {"role": "system", "content": "You are a helpful assistant that writes Python code."},
    {"role": "user", "content": "Write a Python function to calculate the factorial of a number."}
]

# Send the request
response = litellm.completion(model="openai/gpt-4o-mini", messages=messages)

# Access the generated code
generated_code = response.choices[0].message.content
print(generated_code)
Here:
- LiteLLM standardizes the request format across all supported providers.
- The model parameter is used to switch models.
- The messages structure provides context and instructions to the LLM.
The output here will be a Python function that calculates the factorial of a number. It will look something like this (exact formatting may vary depending on the model’s response):
def factorial(n):
    """Calculate the factorial of a number recursively."""
    if n == 0 or n == 1:
        return 1
    else:
        return n * factorial(n - 1)
Or some models might generate an iterative version as well:
def factorial(n):
    """Calculate the factorial of a number iteratively."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result
The key point is that generated_code contains the Python code as a string, which you can then execute or save to a file.
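For instance, here’s a minimal sketch of saving that string to a file (the filename is arbitrary):

# Save the generated code so it can be reviewed and run later
with open("generated_factorial.py", "w", encoding="utf-8") as f:
    f.write(generated_code)

print("Saved generated code to generated_factorial.py")

As a rule of thumb, review or sandbox LLM-generated code before executing it.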
Making API calls and handling errors
LiteLLM also abstracts away the repetitive parts of working with LLM APIs, such as retries, timeouts, and inconsistent error formats, so developers don’t need to write provider-specific error-handling code.
Making API calls
When you call litellm.completion(), LiteLLM automatically manages the connection to the provider and returns a standardized response object. You can include optional parameters such as timeout or num_retries to control behavior:
import litellm

try:
    response = litellm.completion(
        model="openai/gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a Python coding assistant."},
            {"role": "user", "content": "Write a function to reverse a string."}
        ],
        timeout=10,     # seconds
        num_retries=3   # retry on temporary errors
    )
    print(response.choices[0].message.content)
except Exception as e:
    # LiteLLM maps provider errors to OpenAI-style exceptions
    print(f"Error: {e}")
Handling common errors
LiteLLM provides consistent error handling for scenarios like the following (see the sketch after this list):
Rate limits: Automatically retry according to num_retries.
Missing API keys: Returns a clear message indicating which key is missing.
Timeouts or network issues: Throws a standardized exception you can catch.
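Because LiteLLM maps provider errors to OpenAI-style exception classes, the same except blocks work across providers. Here’s a minimal sketch assuming the standard exception mapping (exact class names may vary slightly between LiteLLM versions):

import litellm

messages = [{"role": "user", "content": "Write a function to reverse a string."}]

try:
    response = litellm.completion(model="openai/gpt-4o-mini", messages=messages)
    print(response.choices[0].message.content)
except litellm.RateLimitError:
    # The provider is throttling requests; back off and retry later
    print("Rate limit hit -- try again later.")
except litellm.AuthenticationError:
    # The API key for this provider is missing or invalid
    print("Check that the provider's API key is set correctly.")
except Exception as e:
    # Any other network or provider error
    print(f"Unexpected error: {e}")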
Developers can also implement fallback logic to switch providers if one model fails. For example:
try:
    response = litellm.completion(model="openai/gpt-4o-mini", messages=messages)
except Exception:
    # Fallback to Anthropic
    response = litellm.completion(model="anthropic/claude-3-haiku-20240307", messages=messages)
This ensures that the workflow continues even if a provider experiences downtime or rate limits.
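For production workflows, LiteLLM also provides a Router that can handle fallbacks automatically. The sketch below is a rough outline based on the Router’s model_list and fallbacks configuration; the deployment names ("primary" and "backup") are arbitrary labels chosen for this example, and the exact options may differ between versions:

from litellm import Router

messages = [{"role": "user", "content": "Write a function to reverse a string."}]

# Map friendly deployment names to the underlying provider models
model_list = [
    {"model_name": "primary", "litellm_params": {"model": "openai/gpt-4o-mini"}},
    {"model_name": "backup", "litellm_params": {"model": "anthropic/claude-3-haiku-20240307"}},
]

# If "primary" fails, the router retries the request against "backup"
router = Router(model_list=model_list, fallbacks=[{"primary": ["backup"]}])

response = router.completion(model="primary", messages=messages)
print(response.choices[0].message.content)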
Switching between LiteLLM models
One of LiteLLM’s key strengths is its provider-agnostic flexibility. We can route requests to different LLM providers by changing only the model parameter in the request.
Here’s an example of switching between models:
import litellm

messages = [
    {"role": "system", "content": "You are a Python coding assistant."},
    {"role": "user", "content": "Write a function to check if a number is prime."}
]

# Using OpenAI
response_openai = litellm.completion(model="openai/gpt-4o-mini", messages=messages)
print("OpenAI output:\n", response_openai.choices[0].message.content)

# Switching to Anthropic
response_anthropic = litellm.completion(model="anthropic/claude-3-haiku-20240307", messages=messages)
print("Anthropic output:\n", response_anthropic.choices[0].message.content)

# Switching to Mistral
response_mistral = litellm.completion(model="mistral/mistral-small-latest", messages=messages)
print("Mistral output:\n", response_mistral.choices[0].message.content)
Each response prints a Python function to check if a number is prime. The exact output may vary slightly depending on the model, but it will generally look like this:
def is_prime(n):
    """Check if a number is prime."""
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True
As we can see, managing multiple LLMs and providers becomes straightforward, letting you spend less time on setup and more on development. So, what makes LiteLLM stand out compared to using individual LLM SDKs?
Top features of LiteLLM
LiteLLM offers a set of core features designed to simplify working with multiple LLMs. Here are some of the core features of LiteLLM:
1. Unified API
LiteLLM provides a single syntax for all supported LLM providers, so switching models doesn’t require rewriting your code.
import litellm

response = litellm.completion(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello in Python"}]
)
print(response.choices[0].message.content)
2. Streaming responses
We can receive output from the LLM in real-time as tokens are generated. This is especially useful for long completions or interactive applications.
response = litellm.completion(model="openai/gpt-4o-mini", messages=messages, stream=True)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
3. Provider switching
We can change the model provider effortlessly by updating the model parameter.
response = litellm.completion(model="anthropic/claude-3-haiku-20240307", messages=messages)
4. Error handling
LiteLLM standardizes error classes and structured responses for consistent exception management.
try:
    response = litellm.completion(model="openai/gpt-4o-mini", messages=messages)
except Exception as e:
    # LiteLLM raises OpenAI-style exceptions across providers
    print(f"Error: {e}")
5. Fallbacks
If a model or provider fails, LiteLLM lets you automatically try an alternative provider.
try:
    response = litellm.completion(model="openai/gpt-4o-mini", messages=messages)
except Exception:
    response = litellm.completion(model="anthropic/claude-3-haiku-20240307", messages=messages)
6. Logging and cost tracking
LiteLLM can track usage, API calls, and associated costs. This is helpful for monitoring consumption, debugging, or reporting usage across multiple providers.
response = litellm.completion(model="openai/gpt-4o-mini", messages=messages)

# Token usage is included on every response
print(response.usage)

# Estimate the cost (in USD) of this call
print(litellm.completion_cost(completion_response=response))
7. Proxy mode
LiteLLM can run as a proxy server to centralize requests. This allows teams to route all LLM requests through a single server, monitor usage, and manage API keys centrally.
pip install 'litellm[proxy]'
litellm --model openai/gpt-4o-mini --port 8000
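Once the proxy is running, any OpenAI-compatible client can send requests through it. Here’s a minimal sketch that assumes the proxy started by the command above is listening on port 8000 and that no master key has been configured (if one has, pass it as the api_key instead of the placeholder):

from openai import OpenAI

# Point the standard OpenAI client at the local LiteLLM proxy
client = OpenAI(base_url="http://localhost:8000", api_key="anything")

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello in Python"}]
)
print(response.choices[0].message.content)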
Let’s examine some key benefits of LiteLLM.
Why choose LiteLLM over other LLM libraries
LiteLLM offers practical advantages that streamline development with multiple LLMs while maintaining flexibility and reliability. Its features translate into clear benefits for real-world projects.
Simplifies API management: Use a single API for all supported providers, eliminating the need to learn or maintain multiple SDKs and reducing errors when switching models.
Reduces development and maintenance overhead: Unified request/response handling and built-in error management minimize repetitive coding, letting teams focus on building features instead of debugging provider-specific code.
Prevents vendor lock-in: Easily switch between models or providers without rewriting code, ensuring flexibility and freedom to choose the best provider for each task.
Enables cost optimization: Dynamically choose between providers or models to manage API usage costs effectively, using high-end models only when necessary (see the sketch after this list).
Easy integration: Integrates smoothly with existing AI tools, frameworks, and Python workflows, allowing seamless inclusion in projects without restructuring code.
Lightweight and open-source: Small, efficient, and open-source, making it suitable for experimentation, prototyping, or production. Its community-driven nature ensures continuous improvement and support.
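As an illustration of the cost-optimization point above, the sketch below routes short prompts to a cheaper model and longer ones to a more capable one. The length threshold and the model pairing are arbitrary assumptions for this example, not LiteLLM defaults:

import litellm

def answer(prompt: str) -> str:
    # Arbitrary heuristic: short prompts go to the cheaper model
    model = "openai/gpt-4o-mini" if len(prompt) < 500 else "openai/gpt-4o"
    response = litellm.completion(model=model, messages=[{"role": "user", "content": prompt}])
    return response.choices[0].message.content

print(answer("Write a one-line Python lambda that doubles a number."))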
So, in what real-world scenarios does LiteLLM deliver the most value?
Real-world use cases of LiteLLM
LiteLLM’s unified API and provider-agnostic design make it ideal for a variety of practical scenarios.
Multi-provider AI systems: Dynamically route prompts to different LLM providers based on task type or capability, all using the same code structure.
Cost optimization: Automatically select cheaper or faster models for specific tasks, reducing API costs while maintaining performance.
Internal LLM gateway: Deploy LiteLLM as a proxy server to centralize API traffic, manage keys, and monitor usage across teams or applications.
Evaluation pipelines: Compare performance of multiple models without changing your request code, making testing and benchmarking straightforward.
Local inference setups: Use LiteLLM with local LLM tools like Ollama, enabling on-device inference while maintaining a consistent API across local and cloud models (see the sketch below).
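For the local inference case, here’s a minimal sketch that assumes Ollama is installed, running on its default port, and has already pulled a model such as llama3:

import litellm

# The same completion() call, routed to a local Ollama server
response = litellm.completion(
    model="ollama/llama3",
    messages=[{"role": "user", "content": "Write a function to check if a number is prime."}],
    api_base="http://localhost:11434"  # Ollama's default local endpoint
)
print(response.choices[0].message.content)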
These use cases demonstrate how LiteLLM streamlines multi-model workflows, minimizes operational complexity, and offers flexibility for both cloud and local LLM setups.
Conclusion
LiteLLM is an open-source Python library that unifies access to multiple LLM providers, simplifying API management, error handling, and model switching. From sending prompts and handling streaming responses to automatic fallbacks and cost tracking, LiteLLM provides a consistent workflow across cloud and local models. Its flexibility, lightweight design, and community-driven support make it ideal for both experimentation and production-ready applications.
For those interested in practical AI projects, you can explore Codecademy’s Build Chatbots with Python course.
Frequently asked questions
1. Does LiteLLM cost money?
LiteLLM itself is free and open-source. However, API usage from LLM providers like OpenAI, Anthropic, or Mistral may incur costs depending on their pricing.
2. Is LiteLLM a proxy?
Yes, LiteLLM can function as a proxy server. The LiteLLM Proxy Server acts as a centralized gateway to manage multiple LLM providers, handle authentication, track usage, and implement rate limiting across your organization.
3. What are the limitations of LiteLLM?
LiteLLM is a unifying interface, not a model itself. Limitations include dependency on the underlying providers, network access for cloud models, and potential feature differences between providers that may not be fully standardized.
4. What is the difference between LiteLLM and LangChain?
LiteLLM focuses on simplifying API access and managing multiple providers, while LangChain provides a broader framework for building complex LLM workflows, including chaining prompts, memory management, and agent-based tasks.
5. Does LiteLLM support MCP (Multi-Cloud Provider setups)?
Yes, LiteLLM’s design allows routing requests across multiple providers dynamically, making it suitable for multi-cloud or hybrid setups where you want flexibility and fallback options.