What is LiteLLM and How to Use it

What is LiteLLM?

LiteLLM is an open-source Python library that acts as a unified interface for Large Language Models (LLMs). It allows us to connect with multiple AI providers such as OpenAI, Anthropic, Google Gemini, Mistral, Cohere, and even local models through Ollama using a single, standardized API.

Working with multiple LLMs usually means juggling different API formats, authentication methods, and SDKs, which forces code rewrites, new dependencies, and manual adjustments whenever you switch providers. LiteLLM resolves this by acting as a bridge between the application and major LLM providers, letting you manage requests, responses, and errors consistently.

At its core, LiteLLM does two main things:

  • Normalizes APIs across providers: It takes a standard input from our code and adapts it automatically so that it matches the requirements of the target model.

  • Acts as a gateway or proxy: It can route and manage requests, track usage, handle errors, and centralize configuration, making multi-model workflows easier to maintain.

In short, LiteLLM consolidates the complexity of multiple LLMs into a single, manageable interface.

Now that we understand what LiteLLM does, let’s get it up and running on our machine.

How to install LiteLLM?

LiteLLM works on any machine that supports Python. Follow these steps to install LiteLLM on your device:

Step 1: Install via pip

We’ll start by installing LiteLLM using Python’s package manager (pip). Open the terminal and use this command:

pip install litellm

Note: Make sure your Python version is 3.8 or higher to avoid compatibility issues.
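
If you are not sure which Python version you have, you can check it from the terminal (on some systems the command is python3 instead of python):

python --version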

Step 2: Set up environment variables

LLM providers require API keys. To keep them secure, we need to set them as environment variables rather than hard-coding them in scripts. Run the following commands in your terminal before running the Python script, or add them permanently to your system’s environment variables:

export OPENAI_API_KEY="your_openai_key_here"
export ANTHROPIC_API_KEY="your_anthropic_key_here"

Note: Windows users need to use the set command (Command Prompt) or $env: (PowerShell) instead of export, as shown below.
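
For reference, this is what the same setup looks like on Windows. The commands set the variables for the current terminal session only. In Command Prompt:

set OPENAI_API_KEY=your_openai_key_here
set ANTHROPIC_API_KEY=your_anthropic_key_here

In PowerShell, the equivalent is:

$env:OPENAI_API_KEY = "your_openai_key_here"
$env:ANTHROPIC_API_KEY = "your_anthropic_key_here"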

Step 3: Verify LiteLLM installation

After completing the installation and setting up the API keys, let’s check that everything works. Run the following sample Python script in your IDE (it uses the python-dotenv package, installable with pip install python-dotenv, to load keys from a .env file if you keep them there):

import os
import sys

from dotenv import load_dotenv

# Load environment variables from a .env file (if using one)
load_dotenv()

# Check Python version
print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}")

# Check LiteLLM installation
try:
    import litellm
    print("✓ LiteLLM installed")
except ImportError:
    print("✗ LiteLLM not installed")

# Check API keys
openai_key = os.getenv("OPENAI_API_KEY")
anthropic_key = os.getenv("ANTHROPIC_API_KEY")

if openai_key:
    print("✓ OpenAI API key found")
else:
    print("✗ OpenAI API key not found")

if anthropic_key:
    print("✓ Anthropic API key found")
else:
    print("✗ Anthropic API key not found")

If everything is set up correctly, running the script should produce an output like:

Python version: 3.9
✓ LiteLLM installed
✓ OpenAI API key found
✓ Anthropic API key found

If an API key is not set, you’ll see:

✗ OpenAI API key not found

This verification ensures that your environment is ready to start using LiteLLM for multi-LLM workflows.

Generating code with LiteLLM

Let’s create a Python code generator using LiteLLM. The goal is to send a prompt describing a coding task to an LLM and receive working Python code as a response. Let’s go step by step:

Sending requests via LiteLLM

The first step in building our Code Generator is sending prompts to an LLM using LiteLLM’s unified API. A request to an LLM typically includes three key components:

  • Model: This specifies which LLM to use, for example “openai/gpt-4o-mini” or “anthropic/claude-3-5-sonnet-20240620” (exact model IDs depend on what each provider currently offers).

  • Messages: This is a structured list containing system and user prompts. The system message sets the context, and the user message contains the specific task or query.

  • Response: LiteLLM returns an OpenAI-style response object from which you can extract the generated output.

Here’s an example of sending a request to generate a Python function that calculates the factorial of a number:

import litellm

# Prepare the prompt: a system message for context plus a user message with the task
messages = [
    {"role": "system", "content": "You are a helpful assistant that writes Python code."},
    {"role": "user", "content": "Write a Python function to calculate the factorial of a number."}
]

# Send the request through LiteLLM's unified completion function
response = litellm.completion(model="openai/gpt-4o-mini", messages=messages)

# Access the generated code (OpenAI-style response object)
generated_code = response.choices[0].message.content
print(generated_code)

Here:

  • LiteLLM standardizes the request format across all supported providers.
  • The model parameter is used to switch models.
  • The messages structure provides context and instructions to the LLM.

The output here will be a Python function that calculates the factorial of a number. It will look something like this (exact formatting may vary depending on the model’s response):

def factorial(n):
    """Calculate the factorial of a number recursively."""
    if n == 0 or n == 1:
        return 1
    else:
        return n * factorial(n - 1)

Some models might generate an iterative version instead:

def factorial(n):
    """Calculate the factorial of a number iteratively."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

The key point is that generated_code contains the Python code as a string, which you can then execute or save to a file.
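
As a quick illustration of that last point, here is a minimal follow-up step that writes the generated string to a file; the filename is an arbitrary placeholder, and you should review model-generated code before executing it:

from pathlib import Path

# Save the model's output so it can be reviewed before running it
Path("generated_factorial.py").write_text(generated_code)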

Making API calls and handling errors

LiteLLM also abstracts the messy parts of working with LLM APIs, such as retries, timeouts, and provider-specific error messages, so developers don’t need to write repetitive error-handling code.

Making API calls

When you call litellm.completion(), LiteLLM manages the connection to the provider and returns a standardized, OpenAI-style response object. You can include optional parameters such as timeout or num_retries to control this behavior:

import litellm

try:
    response = litellm.completion(
        model="openai/gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a Python coding assistant."},
            {"role": "user", "content": "Write a function to reverse a string."}
        ],
        timeout=10,     # seconds to wait for the provider before giving up
        num_retries=3   # retry on temporary errors
    )
    print(response.choices[0].message.content)
except Exception as e:
    # LiteLLM raises OpenAI-style exceptions (e.g., litellm.RateLimitError)
    print(f"Error: {e}")

Handling common errors

LiteLLM provides consistent error handling for scenarios like:

  • Rate limits: Requests can be retried automatically according to num_retries.

  • Missing API keys: LiteLLM raises a clear authentication error indicating which key is missing.

  • Timeouts or network issues: LiteLLM raises standardized, OpenAI-style exceptions that you can catch, as shown in the sketch after this list.
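
As a sketch of what catching these looks like, the snippet below handles a few of the exception classes LiteLLM exposes. It assumes these classes are importable from the top-level litellm package, in line with LiteLLM’s OpenAI-style exception mapping:

import litellm

try:
    response = litellm.completion(model="openai/gpt-4o-mini", messages=messages)
except litellm.AuthenticationError as e:
    print(f"Missing or invalid API key: {e}")
except litellm.RateLimitError as e:
    print(f"Rate limited, try again later: {e}")
except litellm.APIConnectionError as e:
    print(f"Network problem reaching the provider: {e}")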

Developers can also implement fallback logic to switch providers if one model fails. For example:

try:
    response = litellm.completion(model="openai/gpt-4o-mini", messages=messages)
except Exception:
    # Fall back to Anthropic if the OpenAI call fails
    response = litellm.completion(model="anthropic/claude-3-5-sonnet-20240620", messages=messages)

This ensures that the workflow continues even if a provider experiences downtime or rate limits.

Switching between LiteLLM models

One of LiteLLM’s key strengths is its provider-agnostic flexibility. We can route requests to different LLM providers by changing only the model parameter in the request.

Here’s an example of switching models:

import litellm

messages = [
    {"role": "system", "content": "You are a Python coding assistant."},
    {"role": "user", "content": "Write a function to check if a number is prime."}
]

# Using OpenAI
response_openai = litellm.completion(model="openai/gpt-4o-mini", messages=messages)
print("OpenAI output:\n", response_openai.choices[0].message.content)

# Switching to Anthropic
response_anthropic = litellm.completion(model="anthropic/claude-3-5-sonnet-20240620", messages=messages)
print("Anthropic output:\n", response_anthropic.choices[0].message.content)

# Switching to Mistral
response_mistral = litellm.completion(model="mistral/mistral-small-latest", messages=messages)
print("Mistral output:\n", response_mistral.choices[0].message.content)

Each response prints a Python function to check if a number is prime. The exact output may vary slightly depending on the model, but it will generally look like this:

def is_prime(n):
    """Check if a number is prime."""
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

As we can see, managing multiple LLMs and providers becomes straightforward, letting you spend less time on setup and more on development. So, what makes LiteLLM stand out compared to using individual LLM SDKs?

Top features of LiteLLM

LiteLLM offers a set of core features designed to simplify working with multiple LLMs. Here are the most important ones:

1. Unified API

LiteLLM provides a single syntax for all supported LLM providers, so switching models doesn’t require rewriting your code.

import litellm

response = litellm.completion(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello in Python"}]
)
print(response.choices[0].message.content)

2. Streaming responses

We can receive output from the LLM in real time as tokens are generated by passing stream=True and iterating over the response. This is especially useful for long completions or interactive applications.

response = litellm.completion(model="openai/gpt-4o-mini", messages=messages, stream=True)
for chunk in response:
    # Each chunk carries a small piece of the output in an OpenAI-style delta
    print(chunk.choices[0].delta.content or "", end="")

3. Provider switching

We can change the model provider effortlessly by updating the model parameter.

response = litellm.completion(model="anthropic/claude-3-5-sonnet-20240620", messages=messages)

4. Error handling

LiteLLM standardizes error classes and structured responses for consistent exception management.

try:
    response = litellm.completion(model="openai/gpt-4o-mini", messages=messages)
except litellm.RateLimitError as e:
    print(f"Rate limit hit: {e}")

5. Fallbacks

If a model or provider fails, you can fall back to an alternative provider without changing the rest of your code.

try:
    response = litellm.completion(model="openai/gpt-4o-mini", messages=messages)
except Exception:
    response = litellm.completion(model="anthropic/claude-3-5-sonnet-20240620", messages=messages)

6. Logging and cost tracking

LiteLLM can track token usage and estimate the cost of each call, and it supports logging callbacks for monitoring consumption, debugging, or reporting usage across multiple providers.

# Token usage is returned on the response; cost is estimated from LiteLLM's pricing map
print(response.usage)
print(litellm.completion_cost(completion_response=response))
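
For ongoing logging, LiteLLM also supports callback hooks. The sketch below registers a custom success callback; the four-argument signature follows the pattern described in LiteLLM’s custom-callback documentation, but treat the exact arguments as an assumption to verify against the version you install:

import litellm

def log_usage(kwargs, completion_response, start_time, end_time):
    # Runs after each successful completion; log the model and token usage
    print(kwargs.get("model"), completion_response.usage)

litellm.success_callback = [log_usage]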

7. Proxy mode

LiteLLM can also run as a standalone proxy server, started from the command line, that centralizes requests. This allows teams to route all LLM requests through a single server, monitor usage, and manage API keys centrally.

# The proxy server is an optional extra: pip install 'litellm[proxy]'
litellm --model openai/gpt-4o-mini --port 8000
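
Once the proxy is running, clients talk to it through an OpenAI-compatible endpoint. As a minimal sketch, the snippet below points the standard openai Python client at the local proxy; the placeholder api_key assumes no master key has been configured on the proxy, and the port matches the command above:

from openai import OpenAI

# Point the standard OpenAI client at the local LiteLLM proxy
client = OpenAI(base_url="http://localhost:8000", api_key="anything")

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello in Python"}]
)
print(response.choices[0].message.content)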

Let’s examine some key benefits of LiteLLM.

Why choose LiteLLM over other LLM libraries

LiteLLM offers practical advantages that streamline development with multiple LLMs while maintaining flexibility and reliability. Its features translate into clear benefits for real-world projects.

  • Simplifies API management: Use a single API for all supported providers, eliminating the need to learn or maintain multiple SDKs and reducing errors when switching models.

  • Reduces development and maintenance overhead: Unified request/response handling and built-in error management minimize repetitive coding, letting teams focus on building features instead of debugging provider-specific code.

  • Prevents vendor lock-in: Easily switch between models or providers without rewriting code, ensuring flexibility and freedom to choose the best provider for each task.

  • Enables cost optimization: Dynamically choose between providers or models to manage API usage costs effectively, using high-end models only when necessary (see the sketch after this list).

  • Easy integration: Integrates smoothly with existing AI tools, frameworks, and Python workflows, allowing seamless inclusion in projects without restructuring code.

  • Lightweight and open-source: Small, efficient, and open-source, making it suitable for experimentation, prototyping, or production. Its community-driven nature ensures continuous improvement and support.
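
To make the cost-optimization point concrete, here is a toy sketch. The routing rule (short prompts go to a cheaper model, longer ones to a more capable one) and the specific model IDs are illustrative assumptions, not built-in LiteLLM behavior:

import litellm

def pick_model(prompt: str) -> str:
    # Naive heuristic: short prompts go to a cheaper model, long ones to a stronger one
    return "openai/gpt-4o-mini" if len(prompt) < 500 else "anthropic/claude-3-5-sonnet-20240620"

prompt = "Write a one-line Python expression that reverses a string."
response = litellm.completion(
    model=pick_model(prompt),
    messages=[{"role": "user", "content": prompt}]
)
print(response.choices[0].message.content)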

So, in what real-world scenarios does LiteLLM deliver the most value?

Real-world use cases of LiteLLM

LiteLLM’s unified API and provider-agnostic design make it ideal for a variety of practical scenarios.

  • Multi-provider AI systems: Dynamically route prompts to different LLM providers based on task type or capability, all using the same code structure.

  • Cost optimization: Automatically select cheaper or faster models for specific tasks, reducing API costs while maintaining performance.

  • Internal LLM gateway: Deploy LiteLLM as a proxy server to centralize API traffic, manage keys, and monitor usage across teams or applications.

  • Evaluation pipelines: Compare performance of multiple models without changing your request code, making testing and benchmarking straightforward.

  • Local inference setups: Use LiteLLM with local LLM tools like Ollama, enabling on-device inference while maintaining a consistent API across local and cloud models (see the sketch after this list).
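
As an example of that last case, the snippet below routes the same unified call to a local Ollama model using LiteLLM’s ollama/ model prefix. It assumes Ollama is running on its default local port and that the llama3 model has already been pulled:

import litellm

# Route the request to a local Ollama model instead of a cloud provider
response = litellm.completion(
    model="ollama/llama3",
    messages=[{"role": "user", "content": "Write a function to check if a number is prime."}],
    api_base="http://localhost:11434"  # Ollama's default local endpoint
)
print(response.choices[0].message.content)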

These use cases demonstrate how LiteLLM streamlines multi-model workflows, minimizes operational complexity, and offers flexibility for both cloud and local LLM setups.

Conclusion

LiteLLM is an open-source Python library that unifies access to multiple LLM providers, simplifying API management, error handling, and model switching. From sending prompts and handling streaming responses to automatic fallbacks and cost tracking, LiteLLM provides a consistent workflow across cloud and local models. Its flexibility, lightweight design, and community-driven support make it ideal for both experimentation and production-ready applications.

If you’re interested in practical AI projects, you can explore Codecademy’s Build Chatbots with Python course.

Frequently asked questions

1. Does LiteLLM cost money?

LiteLLM itself is free and open-source. However, API usage from LLM providers like OpenAI, Anthropic, or Mistral may incur costs depending on their pricing.

2. Is LiteLLM a proxy?

Yes, LiteLLM can function as a proxy server. The LiteLLM Proxy Server acts as a centralized gateway to manage multiple LLM providers, handle authentication, track usage, and implement rate limiting across your organization.

3. What are the limitations of LiteLLM?

LiteLLM is a unifying interface, not a model itself. Limitations include dependency on the underlying providers, network access for cloud models, and potential feature differences between providers that may not be fully standardized.

4. What is the difference between LiteLLM and LangChain?

LiteLLM focuses on simplifying API access and managing multiple providers, while LangChain provides a broader framework for building complex LLM workflows, including chaining prompts, memory management, and agent-based tasks.

5. Does LiteLLM support multi-cloud or multi-provider setups?

Yes. LiteLLM’s design allows routing requests across multiple providers dynamically, making it suitable for multi-cloud or hybrid setups where you want flexibility and fallback options.

Codecademy Team

The Codecademy Team, composed of experienced educators and tech experts, is dedicated to making tech skills accessible to all. We empower learners worldwide with expert-reviewed content that develops and enhances the technical skills needed to advance and succeed in their careers.


Learn more on Codecademy

  • Explore OpenAI’s API and learn how to write more effective generative AI prompts that help improve your results.
    • Beginner Friendly
      < 1 hour
  • Leverage the OpenAI API within your Python code. Learn to import OpenAI modules, use chat completion methods, and craft effective prompts.
    • With Certificate
    • Intermediate
      1 hour
  • Leverage the OpenAI API within your JavaScript code. Learn to customize prompts and hyperparameters for optimal output.
    • With Certificate
    • Intermediate
      1 hour