Articles

What is DSPy? Build a Text-to-SQL App with Python

Building applications with LLMs can feel like an art form. We spend hours crafting the perfect prompt, tweaking the words, only for it to break when we switch models or update the use case. This manual, trial-and-error approach to prompt engineering is time-consuming and difficult to scale. DSPy changes this paradigm entirely.

In this article, we’ll look at what DSPy is and build a practical Text-to-SQL application to see how it works.


What is DSPy?

DSPy, short for Declarative Self-improving Python, is an open-source framework developed by Stanford University’s NLP group. DSPy allows developers to build large language model applications using compositional Python code instead of crafting the prompts manually.

In DSPy, you declare what your model should do using signatures (input-output specifications), and choose from reusable modules that define reasoning strategies like Chain-of-Thought. DSPy’s compilers and optimizers then automatically generate, evaluate, and refine the underlying prompts based on data or feedback. This DSPy approach turns prompt engineering into a structured, learnable process instead of manual trial-and-error.

DSPy sits between your data and the language model. You still use providers like OpenAI or Anthropic; DSPy just manages how those models are orchestrated, optimized, and combined.

So, rather than telling the model exactly what to say, you design a system that learns how to say it better over time.

To see how DSPy works in practice, let’s build a Text-to-SQL converter that learns and improves from examples.

Building a Text-to-SQL app with DSPy

Let’s build a Text-to-SQL system that turns natural language questions into SQL queries using DSPy’s declarative modules and optimization features. Here are the steps:

Step 1: Installing DSPy

To start building our DSPy Text-to-SQL system, we can install DSPy directly from PyPI:

pip install dspy-ai

To verify the installation, check the version:

pip show dspy-ai

If the package details appear without errors, DSPy is successfully installed and ready to use.
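If you prefer checking from Python, a small helper using the standard library’s importlib.metadata does the same job. As an aside, the project has also been distributed under the name dspy, so the helper checks both names:

```python
from importlib import metadata

def dspy_version():
    """Return the installed DSPy version, or None if it is not installed."""
    for pkg in ("dspy-ai", "dspy"):  # the project has shipped under both names
        try:
            return metadata.version(pkg)
        except metadata.PackageNotFoundError:
            continue
    return None

print(dspy_version())
```

A None result means neither distribution is installed in the current environment.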

In case of dependency errors, upgrading pip and reinstalling usually fixes it:

pip install --upgrade pip
pip install --upgrade dspy-ai

The setup is complete; now let’s begin building our first Text-to-SQL module using DSPy.

Step 2: Setting up Ollama and connecting to DSPy

DSPy can work with various language models. We’ll use Ollama, which lets us run models locally without API costs. First, install Ollama from its website, then pull a model:

ollama pull mistral

Note: We’ll use the mistral model here, but you can substitute any model of your choice.

Next, let’s configure DSPy to connect to our local Ollama instance. Create a function that initializes the connection in a main.py file:

import dspy
import requests

def setup_ollama():
    """Configure DSPy to use Ollama."""
    try:
        requests.get("http://localhost:11434/api/tags", timeout=5)
        print("Ollama is running")
    except requests.exceptions.RequestException:
        print("Ollama not running. Start with: ollama serve")
        return False
    dspy.settings.configure(
        lm=dspy.LM(
            model="ollama/mistral",
            api_base="http://localhost:11434",
            max_tokens=500,
            temperature=0.7,
        )
    )
    print("Connected to Ollama (Mistral)")
    return True

This function checks if Ollama is running and configures DSPy to use the Mistral model via the local API endpoint.
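If you’d rather avoid the requests dependency, the same health check can be sketched with only the standard library. This variant assumes the default Ollama port (11434) and that /api/tags returns JSON; the timeout keeps the script from hanging when the server is down:

```python
import json
from urllib import request

def ollama_is_running(base_url="http://localhost:11434", timeout=2):
    """Return True if the Ollama HTTP API answers on /api/tags with valid JSON."""
    try:
        with request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            json.load(resp)  # /api/tags lists locally available models as JSON
        return True
    except (OSError, ValueError):  # connection refused, timeout, or bad JSON
        return False
```

Calling ollama_is_running() before dspy.settings.configure(...) gives the same guard as the requests check above.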

Step 3: Creating a sample database

We need a database to query. Let’s create a simple company database with employees, departments, and projects:

import sqlite3

def create_sample_database():
    """Create and populate sample database."""
    conn = sqlite3.connect("company.db")
    c = conn.cursor()
    # Create tables
    c.execute("""
        CREATE TABLE IF NOT EXISTS employees (
            id INTEGER PRIMARY KEY, name TEXT, department TEXT,
            salary INTEGER, hire_date TEXT
        )
    """)
    c.execute("""
        CREATE TABLE IF NOT EXISTS departments (
            id INTEGER PRIMARY KEY, name TEXT,
            budget INTEGER, manager TEXT
        )
    """)
    c.execute("""
        CREATE TABLE IF NOT EXISTS projects (
            id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER,
            budget INTEGER, status TEXT
        )
    """)
    # Insert data
    c.executemany("INSERT OR IGNORE INTO employees VALUES (?, ?, ?, ?, ?)", [
        (1, "Alice Johnson", "Engineering", 95000, "2022-01-15"),
        (2, "Bob Smith", "Sales", 75000, "2021-06-20"),
        (3, "Carol White", "Engineering", 90000, "2023-02-10"),
        (4, "David Brown", "Marketing", 70000, "2022-09-05"),
        (5, "Eve Davis", "Engineering", 98000, "2020-03-12"),
    ])
    c.executemany("INSERT OR IGNORE INTO departments VALUES (?, ?, ?, ?)", [
        (1, "Engineering", 500000, "Alice Johnson"),
        (2, "Sales", 300000, "Bob Smith"),
        (3, "Marketing", 200000, "Carol White"),
    ])
    c.executemany("INSERT OR IGNORE INTO projects VALUES (?, ?, ?, ?, ?)", [
        (1, "API Development", 1, 150000, "In Progress"),
        (2, "Mobile App", 1, 200000, "Planning"),
        (3, "Sales Dashboard", 2, 80000, "Complete"),
    ])
    conn.commit()
    conn.close()
    print("Database created")

This creates three tables with realistic data that we can query using natural language.
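Before wiring in the LLM, it’s worth sanity-checking the data with plain SQL. The snippet below rebuilds the employees table in memory (same rows as above) and verifies two counts we’ll rely on later:

```python
import sqlite3

# Quick sanity check: rebuild the employees table in memory and verify counts.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, "
    "department TEXT, salary INTEGER, hire_date TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Alice Johnson", "Engineering", 95000, "2022-01-15"),
        (2, "Bob Smith", "Sales", 75000, "2021-06-20"),
        (3, "Carol White", "Engineering", 90000, "2023-02-10"),
        (4, "David Brown", "Marketing", 70000, "2022-09-05"),
        (5, "Eve Davis", "Engineering", 98000, "2020-03-12"),
    ],
)
total = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
eng = conn.execute(
    "SELECT COUNT(*) FROM employees WHERE department = 'Engineering'"
).fetchone()[0]
print(total, eng)  # 5 3
```

These are the same numbers the model should reproduce later when we ask it counting questions in natural language.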

Step 4: Defining DSPy signatures

DSPy uses signatures to declare the input-output behavior of your modules. A signature tells DSPy what inputs to expect and what outputs to generate. Let’s define a DSPy signature for Text-to-SQL generation:

class GenerateSQL(dspy.Signature):
    """Generate SQL from natural language.

    Database schema:
    - employees: id, name, department, salary, hire_date
    - departments: id, name, budget, manager
    - projects: id, name, department_id, budget, status
    """
    question = dspy.InputField(desc="Natural language question")
    sql_query = dspy.OutputField(desc="Valid SQL query")

The docstring provides context about the database schema, which DSPy includes in the prompt sent to the model. The InputField and OutputField definitions tell DSPy what to expect.

Step 5: Building the SQL generator module

Now let’s create a DSPy module that combines SQL generation with error refinement. If a query fails, the module automatically attempts to fix it:

class SQLGenerator(dspy.Module):
    """Generate and execute SQL queries with error refinement."""
    def __init__(self):
        super().__init__()
        self.generator = dspy.ChainOfThought(GenerateSQL)
        self.refiner = dspy.ChainOfThought(
            "question, sql_query, error -> refined_sql"
        )

    def forward(self, question, conn):
        # Generate SQL
        output = self.generator(question=question)
        sql = output.sql_query.strip()
        # Try to execute
        try:
            results = conn.execute(sql).fetchall()
            return dspy.Prediction(
                sql_query=sql, results=results, error=None
            )
        except Exception as e:
            # Refine on error
            refined = self.refiner(
                question=question, sql_query=sql, error=str(e)
            )
            try:
                results = conn.execute(refined.refined_sql).fetchall()
                return dspy.Prediction(
                    sql_query=refined.refined_sql, results=results, error=None
                )
            except Exception as e2:
                return dspy.Prediction(
                    sql_query=refined.refined_sql, results=None, error=str(e2)
                )

This module uses ChainOfThought, which asks the model to show its reasoning before producing output. The forward method orchestrates the generation and refinement process. If the initial query fails, it passes the error message to the refiner, which generates a corrected query.
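Stripped of DSPy, forward is a try-once, repair-once pattern. Here is that control flow as plain Python, with lambdas standing in for the two ChainOfThought calls (the stubs are hypothetical; in the real module both calls go through the LLM):

```python
import sqlite3

def run_with_refinement(sql, refine, conn):
    """Execute sql; on failure ask `refine` for a corrected query and retry once."""
    try:
        return sql, conn.execute(sql).fetchall(), None
    except sqlite3.Error as e:
        fixed = refine(sql, str(e))          # stand-in for the refiner module
        try:
            return fixed, conn.execute(fixed).fetchall(), None
        except sqlite3.Error as e2:
            return fixed, None, str(e2)

# Demo with an in-memory table and a deliberately wrong column name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.execute("INSERT INTO t VALUES (7)")
sql, rows, err = run_with_refinement(
    "SELECT m FROM t",                        # fails: no column `m`
    lambda q, e: "SELECT n FROM t",           # hypothetical "refined" query
    conn,
)
print(sql, rows, err)  # SELECT n FROM t [(7,)] None
```

The three-field return value mirrors the dspy.Prediction(sql_query=..., results=..., error=...) objects the real module produces.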

Step 6: Creating few-shot examples for DSPy

DSPy’s strength lies in its ability to learn from examples. Let’s define some training examples that show the model how to translate questions into SQL:

def create_examples():
    """Training examples for few-shot learning."""
    return [
        dspy.Example(
            question="How many employees are in the Engineering department?",
            sql_query="SELECT COUNT(*) FROM employees WHERE department = 'Engineering'"
        ).with_inputs("question"),
        dspy.Example(
            question="What is the highest salary in Engineering?",
            sql_query="SELECT MAX(salary) FROM employees WHERE department = 'Engineering'"
        ).with_inputs("question"),
        dspy.Example(
            question="List all employees with salary above 90000",
            sql_query="SELECT name, salary FROM employees WHERE salary > 90000"
        ).with_inputs("question"),
        dspy.Example(
            question="Which department has the highest budget?",
            sql_query="SELECT name, budget FROM departments ORDER BY budget DESC LIMIT 1"
        ).with_inputs("question"),
        dspy.Example(
            question="Show all projects in progress",
            sql_query="SELECT name, budget, status FROM projects WHERE status = 'In Progress'"
        ).with_inputs("question"),
    ]

Each dspy.Example pairs a natural language question with its corresponding SQL query. The .with_inputs("question") method tells DSPy that “question” is the input field. DSPy will include these examples in the prompt to guide the model’s behavior through in-context learning.

Step 7: Configuring the module with few-shot examples

Now let’s tie everything together. We instantiate the SQL generator and populate it with our examples:

# Setup generator with few-shot examples
generator = SQLGenerator()
generator.generator.demos = create_examples()
print("✓ Generator configured with few-shot examples")

Here, we’re configuring DSPy to include these examples as demonstrations in every prompt. The DSPy framework ensures the model sees these patterns and applies them to new Text-to-SQL questions.

Step 8: Building the main application

Let’s create the complete main function that brings everything together:

def main():
    print("\n" + "=" * 60)
    print("Text-to-SQL with DSPy & Ollama")
    print("=" * 60 + "\n")
    # Initialize Ollama connection
    setup_ollama()
    # Create database with sample data
    create_sample_database()
    # Setup generator with few-shot examples
    generator = SQLGenerator()
    generator.generator.demos = create_examples()
    print("✓ Generator configured with few-shot examples\n")
    # Connect to database
    conn = sqlite3.connect("company.db")
    # Define test queries
    test_questions = [
        "How many employees work in Engineering?",
        "What is the average salary of all employees?",
        "Show me all projects with a budget greater than 100000",
        "Which employees have been hired since 2022?",
        "List all managers and their departments",
    ]
    print("=" * 60)
    print("TESTING TEXT-TO-SQL QUERIES")
    print("=" * 60 + "\n")
    # Run queries (call the module directly; DSPy discourages calling .forward())
    for i, question in enumerate(test_questions, 1):
        print(f"Query {i}: {question}")
        result = generator(question, conn)
        print(f"SQL: {result.sql_query}")
        if result.error:
            print(f"Error: {result.error}\n")
        else:
            print(f"Results: {result.results}\n")
    conn.close()
    print("Complete!")

if __name__ == "__main__":
    main()

This main function orchestrates all the steps: setting up Ollama, creating the database, configuring the generator, and running test queries.

Step 9: Running the application

Before running, start the Ollama server in a separate terminal. If you want to force CPU mode (for example, on a machine without a supported GPU), set OLLAMA_NUM_GPU=0 first; the `set` syntax shown here is for Windows, so use `export` on macOS/Linux:

set OLLAMA_NUM_GPU=0
ollama serve

In another terminal, run the application:

python main.py

You should see output similar to this (exact SQL and timings may vary):

============================================================
Text-to-SQL with DSPy & Ollama
============================================================
Ollama is running
Connected to Ollama (Mistral)
Database created
Generator configured with few-shot examples
============================================================
TESTING TEXT-TO-SQL QUERIES
============================================================
Query 1: How many employees work in Engineering?
SQL: SELECT COUNT(*) FROM employees WHERE department = 'Engineering';
Results: [(3,)]
Query 2: What is the average salary of all employees?
SQL: SELECT AVG(salary) as average_salary FROM employees;
Results: [(85600.0,)]
Query 3: Show me all projects with a budget greater than 100000
SQL: SELECT * FROM projects WHERE budget > 100000;
Results: [(1, 'API Development', 1, 150000, 'In Progress'), (2, 'Mobile App', 1, 200000, 'Planning')]
Query 4: Which employees have been hired since 2022?
SQL: SELECT id, name, department, salary, hire_date FROM employees WHERE hire_date > '2022-01-01';
Results: [(1, 'Alice Johnson', 'Engineering', 95000, '2022-01-15'), (3, 'Carol White', 'Engineering', 90000, '2023-02-10'), (4, 'David Brown', 'Marketing', 70000, '2022-09-05')]
Query 5: List all managers and their departments
SQL: SELECT managers.name AS ManagerName, departments.name AS DepartmentName
FROM departments AS managers
JOIN departments ON managers.id = departments.manager;
Results: []
Complete!

Your system can now convert natural language questions into SQL queries, execute them, and refine them when errors occur. Notice that Query 5 returns an empty result: the generated self-join is syntactically valid but semantically wrong, a reminder that few-shot demos alone don’t guarantee correct queries.

We’ve seen DSPy in action, but how does it actually work under the hood?

How does DSPy work?

At its core, DSPy separates two critical concerns: declaring what you want (your program logic) and optimizing how to get it (the prompts and model parameters). This means you can change models, improve prompts, or adjust strategies without rewriting the entire application.

DSPy’s architecture revolves around three interconnected layers:

  1. Declarative layer: The declarative layer is where we specify what the program should accomplish without dictating how. This layer includes:
  • Signatures that define the input-output contract of the tasks. For example, GenerateSQL specifies: “Given a question, generate a SQL query.” Signatures are purely declarative as they describe structure without prescribing prompts.

  • Modules, which are composable building blocks that wrap LLM tasks. A module uses one or more signatures and orchestrates their execution. In our example, SQLGenerator is a module that uses GenerateSQL (for queries) and a refinement signature (to fix errors).

  2. Compiler layer: The compiler layer transforms your declarations into executable instructions that interact with language models. This layer includes:
  • Primitives, the actual functions that communicate with LLMs. Predict takes inputs and generates outputs using the configured model. ChainOfThought enhances Predict by asking the model to show step-by-step reasoning before producing the final answer. Other primitives like MultiChainComparison handle multi-step reasoning tasks. Primitives bridge your declarative modules to real LLM calls.

  3. Learning layer: The learning layer uses optimizers to systematically improve your program’s performance based on examples and feedback:
  • Optimizers automatically enhance your system by finding better prompts and model parameters. BootstrapFewShot learns which training examples work best and includes them in future prompts. MIPROv2 generates and evaluates multiple prompt variations, selecting the ones that improve accuracy. Instead of manually tweaking prompts, optimizers create a feedback loop where your program learns and improves automatically.
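The core idea behind BootstrapFewShot can be sketched without DSPy: run the program over candidate examples, keep only those where a metric confirms the output, and use the survivors as demos. Everything below (the toy metric, the one-trick program) is an illustrative stand-in, not DSPy’s actual implementation:

```python
def exact_match(expected, predicted):
    """Toy metric: normalized string equality."""
    return expected.strip().lower() == predicted.strip().lower()

def bootstrap_demos(program, examples, metric, max_demos=3):
    """Keep examples the current program already gets right (simplified loop)."""
    demos = []
    for question, gold_sql in examples:
        if metric(gold_sql, program(question)):
            demos.append((question, gold_sql))
        if len(demos) >= max_demos:
            break
    return demos

# Hypothetical program that only knows one kind of query.
program = lambda q: ("SELECT COUNT(*) FROM employees"
                     if "how many" in q.lower() else "")
examples = [
    ("How many employees are there?", "SELECT COUNT(*) FROM employees"),
    ("What is the highest salary?", "SELECT MAX(salary) FROM employees"),
]
print(bootstrap_demos(program, examples, exact_match))
```

Only the first example survives the metric, so only it would be used as a demo; the real optimizer does the same filtering, but with LLM calls and richer metrics.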

Visual representation of DSPy architecture explaining what DSPy is and how its declarative and learning layers interact.

The optimization loop is where DSPy becomes truly powerful. Unlike traditional prompt engineering, where you manually tweak words and hope for improvement, DSPy’s optimization is systematic and reproducible.

Next, let’s understand the core features that define DSPy’s strength and flexibility.

What are the features of DSPy?

What makes DSPy powerful is its ability to combine declarative definitions with automatic optimization so your system keeps getting better the more it’s used. Let’s explore its most defining features.

  • Declarative design: Define logic through signatures rather than hardcoded prompts. You describe what needs to happen, and DSPy figures out how to make it work.

  • Learnable modules: Each module can automatically optimize its internal prompts based on feedback, reducing manual tuning and improving accuracy over time.

  • Reproducibility: Every optimization step, dataset, and configuration is recorded, allowing you to reproduce experiments and trace system performance with full transparency.

  • Compatibility: Works seamlessly with multiple model providers such as OpenAI, Anthropic, and Mistral, so you can switch LLMs without changing your core code.

  • Compilers and optimizers: DSPy includes intelligent tools like BootstrapFewShot and MIPROv2 that refine system behavior automatically using feedback loops.

  • Self-improvement loop: The framework continuously learns from its results and feedback, leading to gradual, autonomous improvement in performance.

  • Composability: Multiple DSPy modules can be chained together like summarization → reasoning → SQL generation to build complex reasoning systems in a clean, modular way.
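The composability point can be made concrete with ordinary function composition; each stage below is a hypothetical stand-in for a DSPy module:

```python
from functools import reduce

def pipeline(*stages):
    """Compose stages left to right, like chaining DSPy modules."""
    return lambda x: reduce(lambda acc, stage: stage(acc), stages, x)

# Hypothetical stages standing in for DSPy modules.
summarize = lambda text: text.split(".")[0]          # keep first sentence
to_question = lambda s: f"How many rows match: {s}?"  # rephrase as a question
ask = pipeline(summarize, to_question)
print(ask("Engineering grew fast. Sales was flat."))
```

In real DSPy code each stage would be a Module whose prompts can be optimized independently, while the pipeline shape stays the same.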

With DSPy, we spend less time fine-tuning prompts and more time solving real problems. It’s a step toward making AI development systematic and scalable.

DSPy vs other frameworks

When choosing between DSPy and other LLM frameworks, understanding their core differences helps you select the right tool for your project.

DSPy vs LangChain: LangChain excels at chaining LLM tools and managing complex workflows with multiple components. DSPy focuses on optimizing those workflows through automatic learning and prompt improvement. Think of it this way: LangChain connects the pieces, while DSPy makes each piece work better. If you need to orchestrate multiple LLM calls and tools, LangChain is your framework. If you need those calls to learn and improve from feedback automatically, DSPy is the solution. Many production applications use both frameworks together—LangChain for orchestration and DSPy for optimization.

DSPy vs LlamaIndex: LlamaIndex specializes in data indexing and retrieval for RAG (Retrieval-Augmented Generation) applications, managing document stores efficiently. DSPy handles prompt optimization for any LLM task, including RAG pipelines. While LlamaIndex excels at organizing and retrieving information from your documents, DSPy excels at making your prompts work consistently across different models. For RAG applications, you might use LlamaIndex for document management and DSPy for optimizing the generation prompts.

DSPy vs Traditional Prompt Engineering: Traditional prompt engineering requires manually crafting and testing prompts for each use case. When you switch LLM models or update requirements, you typically start from scratch. DSPy treats prompts as learnable parameters that automatically adapt to new models or requirements. This programmatic DSPy approach makes it ideal for production applications where reliability, reproducibility, and maintainability matter more than quick one-off prototypes.

DSPy shines when you need consistent results across different models, plan to switch LLMs in the future, or want your system to improve from user feedback without constant manual prompt adjustments.

Conclusion

DSPy is an open-source framework from Stanford that simplifies how we build and optimize LLM applications. Instead of relying on manual prompt engineering, DSPy lets you define logic through modular “Signatures” and “Modules,” then automatically refines them using learning algorithms and optimizers. Its declarative design, adaptability, and self-improving feedback loop make it a game-changer for creating scalable, reliable AI systems.

If you’re interested in exploring how LLMs and prompt engineering work under the hood, check out Codecademy’s Learn Prompt Engineering course; it’s a great way to connect these ideas to hands-on learning.

Frequently asked questions

1. What does DSPy stand for?

DSPy stands for Declarative Self-improving Python, a framework developed by Stanford to simplify building and optimizing LLM-based systems.

2. Is DSPy used in production?

Yes. While still relatively new, DSPy is already being adopted in research and early-stage production environments to automate prompt tuning and improve system reliability.

3. What programming language is DSPy?

DSPy is built in Python, making it easy to integrate with existing machine learning and NLP pipelines.

4. What is a DSPy module?

A DSPy module defines what an LLM should do, i.e., its input, output, and logic, without hardcoding the exact prompt. Examples include Predict, ChainOfThought, and MultiChainComparison.

5. What is the difference between LangChain and DSPy?

LangChain focuses on chaining LLM tools and managing workflows, while DSPy focuses on optimizing those workflows through learning. In short, LangChain connects and DSPy improves.

6. What is the use of DSPy?

DSPy helps developers build smarter, adaptable AI systems that learn from feedback, optimize prompts automatically, and scale across different language models.

Codecademy Team

The Codecademy Team, composed of experienced educators and tech experts, is dedicated to making tech skills accessible to all. We empower learners worldwide with expert-reviewed content that develops and enhances the technical skills needed to advance and succeed in their careers.

Meet the full team

Learn more on Codecademy

  • Learn to analyze data with SQL and prepare for technical interviews.
    • Includes 9 Courses
    • With Certificate
    • Beginner Friendly.
      17 hours
  • In this SQL course, you'll learn how to manage large datasets and analyze real data using the standard data management language.
    • Beginner Friendly.
      5 hours
  • Learn about effective prompting techniques to craft high-quality prompts, maximizing your use of generative AI.
    • With Certificate
    • Beginner Friendly.
      1 hour