Build AI agents with LangChain v1: step-by-step tutorial

LangChain is an open-source framework for building applications with large language models. It provides tools to connect LLMs with external data sources, APIs, and custom functions, enabling developers to create agents that can reason and take actions.

LangChain v1 simplifies agent development by replacing multiple agent types with a single create_agent function. This tutorial demonstrates how to build a research assistant that searches the web, validates sources, and generates structured reports using Google Gemini and LangChain v1.

What is LangChain v1?

LangChain v1 provides the create_agent function as the standard way to build agents. This function creates agents that follow the ReAct (Reasoning + Acting) pattern: the model reasons, calls tools when needed, and continues until completing the task.
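The loop can be sketched in plain Python to show the shape of what create_agent runs internally. Everything here (stub_model, the tools dict) is an illustrative stand-in, not a LangChain API:

```python
def stub_model(messages):
    """Pretend model: asks for a tool once, then answers from the observation."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": ("search", "LangChain v1 release")}
    return {"answer": "LangChain v1 centers on create_agent."}

tools = {"search": lambda query: f"results for: {query}"}

messages = [{"role": "user", "content": "What is LangChain v1?"}]
while True:
    step = stub_model(messages)
    if "answer" in step:           # the model decided it is done
        break
    name, arg = step["tool_call"]  # the model requested a tool
    observation = tools[name](arg)
    messages.append({"role": "tool", "content": observation})

print(step["answer"])
```

The real agent adds provider calls, message formatting, and error handling, but the reason/act/observe cycle is the same.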

Three major improvements define v1:

  1. Content blocks standardize responses across providers. Whether using OpenAI, Anthropic, or Google Gemini, messages return in identical formats. This enables switching providers without rewriting parsing logic.
  2. Middleware adds customization hooks at specific points in the agent loop. Insert logic before model calls, modify requests, or process responses after execution.
  3. Structured outputs validate responses automatically. Pass a Pydantic model to create_agent, and the framework returns typed data with automatic retries on validation failures.
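The validation behind point 3 can be previewed with plain Pydantic. The schema below mirrors the one defined later in this tutorial; the retry itself is handled by the framework, so this sketch only shows the validation signal it reacts to:

```python
from typing import List
from pydantic import BaseModel, Field, ValidationError

class ResearchReport(BaseModel):
    """Illustrative schema; Step 3 defines the real one."""
    summary: str = Field(description="Brief summary of research findings")
    key_findings: List[str] = Field(description="3-5 main findings")
    sources: List[str] = Field(description="URLs of sources cited")

# A complete response validates cleanly into typed data...
report = ResearchReport(
    summary="Example summary.",
    key_findings=["Finding A", "Finding B", "Finding C"],
    sources=["https://example.com/a", "https://example.com/b"],
)

# ...while an incomplete one raises ValidationError, which is the
# signal the framework uses to re-prompt the model.
try:
    ResearchReport(summary="missing the other fields")
    missing = 0
except ValidationError as exc:
    missing = len(exc.errors())

print(missing)  # two required fields were absent
```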

Legacy chains and older agent types moved to langchain-classic. The core v1 package contains only essential building blocks for agent development.

Building agents with v1 requires installing the framework and configuring access to a language model.

Setting up your LangChain v1 development environment

Building AI agents with LangChain v1 requires four packages: LangChain for agent orchestration, the Google Gemini integration for the language model, Streamlit for the web interface, and python-dotenv to manage environment variables.

Start by installing these dependencies with the following command:

pip install langchain langchain-google-genai streamlit python-dotenv

Now that the packages are installed, the next step is getting access to Google Gemini. Navigate to Google AI Studio and click “Create API Key” to generate a new key. The free tier includes enough quota for this tutorial.

With the API key generated, create a new file named .env in the project directory and add the key:

GOOGLE_API_KEY=your_gemini_api_key_here
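A quick stdlib check (run after load_dotenv() has executed) confirms the key is actually visible to the process. This snippet is a convenience for debugging, not part of the app:

```python
import os

# If load_dotenv() has run, the .env entry is now an environment variable.
# Failing fast here beats a confusing auth error mid-request later.
if os.getenv("GOOGLE_API_KEY"):
    print("GOOGLE_API_KEY found")
else:
    print("GOOGLE_API_KEY is not set - check your .env file")
```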

Verify the installation works by importing the key components:

from langchain.agents import create_agent
from langchain_google_genai import ChatGoogleGenerativeAI

If the imports succeed without errors, the environment is configured correctly. With the setup complete, the next step is building the research assistant that demonstrates LangChain v1’s core features.

How to build a content research assistant with LangChain v1

This section builds a research assistant that searches the web, validates response quality, and returns structured reports. The implementation demonstrates structured outputs, middleware, custom tools, and Gemini grounding working together.

The application will live in a single file called app.py, built progressively across the following steps. Each component adds specific functionality to create the complete research assistant.

Step 1: Enable Google Search grounding in Gemini

Gemini models include native grounding through Google Search. Unlike custom tools that require explicit function definitions, grounding activates automatically when the model needs current information.

Start by creating app.py and adding the necessary imports:

import streamlit as st
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.agents import create_agent
from langchain.agents.middleware import AgentMiddleware
from langchain_core.tools import tool
from pydantic import BaseModel, Field
from typing import List
import os
from dotenv import load_dotenv
load_dotenv()

These imports provide access to Streamlit for the interface, LangChain components for the agent, and environment variables for the API key.

With the imports ready, configure the model to enable Google Search grounding:

# Initialize model with Google Search grounding
model = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash-exp",
    google_api_key=os.getenv("GOOGLE_API_KEY"),
    temperature=0.7
)
model_with_search = model.bind_tools([{"google_search": {}}])

The bind_tools method enables grounding. When the model receives a query requiring current information, it triggers Google Search internally and incorporates results into the response without explicit tool calls appearing in the conversation.

This model configuration will go inside the agent initialization section later. For now, understanding how grounding works sets the foundation for building custom capabilities on top of it.

Step 2: Create custom tools with the @tool decorator

Custom tools extend the agent’s capabilities beyond grounding. For this research assistant, one tool formats source URLs into readable lists.

Add the tool definition after the imports:

# Custom tool
@tool
def format_sources(urls: List[str]) -> str:
    """Format source URLs into a readable list.

    Args:
        urls: List of source URLs
    """
    if not urls:
        return "No sources provided."
    formatted = "\n\n**Sources:**\n"
    for i, url in enumerate(urls, 1):
        formatted += f"{i}. {url}\n"
    return formatted

The @tool decorator converts the function into a LangChain-compatible tool. The docstring provides the model with information about what the tool does and when to use it. Type hints ensure the model passes data in the correct format.

This tool will structure source lists consistently, making research reports easier to read. With custom tools defined, the next step configures the output structure.

Step 3: Define structured outputs with a Pydantic model

Structured outputs guarantee responses follow a specific schema, eliminating the need for parsing code. Add the Pydantic model after the tool definition:

# Structured output model
class ResearchReport(BaseModel):
    """Structured research report."""
    summary: str = Field(description="Brief summary of research findings")
    key_findings: List[str] = Field(description="3-5 main findings as bullet points")
    sources: List[str] = Field(description="URLs of sources cited")

This model defines three required fields with type annotations and descriptions. The descriptions help the model understand what content belongs in each field.

When the agent completes a task with this schema, LangChain validates the response automatically. Invalid outputs trigger automatic retries until the model produces valid data.

Having defined the output structure, the next step ensures responses meet quality standards before reaching users.

Step 4: Add middleware for automatic quality validation

Middleware intercepts the agent’s execution to enforce quality standards. Add this validation class after the Pydantic model:

# Middleware for validation
class ValidationMiddleware(AgentMiddleware):
    """Ensure minimum quality standards."""

    def after_model(self, state, runtime):
        """Check if response meets quality standards."""
        if "structured_response" in state:
            report = state["structured_response"]
            if len(report.key_findings) < 3:
                return {
                    "messages": [{
                        "role": "user",
                        "content": "Please provide at least 3 key findings."
                    }]
                }
            if len(report.sources) < 2:
                return {
                    "messages": [{
                        "role": "user",
                        "content": "Please cite at least 2 authoritative sources."
                    }]
                }
        return None

The after_model method runs after each model response. When validation fails, returning a dictionary with a new message prompts the model to try again with the feedback. Returning None allows execution to continue normally.

This middleware enforces quality without modifying the agent’s core logic. The agent automatically retries when responses don’t meet standards, creating a feedback loop that improves output quality.
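The feedback loop can be illustrated with a self-contained sketch. Here stub_model and after_model are plain functions standing in for the model and the middleware hook (the real hook receives state and runtime), so no API calls are involved:

```python
def stub_model(messages):
    """Pretend model: returns one more finding each time it gets feedback."""
    attempts = sum(1 for m in messages if m["role"] == "user")
    return {"key_findings": [f"finding {i}" for i in range(attempts)]}

def after_model(response):
    """Middleware-style check: return a feedback message, or None to accept."""
    if len(response["key_findings"]) < 3:
        return {"role": "user", "content": "Please provide at least 3 key findings."}
    return None

messages = [{"role": "user", "content": "Research quantum computing."}]
while True:
    response = stub_model(messages)
    feedback = after_model(response)
    if feedback is None:  # validation passed; execution continues normally
        break
    messages.append(feedback)  # retry with the validation feedback

print(len(response["key_findings"]))  # the loop exits once the check passes
```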

With all supporting components in place, assembling them into a working agent comes next.

Step 5: Configure the agent with system prompts

The agent combines the grounded model, custom tools, middleware, and structured outputs. Before wiring everything together, define the system prompt that sets research expectations:

# System prompt
system_prompt = """You are a research assistant. Research the topic thoroughly and provide structured findings.
Use Google Search to find current information from authoritative sources.
Your response must include:
- A summary (2-3 sentences)
- Key findings (3-5 bullet points with specific facts)
- Source URLs (at least 2 authoritative sources)"""

The system prompt establishes clear requirements for response format and content quality. Explicit expectations help the model produce consistent outputs.

Now configure the Streamlit page and initialize the agent. This code goes after all the component definitions:

# Page config
st.set_page_config(page_title="LangChain v1 Research Agent", page_icon="🔍", layout="wide")
st.title("LangChain v1 Research Assistant")
st.markdown("Demonstrating **middleware**, **structured outputs**, and **custom tools**")

# Initialize session state
if "messages" not in st.session_state:
    st.session_state.messages = []
if "agent" not in st.session_state:
    # Initialize model with grounding
    model = ChatGoogleGenerativeAI(
        model="gemini-2.0-flash-exp",
        google_api_key=os.getenv("GOOGLE_API_KEY"),
        temperature=0.7
    )
    model_with_search = model.bind_tools([{"google_search": {}}])
    # Create agent with all v1 features
    st.session_state.agent = create_agent(
        model=model_with_search,
        tools=[format_sources],
        system_prompt=system_prompt,
        middleware=[ValidationMiddleware()],
        response_format=ResearchReport
    )

This initialization happens once per Streamlit session. The agent combines grounding for web search, custom tools for formatting, middleware for quality validation, and structured outputs for consistent responses.

The response_format parameter connects the Pydantic model to the agent. When research completes, the validated response appears under the structured_response key in the result.

With the agent configured, the final step builds the interface that users interact with.

Step 6: Build the Streamlit chat interface

The chat interface allows users to enter research queries and view formatted results. Add the message display and input handling:

# Display chat history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Chat input
if prompt := st.chat_input("What would you like me to research?"):
    # Add user message
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    # Get agent response
    with st.chat_message("assistant"):
        message_placeholder = st.empty()
        with st.spinner("Researching with LangChain v1..."):
            result = st.session_state.agent.invoke({
                "messages": [{"role": "user", "content": prompt}]
            })

        # Format structured response
        if "structured_response" in result:
            report = result["structured_response"]
            response = f"**Summary:**\n{report.summary}\n\n"
            response += "**Key Findings:**\n"
            for i, finding in enumerate(report.key_findings, 1):
                response += f"{i}. {finding}\n"
            response += "\n**Sources:**\n"
            for i, source in enumerate(report.sources, 1):
                response += f"{i}. {source}\n"
        else:
            response = result["messages"][-1].content

        message_placeholder.markdown(response)
        st.session_state.messages.append({"role": "assistant", "content": response})

This code displays the conversation history, captures user input, shows a loading spinner during research, and formats the structured response into readable sections.

Complete the interface by adding a sidebar with options and feature documentation:

# Sidebar
with st.sidebar:
    st.header("Options")
    if st.button("Clear Chat History"):
        st.session_state.messages = []
        st.rerun()
    if st.session_state.messages:
        chat_md = "# Research Chat History\n\n"
        for msg in st.session_state.messages:
            role = "**User**" if msg["role"] == "user" else "**Assistant**"
            chat_md += f"{role}:\n{msg['content']}\n\n---\n\n"
        st.download_button(
            label="Download Chat",
            data=chat_md,
            file_name="research_chat.md",
            mime="text/markdown"
        )
    st.markdown("---")
    st.markdown("### LangChain v1 Features")
    st.markdown("""
**Structured Outputs**
- Pydantic model validation
- Automatic retry on errors

**Middleware**
- Quality validation (min 3 findings, 2 sources)
- Automatic feedback loop

**Custom Tools**
- Source formatting

**Gemini Grounding**
- Real-time web search
""")

The sidebar provides a clear button to reset conversations, a download option for saving research sessions, and a feature reference showing which LangChain v1 capabilities the application demonstrates.

The implementation is complete. The next step tests the research assistant to verify all features work correctly.

Step 7: Test the research assistant

With app.py complete, launch the application from the terminal:

streamlit run app.py

The browser opens automatically at http://localhost:8501. The interface displays the title, feature description, and an empty chat waiting for the first query.

[Screenshot: the research assistant interface, showing the main chat area with input box and a sidebar listing structured outputs, middleware, custom tools, and Gemini grounding]

The interface includes three main sections: the chat area in the center, the input box at the bottom, and the sidebar on the left showing which LangChain v1 features are active.

Enter a research query to test the agent. Try “What are the latest quantum computing breakthroughs?” or “Compare React and Vue for enterprise applications” to see the full workflow in action.

[Animation: a quantum computing research query is entered; the loading spinner shows during the Google Search phase, then the structured response appears with a summary section, numbered key findings, and source URLs]

The agent processes queries through these stages:

  • Research phase: The loading spinner appears with “Researching with LangChain v1…” while Gemini searches Google for current information
  • Validation phase: Middleware checks that the response has at least 3 key findings and 2 sources
  • Display phase: The structured response appears with clearly formatted sections

Each response follows the same structure:

  • Summary: 2-3 sentences synthesizing the main findings
  • Key Findings: Numbered list of 3-5 specific facts with details
  • Sources: Numbered list of authoritative URLs where information was found
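That structure can be reproduced outside Streamlit with the same Pydantic schema. In this sketch, render_report is a helper invented for the example; it mirrors the formatting logic from Step 6:

```python
from typing import List
from pydantic import BaseModel

class ResearchReport(BaseModel):
    summary: str
    key_findings: List[str]
    sources: List[str]

def render_report(report: ResearchReport) -> str:
    """Mirror the Streamlit display logic as a plain function."""
    lines = [f"**Summary:**\n{report.summary}\n", "**Key Findings:**"]
    lines += [f"{i}. {f}" for i, f in enumerate(report.key_findings, 1)]
    lines.append("\n**Sources:**")
    lines += [f"{i}. {s}" for i, s in enumerate(report.sources, 1)]
    return "\n".join(lines)

sample = ResearchReport(
    summary="Three findings summarized.",
    key_findings=["Fact one", "Fact two", "Fact three"],
    sources=["https://example.com/a", "https://example.com/b"],
)
print(render_report(sample))
```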

The middleware enforcement becomes visible when responses initially lack sufficient detail. The agent automatically retries until meeting quality standards, though this happens behind the scenes without user intervention.

Conclusion

LangChain v1 simplifies agent development through unified patterns and production-ready features:

  • Structured outputs eliminate parsing complexity through Pydantic validation
  • Middleware enables quality control, logging, and approval workflows without modifying core agent code
  • Gemini grounding provides web search capabilities through a single configuration line
  • The create_agent function replaces legacy chains with a standardized approach

Building with large language models requires understanding both the capabilities and constraints of these systems. Codecademy’s free Introduction to Large Language Models course covers LLM fundamentals, prompt engineering, and practical applications. This foundational knowledge complements the agent-building skills covered in this tutorial.

Frequently asked questions

1. What is LangChain v1?

LangChain v1 consolidates agent patterns around the create_agent function, built on LangGraph’s runtime. It provides standardized content blocks, middleware hooks, and integrated structured outputs while moving legacy functionality to langchain-classic.

2. When was LangChain v1 released?

LangChain v1 reached general availability in October 2025 after an alpha period beginning in September 2025.

3. What is LangChain used for?

LangChain enables building LLM-powered applications including conversational agents, retrieval-augmented generation systems, and autonomous workflows. The framework standardizes interactions across model providers.

4. Is LangChain free to use?

LangChain is open source under the MIT license with no usage fees. Model providers (OpenAI, Anthropic, Google) require separate API keys with their own pricing.

5. Is LangChain chain deprecated?

Legacy chain abstractions moved to langchain-classic but remain supported. The v1 architecture favors create_agent for more flexible control through middleware. Migration is optional.

Codecademy Team

The Codecademy Team, composed of experienced educators and tech experts, is dedicated to making tech skills accessible to all. We empower learners worldwide with expert-reviewed content that develops and enhances the technical skills needed to advance and succeed in their careers.

Learn more on Codecademy

  • Learn to build autonomous AI agents that use tools, make decisions, and accomplish complex tasks using LangChain and agentic design patterns.
    • Includes 6 Courses
    • With Certificate
    • Intermediate.
      6 hours
  • AI Engineers build complex systems using foundation models, LLMs, and AI agents. You will learn how to design, build, and deploy AI systems.
    • Includes 16 Courses
    • With Certificate
    • Intermediate.
      20 hours
  • Learn how to plan and conduct user research, analyze user data, and share research insights by creating a research report.
    • Beginner Friendly.
      1 hour