Building a Language Model Application with LangChain: A Beginner's Guide

Codecademy Team
Learn about Large Language Models (LLMs) and how to build applications powered by Generative AI using LangChain.

Introduction to LLMs

With Generative AI sweeping the world, learning to build and develop applications with large language models (LLMs) is crucial. LLMs are machine learning models that can understand and generate human-like language text.

Language models have become central to many applications due to their versatility. They can summarize lengthy documents, translate languages, provide detailed answers to complex questions, and even assist in coding tasks. By leveraging these capabilities, developers can create more intuitive and responsive applications that enhance user experiences.

Now that we understand the significance of language models in modern applications, let’s start learning what LangChain is and how it can simplify building with LLMs.

What is LangChain?

Given the many applications and the increasing reliance on AI technologies, understanding how to build and implement LLM-based applications is becoming an essential skill for developers. A great tool for this is LangChain. LangChain simplifies the process of building applications with LLMs. It provides an easy-to-use interface for integrating various data sources, APIs, and pre-trained language models, allowing developers to create sophisticated AI-driven applications with minimal effort.

There are many advantages to using LangChain. Here are two:

  • Provides pre-built modules and templates, and reduces the complexity of implementing advanced features. This means we can focus more on our projects’ creative and strategic aspects rather than getting bogged down by technical details.
  • With LangChain, we can quickly prototype and iterate on our ideas, reducing the time for development.

In this tutorial, we will practice using LangChain to build an application that summarizes PDFs.

Build a PDF Summarizer with LangChain

To understand how LangChain is used in developing LLM-based applications, let’s build a Gen-AI-powered PDF summary application. First, we begin by setting up our environment.

Set up the Development Environment

To build this application, make sure Python is installed on your system (see Installing Python 3 if it isn’t). We will also need the Streamlit, LangChain, and pypdf packages, which can be installed by executing the following command:

pip install streamlit langchain pypdf
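To confirm the packages installed correctly before moving on, a quick standard-library check can be run (a minimal sketch; the package names are the ones installed above):

```python
import importlib.util

def missing_packages(packages):
    """Return the packages from the list that cannot be imported."""
    return [pkg for pkg in packages if importlib.util.find_spec(pkg) is None]

# Check the three packages installed above; prints [] if everything is available
print(missing_packages(["streamlit", "langchain", "pypdf"]))
```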

Now we have everything ready to start building.

Build a Basic Frontend

The first step in building this application is to build the front end. For this, we will use Streamlit to quickly create a simple interface for our PDF summarizer.

import streamlit as st
import os
st.set_page_config(page_title="PDF Summarizer")
st.title("PDF Summarizer")
st.write("Summarize your PDF files using the power of LLMs")
st.divider()
pdf = st.file_uploader("Upload your PDF", type="pdf")
submit = st.button("Generate Summary")

This gives the following output: [Image: Frontend for PDF Summarizer built with Streamlit]

Now that we have the interface for our application, let’s add functionality to it with LangChain and OpenAI’s GPT-3.5 model.

Backend using LangChain

We will begin by importing all the LangChain modules and functions necessary for our project. These include:

  • CharacterTextSplitter, HuggingFaceEmbeddings, and FAISS, which are used to split the text of the uploaded PDF into chunks and create its knowledge base using embeddings.
  • load_qa_chain, ChatOpenAI, get_openai_callback, and PdfReader, which are used to integrate the GPT-3.5 model with the knowledge base generated from the uploaded file and return its summary.

To import them, the following code is used:

from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains.question_answering import load_qa_chain
from langchain_community.chat_models import ChatOpenAI
from langchain.callbacks import get_openai_callback
from pypdf import PdfReader

After importing the required modules, we move on to building the backend functionalities. First, we build the process_text() function that splits the input text into smaller chunks using the CharacterTextSplitter(), ensuring each chunk is around 1000 characters. It then converts these chunks into embeddings using a pre-trained model from HuggingFace (sentence-transformers/all-MiniLM-L6-v2). Finally, it builds a searchable FAISS knowledge base from these embeddings and returns it.

def process_text(text):
    # Split the input text into overlapping chunks of ~1000 characters
    text_splitter = CharacterTextSplitter(
        separator="\n",
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len
    )
    chunks = text_splitter.split_text(text)
    # Convert the chunks into embeddings with a pre-trained HuggingFace model
    embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
    # Build a searchable FAISS knowledge base from the embeddings
    knowledgeBase = FAISS.from_texts(chunks, embeddings)
    return knowledgeBase
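To make the chunk_size and chunk_overlap parameters concrete, here is a simplified sketch of overlap-based splitting in plain Python. This is illustrative only; LangChain’s CharacterTextSplitter additionally splits on the separator and merges the resulting pieces:

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Slide a window of chunk_size over the text,
    stepping forward by chunk_size - chunk_overlap each time."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_with_overlap("a" * 2500, chunk_size=1000, chunk_overlap=200)
print(len(chunks))     # 4 chunks, starting at offsets 0, 800, 1600, 2400
print(len(chunks[0]))  # 1000
```

The overlap means the last 200 characters of one chunk reappear at the start of the next, so a sentence cut at a chunk boundary is still seen whole in at least one chunk.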

Next, we build the summarizer() function. This function takes the content of the uploaded PDF file and extracts its text. It processes this text into chunks and embeddings using the process_text() function to create a knowledge base. The function then formulates a query to summarize the PDF content, searches for relevant text chunks using similarity search, and uses an OpenAI language model (gpt-3.5-turbo-16k) to generate a concise summary, which is then returned.

def summarizer(pdf):
    response = ""
    pdf_reader = PdfReader(pdf)
    text = ""
    # Extract text from each page of the PDF
    for page in pdf_reader.pages:
        text += page.extract_text() or ""
    # Build the knowledge base from the extracted text
    knowledgeBase = process_text(text)
    query = "Summarize the content of the uploaded PDF file in approximately 5-8 sentences."
    # Find the chunks most relevant to the query
    docs = knowledgeBase.similarity_search(query)
    OpenAIModel = "gpt-3.5-turbo-16k"
    llm = ChatOpenAI(model=OpenAIModel, temperature=0.1)
    # Load a question-answering chain that "stuffs" the retrieved chunks into the prompt
    chain = load_qa_chain(llm, chain_type='stuff')
    # Run the chain through the GPT-3.5 model to get the summary
    with get_openai_callback() as cost:
        response = chain.run(input_documents=docs, question=query)
        print(cost)
    return response
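Under the hood, similarity_search ranks chunks by how close their embedding vectors are to the query’s embedding. A minimal cosine-similarity sketch shows the idea (illustrative only; FAISS uses optimized index structures, and the 3-dimensional vectors below are made up — real all-MiniLM-L6-v2 embeddings have 384 dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings for a query and two chunks
query_vec = [0.9, 0.1, 0.0]
chunk_vecs = {
    "chunk about the query topic": [0.8, 0.2, 0.1],
    "unrelated chunk": [0.0, 0.1, 0.9],
}

# Pick the chunk whose embedding is closest to the query's
best = max(chunk_vecs, key=lambda k: cosine_similarity(query_vec, chunk_vecs[k]))
print(best)  # chunk about the query topic
```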

Finally, we will need to set our OpenAI API key to use the GPT-3.5 model. When the Generate Summary button is clicked, the summarizer() function is called and the summary is displayed using the code below:

os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_KEY"
# Call the `summarizer()` function when the `Generate Summary` button is clicked
if submit and pdf is not None:
    response = summarizer(pdf)
    # Display the returned summary
    st.subheader("PDF Summary")
    st.write(response)

Run and Deploy the Application

To run this Streamlit-based application, we use the following command:

streamlit run app.py

Note: While running a Streamlit application with document input, you may run into the following error:

AxiosError: Request failed with status code 403

In this case, running the application with the below command solves the issue:

streamlit run app.py --server.enableXsrfProtection false
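Note that disabling XSRF protection removes a security safeguard, so this workaround should only be used for local development. As an alternative to the command-line flag, the same setting can be placed in Streamlit’s configuration file at .streamlit/config.toml (assuming the default config location):

```toml
# .streamlit/config.toml
[server]
enableXsrfProtection = false
```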

Furthermore, the application can be deployed for free using the Streamlit Community Cloud. The Deploy option is available on the localhost page, in the top right corner. [Image: Deploy option highlighted in the top right corner of the localhost page]

On clicking Deploy, a dialogue box opens up. Click Deploy Now under Streamlit Community Cloud and select a Streamlit domain to host the application. [Image: Free Streamlit Community Cloud deployment dialog]

By following these steps, we have used LangChain to build an LLM-based PDF summarizer application and successfully run and deployed it.

Conclusion

We learned the basics of LangChain through the development of a PDF summarizer. We did this by following these steps:

  1. Set up our development environment, ensuring we had all the necessary tools and dependencies.
  2. Implemented basic functionality, creating a simple yet powerful frontend with Streamlit.
  3. Developed a backend powered by LangChain to handle PDF text extraction and summarization.
  4. Discussed deploying the application using Streamlit.

By following these steps, you’ve seen how LangChain can streamline the development of applications that harness the capabilities of language models. The PDF summarizer is just the beginning. LangChain’s flexibility and power allow for the creation of various AI-driven applications, from chatbots and virtual assistants to content generation tools and beyond.

If you find this article helpful, do check out Codecademy’s Collection of AI Articles for similar articles.