Building a Language Model Application with LangChain: A Beginner's Guide
Introduction to LLMs
With Generative AI sweeping the world, learning to build and develop applications with large language models (LLMs) is crucial. LLMs are machine learning models that can understand and generate human-like language text.
Language models have become central to many applications due to their versatility. They can summarize lengthy documents, translate languages, provide detailed answers to complex questions, and even assist in coding tasks. By leveraging these capabilities, developers can create more intuitive and responsive applications that enhance user experiences.
Now that we understand the significance of language models in modern applications, let’s start learning what LangChain is and how it can simplify building with LLMs.
What is LangChain?
Given the many applications and the increasing reliance on AI technologies, understanding how to build and implement LLM-based applications is becoming an essential skill for developers. A great tool for this is LangChain. LangChain simplifies the process of building applications with LLMs. It provides an easy-to-use interface for integrating various data sources, APIs, and pre-trained language models, allowing developers to create sophisticated AI-driven applications with minimal effort.
There are many advantages to using LangChain. Here are two:
- Provides pre-built modules and templates, and reduces the complexity of implementing advanced features. This means we can focus more on their projects’ creative and strategic aspects rather than getting bogged down by technical details.
- With LangChain, we can quickly prototype and iterate on our ideas, reducing the time for development.
In this tutorial, we will practice using LangChain to build an application that summarizes PDFs.
Build a PDF Summarizer with LangChain
To understand how LangChain is used in developing LLM-based applications, let's build a Gen-AI-powered PDF summarizer. First, we begin by setting up our environment.
Set up the Development Environment
To build this application, make sure you have Python installed on your system; if it is not, see Installing Python 3. Along with Python, we will also need the Streamlit, LangChain, and pypdf packages, which we can install by executing the following command:

```shell
pip install streamlit langchain pypdf
```
Now we have everything ready to start building.
Build a basic Frontend
The first step in building this application is to build the frontend. For this, we will use Streamlit to quickly create a simple interface for our PDF Summarizer.
```python
import streamlit as st
import os

st.set_page_config(page_title="PDF Summarizer")
st.title("PDF Summarizer")
st.write("Summarize your PDF files using the power of LLMs")
st.divider()

pdf = st.file_uploader("Upload your PDF", type="pdf")
submit = st.button("Generate Summary")
```
Running this gives a simple page with a title, a file uploader, and a Generate Summary button.
Now that we have the interface for our application, let's add functionality to it with LangChain and OpenAI's GPT-3.5.
Backend using LangChain
We will begin by importing all the LangChain modules and functions necessary for our project. These include:

- CharacterTextSplitter, HuggingFaceEmbeddings, and FAISS, which are used to split the text of the uploaded PDF into chunks and build a knowledge base from their embeddings.
- load_qa_chain, ChatOpenAI, get_openai_callback, and PdfReader, which are used to connect the GPT-3.5 model to the knowledge base generated from the uploaded file and return its summary.
To import them, the following code is used:
```python
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains.question_answering import load_qa_chain
from langchain_community.chat_models import ChatOpenAI
from langchain.callbacks import get_openai_callback
from pypdf import PdfReader
```
After importing the required modules, we move on to building the backend functionalities. First, we build the process_text()
function that splits the input text into smaller chunks using the CharacterTextSplitter()
, ensuring each chunk is around 1000 characters. It then converts these chunks into embeddings using a pre-trained model from HuggingFace (sentence-transformers/all-MiniLM-L6-v2). Finally, it builds a searchable FAISS knowledge base from these embeddings and returns it.
```python
def process_text(text):
    # Split the text into ~1000-character chunks with 200 characters of overlap
    text_splitter = CharacterTextSplitter(
        separator="\n",
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len
    )
    chunks = text_splitter.split_text(text)

    # Convert the chunks into embeddings and build a searchable FAISS index
    embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
    knowledgeBase = FAISS.from_texts(chunks, embeddings)
    return knowledgeBase
```
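To build intuition for what the splitter does, here is a hand-rolled sketch of overlapping fixed-size chunking. This is a simplification for illustration only: the real CharacterTextSplitter also splits on the separator and merges pieces, so its chunk boundaries will differ.

```python
def naive_chunk(text, chunk_size=1000, chunk_overlap=200):
    """Split `text` into chunks of at most `chunk_size` characters,
    where consecutive chunks share `chunk_overlap` characters."""
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

chunks = naive_chunk("a" * 2500, chunk_size=1000, chunk_overlap=200)
print(len(chunks))     # 4
print(len(chunks[0]))  # 1000
```

The overlap matters: it keeps a sentence that straddles a chunk boundary fully visible in at least one chunk, which improves retrieval quality later.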
Next, we build the summarizer()
function. This function takes the content of the uploaded PDF file and extracts its text. It processes this text into chunks and embeddings using the process_text()
function to create a knowledge base. The function then formulates a query to summarize the PDF content, searches for relevant text chunks using similarity search, and uses an OpenAI language model (gpt-3.5-turbo-16k) to generate a concise summary, which is then returned.
```python
def summarizer(pdf):
    response = ""
    pdf_reader = PdfReader(pdf)
    text = ""
    # Extract text from each page of the PDF
    for page in pdf_reader.pages:
        text += page.extract_text() or ""

    knowledgeBase = process_text(text)
    query = "Summarize the content of the uploaded PDF file in approximately 5-8 sentences."

    if query:
        # Retrieve the chunks most relevant to the query
        docs = knowledgeBase.similarity_search(query)
        OpenAIModel = "gpt-3.5-turbo-16k"
        llm = ChatOpenAI(model=OpenAIModel, temperature=0.1)
        # Load the question-answering chain
        chain = load_qa_chain(llm, chain_type='stuff')
        # Run the chain through the GPT model to get the summary
        with get_openai_callback() as cost:
            response = chain.run(input_documents=docs, question=query)
            print(cost)
    return response
```
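Under the hood, similarity_search compares the query's embedding vector against each chunk's embedding and returns the nearest chunks. The exact metric depends on the index, but cosine similarity is a common choice and a good mental model. As a toy illustration, with made-up 3-dimensional vectors standing in for real sentence embeddings:

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": in the real app these come from the HuggingFace model
chunk_vectors = {
    "chunk about payments": [0.9, 0.1, 0.0],
    "chunk about shipping": [0.1, 0.9, 0.2],
    "chunk about returns":  [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]  # pretend embedding of the query

# Pick the chunk whose vector points in the most similar direction
best = max(chunk_vectors, key=lambda k: cosine_similarity(query_vector, chunk_vectors[k]))
print(best)  # chunk about payments
```

FAISS does the same kind of nearest-neighbor lookup, but over thousands of high-dimensional vectors with indexing tricks that make the search fast.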
Finally, we need to set our OpenAI API key to use the GPT-3.5 model. When the Generate Summary button is clicked, the summarizer() function is called and the summary is displayed using the code below:
```python
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_KEY"

# Call the summarizer() function when the Generate Summary button is clicked
if submit and pdf is not None:
    response = summarizer(pdf)

    # Display the returned summary
    st.subheader("PDF Summary")
    st.write(response)
```
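Hardcoding the key in source code is fine for a local experiment, but it is easy to leak when the file is shared or committed. A safer pattern is to read it from an environment variable (Streamlit's st.secrets is another option for deployed apps). A minimal sketch, where the helper name get_openai_key is our own invention:

```python
import os

def get_openai_key():
    # Prefer an environment variable over hardcoding the key in source code
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Export it before running the app, "
            "e.g. export OPENAI_API_KEY=sk-..."
        )
    return key
```

With this in place, the app fails fast with a clear message when the key is missing instead of erroring deep inside the LangChain call.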
Run and Deploy the Application
To run this Streamlit-based application, use the following command:

```shell
streamlit run app.py
```
Note: While running a Streamlit application with document input, you may run into the following error:

```
AxiosError: Request failed with status code 403
```

In this case, running the application with the command below solves the issue:

```shell
streamlit run app.py --server.enableXsrfProtection false
```
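Instead of passing the flag on every run, Streamlit can also read this setting from a `.streamlit/config.toml` file next to your app (assuming the standard config file location); the equivalent configuration would be:

```toml
# .streamlit/config.toml
[server]
enableXsrfProtection = false
```

Note that disabling XSRF protection is a workaround for local development; think twice before doing it in a publicly deployed app.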
Furthermore, the application can be deployed for free using the Streamlit Community Cloud. This option is available on the localhost page, in the top-right corner. Clicking Deploy opens a dialogue box; click Deploy Now under Streamlit Community Cloud and choose a domain to host the application.
By following these steps, we have used LangChain to build an LLM-based PDF Summarizer application and have successfully run and deployed it.
Conclusion
We learned the basics of LangChain by developing a PDF Summarizer. We did this by following these steps:
- Set up our development environment, ensuring we had all the necessary tools and dependencies.
- Implemented basic functionality, creating a simple yet powerful frontend with Streamlit.
- Developed a backend powered by LangChain to handle PDF text extraction and summarization.
- Discussed deploying the application using Streamlit.
By following these steps, you’ve seen how LangChain can streamline the development of applications that harness the capabilities of language models. The PDF summarizer is just the beginning. LangChain’s flexibility and power allow for the creation of various AI-driven applications, from chatbots and virtual assistants to content generation tools and beyond.
If you find this article helpful, do check out Codecademy’s Collection of AI Articles for similar articles.
Author
The Codecademy Team, composed of experienced educators and tech experts, is dedicated to making tech skills accessible to all. We empower learners worldwide with expert-reviewed content that develops and enhances the technical skills needed to advance and succeed in their careers.