Getting Started with Using Generative AI for Analytics

Learn how to use Generative AI with Jupyter Notebooks for Analytics.

Introduction

Generative AI for analytics is possible due to the vast number of AI tools available. Instead of creating custom algorithms or manually parsing through data, we can provide data sets to AI and let it do the hard work for us. In this article, we will combine Jupyter Notebook and OpenAI to analyze real-world data sets.

We will use Jupyter Notebook for creating and sharing computational documents. If you aren’t familiar with Jupyter Notebook, pandas, or Python, try out our course Getting Started with Python for Data Science to learn about these topics.

How to Set Up OpenAI in a Jupyter Notebook

Go through this guide if you don’t have Jupyter Notebook installed:

Open a terminal and type `pip3 install jupyter`. This command will install Jupyter Notebook on your local device. Once that completes, let’s make sure it is installed: type `jupyter notebook` in a terminal to begin a session. A web page listing the contents of your current directory should open.

Image of the Jupyter Notebook front web page

With Jupyter Notebook installed, let’s create a new Terminal session.

Image of the Jupyter Notebook terminal web page

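Once the terminal is open, install the OpenAI library using the `pip3 install openai` command. Note that the code in this article uses the `openai.ChatCompletion` interface, which was removed in version 1.0 of the library; to run it as written, install an earlier release with `pip3 install "openai<1"`. With the library installed, open a new console, and add a statement to import OpenAI.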

Image of the Jupyter Notebook console web page

import openai

We must create an API key to use the OpenAI API. You can create one on the OpenAI API website. Here is a demonstration of how to create an API key.

Note: The OpenAI API is not free. It requires a minimum of $5 (USD) to use. If you opt to follow along with this tutorial, $5 (USD) worth of credits is more than enough.

The secret key will only be accessible one time, at the time of creation. Make sure to save it for future use.

Image of OpenAI Home web page

Image of OpenAI API Key web page

Image of OpenAI Secret Key web page

Set your secret key so you can run OpenAI API function calls.

openai.api_key = '<API_KEY>'
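
Pasting the secret key directly into a notebook makes it easy to leak when the notebook is shared. As a minimal sketch of one alternative, assuming you have stored the key in an environment variable beforehand, the key can be read at runtime instead. The variable name `OPENAI_API_KEY` and the helper `load_api_key` are choices made for this illustration and do not appear in the original walkthrough.

```python
import os

def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in an environment variable.

    The variable name and this helper are illustrative assumptions,
    not part of the original walkthrough.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running the notebook.")
    return key

# Usage (uncomment once the variable is set and openai is imported):
# openai.api_key = load_api_key()
```

If the variable is missing, the helper fails early with a readable message rather than letting the first API call fail.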

Great work! OpenAI is prepared and ready to use in our Jupyter Notebook. We’ll use the OpenAI API to request code for analytics and to help us analyze real-world data.

Jupyter Notebook with OpenAI Example

Let’s use OpenAI with Jupyter Notebook to analyze some real-world data. We’ll begin by creating a helper function that allows us to prompt OpenAI. We’ll then prompt OpenAI to write code that helps us analyze some college football data.

# import packages
import openai
def get_response(prompt, system = '', temperature = 1.0):
    # openai.ChatCompletion is the pre-1.0 interface of the openai library
    response = openai.ChatCompletion.create(
        model = "gpt-3.5-turbo",
        # send the system prompt first, followed by the user prompt
        messages = [{"role": "system", "content": system},
                    {"role": "user", "content": prompt}],
        temperature = temperature
    )
    # return only the text of the first generated reply
    return response['choices'][0]['message']['content']
# set a system prompt
system_prompt = '''You are a helpful AI assistant for data analysis, providing commented code for human review.
You put any code inside a Python code block in Markdown.
You include a bulleted code explanation after the code.'''

The code above imports the OpenAI library to access the API calls; the API key we set earlier tells OpenAI who is making the request. We create a function get_response(prompt, system, temperature), where prompt is the request we want to make, system is a baseline message sent along with every prompt, and temperature controls randomness: it ranges from 0 to 2, and larger values produce more random output. The function calls the OpenAI library to generate a response from the system prompt and the user prompt. We provide the system prompt each time because, unlike ChatGPT, the OpenAI API does not remember earlier messages between requests.
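
Because each request is independent, any earlier exchange has to be sent again if the model should take it into account. The sketch below shows one way to assemble the `messages` list for that purpose. The helper `build_messages` and the `history` list of (prompt, reply) pairs are hypothetical names introduced for illustration; they are not part of the OpenAI library.

```python
def build_messages(system, history, prompt):
    """Assemble a chat `messages` list that replays earlier turns.

    `history` is a list of (user_prompt, assistant_reply) tuples kept by
    the caller; this helper is an illustrative assumption, not part of
    the OpenAI library.
    """
    messages = [{"role": "system", "content": system}]
    for user_prompt, assistant_reply in history:
        messages.append({"role": "user", "content": user_prompt})
        messages.append({"role": "assistant", "content": assistant_reply})
    # The new prompt goes last so the model answers it
    messages.append({"role": "user", "content": prompt})
    return messages

# Made-up example turn, for illustration only
history = [("Load data.csv with pandas.", "football = pd.read_csv('data.csv')")]
messages = build_messages("You are a data analysis assistant.", history, "Now plot wins per team.")
print(len(messages))
```

The resulting list could be passed as the `messages` argument in a function like get_response. Longer histories cost more tokens per request, so it is worth keeping only the turns that matter.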

With OpenAI prepared, let’s import some college football data and use OpenAI to help us better visualize it. Go to Kaggle. Click the download button at the top-right of the page to download the data locally. Place that file in the directory where your Jupyter Notebook is running and rename it data.csv. Go back to the console and import the data.

import pandas as pd
football = pd.read_csv('data.csv')

We can do a quick console visualization in table format with the command football.head().

Image of the football data from the command `football.head()` displaying a table visualization of the data

Let’s clean up the data to focus on team rankings, wins, and losses. We’ll modify the data to create a new table that includes Off Rank, Team, Win, and Loss, each as its own column.

# Create two new columns "Win" and "Loss"
football[['Win', 'Loss']] = football['Win-Loss'].str.split('-', expand=True)
# Remove the original "Win-Loss" column
football.drop('Win-Loss', axis=1, inplace=True)
# Create new table with the specified_data
new_order = ['Off Rank', 'Team', 'Win', 'Loss']
football = football.reindex(columns=new_order)
# Set Wins and Losses to type `Int` instead of `String`
football["Win"] = football["Win"].apply(int)
football["Loss"] = football["Loss"].apply(int)

Image of the revised football data displaying a table visualization of the new table columns: Off Rank, Team, Win, and Loss

We split football['Win-Loss'] into individual Win and Loss columns and dropped the original Win-Loss column. We then created a new table with a specific subset of the original data. Lastly, we converted the Win and Loss columns from strings to integers.
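
To see what these steps do without the full data set, here is the same kind of transformation applied to a tiny hand-made table. The three rows are invented for illustration and are not taken from the Kaggle file, and `astype` is used here as a common alternative to `apply(int)`.

```python
import pandas as pd

# Made-up rows that mimic the columns used above (not real data)
sample = pd.DataFrame({
    "Team": ["Team A", "Team B", "Team C"],
    "Off Rank": [1, 2, 3],
    "Win-Loss": ["10-3", "6-6", "4-8"],
})

# Split "Win-Loss" into two string columns, then convert them to integers
sample[["Win", "Loss"]] = sample["Win-Loss"].str.split("-", expand=True)
sample = sample.drop(columns="Win-Loss")
sample = sample.astype({"Win": int, "Loss": int})

# Keep only the columns of interest, in the desired order
sample = sample[["Off Rank", "Team", "Win", "Loss"]]
print(sample)
```

Converting to integers matters for the later filtering step: comparing the values as strings would order them alphabetically, so "10" would sort before "3".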

Let’s use OpenAI to help us set up code to visualize the data. We’ll prompt OpenAI to create a single bar graph where each team has the wins and losses on the same bar. Additionally, we only want to see teams that have more wins than losses.

Prompt (in console):

message_prompt = '''
Generate Seaborn code for producing a single bar graph.
Include matplotlib and seaborn import statements. Pandas has already been imported.
Create a single bar graph for each Team for their Win and Loss columns. I want the Wins and Losses on the same bar. Win should be green and Loss should be red.
Let's only include teams that have a higher win rate than their loss rate
Include code for title and axis labels.
'''
## This code will send the request and display the response
gpt_response = get_response(message_prompt,system_prompt)
print(gpt_response)

Response:

import matplotlib.pyplot as plt
import seaborn as sns
 
# Filter out teams with a higher win rate than loss rate
filtered_df = df[df['Win'] > df['Loss']]
 
# Create a bar graph with seaborn
sns.set(style='whitegrid')
plt.figure(figsize=(10, 5))
sns.barplot(x='Team', y='Win', data=filtered_df, color='green', label='Wins')
sns.barplot(x='Team', y='Loss', data=filtered_df, color='red', label='Losses')
 
# Set title and axis labels
plt.title('Wins and Losses by Team')
plt.xlabel('Team')
plt.ylabel('Count')
 
# Show the legend
plt.legend()
 
# Display the plot
plt.show()

We didn’t tell OpenAI that our DataFrame is named football, so the generated code uses df, a common shorthand for a DataFrame. Make sure to change each standalone df to football (the name filtered_df can stay as it is).

import matplotlib.pyplot as plt
import seaborn as sns
# Keep only teams with more wins than losses
filtered_df = football[football['Win'] > football['Loss']]
# Create a bar graph with seaborn
sns.set(style='whitegrid')
plt.figure(figsize=(10, 5))
sns.barplot(x='Team', y='Win', data=filtered_df, color='green', label='Wins')
sns.barplot(x='Team', y='Loss', data=filtered_df, color='red', label='Losses')
# Set title and axis labels
plt.title('Wins and Losses by Team')
plt.xlabel('Team')
plt.ylabel('Count')
# Show the legend
plt.legend()
# Display the plot
plt.show()
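
The manual renaming could be avoided in later prompts by telling the model the DataFrame's name and columns up front. The sketch below builds such a prompt; `describe_dataframe` and `prompt_with_context` are hypothetical names written for this example, and there is no guarantee the model follows the hint, so the generated code still needs review.

```python
def describe_dataframe(name, columns):
    """Return a sentence telling the model which variable and columns to use.

    Hypothetical helper for illustration; `name` is the variable name of the
    DataFrame in the notebook and `columns` is a list of its column names.
    """
    column_list = ", ".join(columns)
    return f"The data is in a pandas DataFrame named {name} with columns: {column_list}."

# In the notebook, list(football.columns) could supply the column names
context = describe_dataframe("football", ["Off Rank", "Team", "Win", "Loss"])
prompt_with_context = context + "\nGenerate Seaborn code for producing a single bar graph."
print(prompt_with_context)
```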

Image of Jupyter Notebook bar graph of Football data

Wow, look at those axes! With this many teams, the labels are crowded together, so let’s change the size of the graph by setting figsize=(105, 30) in the plt.figure() call to get a more even image.

Image of Jupyter Notebook bar graph of Football data with adjusted axes

Here is the final visualization illustrating all college football teams in 2022 that won more games than they lost.

Conclusion

Great work! We were able to set up OpenAI with Jupyter Notebook. Once set up, we imported some college football data to analyze. We asked OpenAI for assistance in developing code for visualizing the football data. With minor corrections, we produced a great graph showcasing the teams with more wins than losses!

To see what else you can do with ChatGPT (or generative AI in general), check out some of the articles located here: AI Articles.

Author

Codecademy Team
