How to Use ChatGPT Advanced Data Analysis

Codecademy Team
Get started with ChatGPT's Advanced Data Analysis

Introduction

ChatGPT Advanced Data Analysis (formerly ChatGPT Code Interpreter) started out as a plugin and has since been rolled into OpenAI’s GPT-4 implementation. It addresses several of ChatGPT’s shortcomings. Before Advanced Data Analysis, the basic ChatGPT model had these limitations:

  • It couldn’t run code, so it had no way to check the code it produced for errors or bugs.
  • It didn’t let users upload data.
  • It couldn’t produce charts or graphs.
  • It was unreliable at answering mathematical questions.

Advanced Data Analysis addresses each of these issues. It can run Python code in its sandbox. It lets us upload data to ChatGPT so it can generate insights. It can create visualizations based on that data. And finally, it can work through complex math problems by writing and running code. In the following sections, we will look at each of these features and discuss some of their limitations.

Accessing Advanced Data Analysis

To access Advanced Data Analysis, we need a ChatGPT Plus subscription, which currently costs US$20 a month. If you don’t have a ChatGPT account yet, go to chat.openai.com, click the “Sign up” button, and follow the prompts to create one. Once you have a free account, select “Upgrade” from the lower left of the home screen to open a dialog where you can sign up for ChatGPT Plus. A Plus subscription gives you access to GPT-4, which now has Advanced Data Analysis built in. (Advanced Data Analysis used to be a separate plugin you needed to enable.) To use GPT-4 and Advanced Data Analysis, select the GPT-4 model in the upper right of the home screen.

Selecting GPT-4 on the home screen

You can tell it is enabled because a paperclip icon appears in the chat box.

Paperclip icon next to the chat box

This is how we’ll upload files later.

Generating Code and Using Mathematics

The original name for Advanced Data Analysis was “Code Interpreter”. As the name implies, it is a powerful code generation tool. The original ChatGPT model could generate code, but that code wasn’t trustworthy: it might contain bugs, syntax errors, or calls to non-existent functions. With Advanced Data Analysis, ChatGPT can run the code it generates in its own sandbox and debug it. Its main limitation is that it can currently run only Python code, but it is still a major step forward.

Getting ChatGPT to generate code is as simple as asking it to write a function for you.

Prompting ChatGPT to create a Fibonacci function

To see the code of the function, click on the “view analysis” link at the end of the response. (Highlighted above.) This will open a window showing the code ChatGPT generated.

Viewing the code of the fibonacci() function

At the top of the window (highlighted above) is a link to copy the generated code. Here is the code this prompt produced:

def fibonacci(n):
    """
    Generate the first n numbers in the Fibonacci sequence.
    Args:
        n (int): The number of terms in the Fibonacci sequence to generate.
    Returns:
        list: A list containing the first n numbers of the Fibonacci sequence.
    Raises:
        ValueError: If n is not a positive integer.
    """
    if not isinstance(n, int) or n <= 0:
        raise ValueError("Input must be a positive integer.")
    sequence = []
    a, b = 0, 1
    for _ in range(n):
        sequence.append(a)
        a, b = b, a + b
    return sequence
# Example usage of the function
try:
    print(fibonacci(10))  # Should print the first 10 Fibonacci numbers
    print(fibonacci(-1))  # Should raise a ValueError
except ValueError as e:
    print(e)

Scrolling to the bottom of the window, you’ll see the output for the code it ran.

Viewing the output of the fibonacci() function

You can see that it ran a couple of basic tests. If you want more thorough checks, you can ask it to run further tests on the function it generated.

Prompting ChatGPT to run unit tests on the fibonacci() function

Again, you can click on the “view analysis” link at the end of the response to view the code that ran the tests.
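The tests ChatGPT writes will vary from run to run. To give a rough idea of what a thorough set might cover when you review them, here is a minimal sketch using Python’s built-in unittest module. The test cases are our own choices, not the ones ChatGPT produced, and the function is repeated so the block runs on its own.

```python
import unittest


def fibonacci(n):
    """Return the first n Fibonacci numbers (same logic as the generated function)."""
    if not isinstance(n, int) or n <= 0:
        raise ValueError("Input must be a positive integer.")
    sequence = []
    a, b = 0, 1
    for _ in range(n):
        sequence.append(a)
        a, b = b, a + b
    return sequence


class TestFibonacci(unittest.TestCase):
    def test_first_ten_terms(self):
        # The sequence starts 0, 1 and each later term is the sum of the previous two
        self.assertEqual(fibonacci(10), [0, 1, 1, 2, 3, 5, 8, 13, 21, 34])

    def test_single_term(self):
        self.assertEqual(fibonacci(1), [0])

    def test_length_matches_n(self):
        self.assertEqual(len(fibonacci(25)), 25)

    def test_rejects_invalid_input(self):
        # Zero, negatives, floats, and strings should all raise ValueError
        for bad in (0, -1, 2.5, "5"):
            with self.subTest(bad=bad):
                with self.assertRaises(ValueError):
                    fibonacci(bad)


if __name__ == "__main__":
    # argv and exit=False let this also run inside a notebook cell
    unittest.main(argv=["first-arg-is-ignored"], exit=False)
```

Edge cases such as invalid input are where generated tests are most often thin, so they are worth asking for explicitly.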

Advanced Data Analysis can also use Python code to solve mathematical problems. Here’s an example of giving it a word problem.

Prompting ChatGPT to solve a mathematical text problem

Here is ChatGPT’s response:

ChatGPT solving a mathematical text problem

You can see the work it did to get this answer by clicking on the “view analysis” link at the end of the response. The code behind this answer looks like this:

# Calculation
combined_speed = 50 + 60 # km/h
remaining_distance = 75 # km
# Time for the cars to meet
time_to_meet = remaining_distance / combined_speed # hours
# Since Car 2 starts at 6:30 AM, we add this time to find the meeting time
meeting_time_hour = 6.5 + time_to_meet  # 6:30 AM expressed in decimal hours (6 + 30/60)
# Distance Car 1 travels until they meet
distance_from_city_A = 50 * (0.5 + time_to_meet) # Car 1's speed * total time it travels
time_to_meet, meeting_time_hour, distance_from_city_A
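Written out as equations, the generated code computes the following. The symbols are our own: v1 = 50 km/h for Car 1, v2 = 60 km/h for Car 2, d = 75 km for the remaining distance, and t for the hours after 6:30 AM until the cars meet. The 0.5 is the half hour Car 1 has already been driving by 6:30 AM, as the code’s comments imply. Note that meeting_time_hour comes out in decimal hours, so it still has to be converted to a clock time.

```latex
\begin{aligned}
t &= \frac{d}{v_1 + v_2} = \frac{75}{50 + 60} = \frac{75}{110} \approx 0.682\ \text{h} \approx 41\ \text{min} \\
t_{\text{meet}} &= 6.5 + t \approx 7.182\ \text{h} \approx \text{7:11 AM} \\
d_A &= v_1\,(0.5 + t) = 50 \times 1.182 \approx 59.1\ \text{km}
\end{aligned}
```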

Analyzing Data

As its name suggests, Advanced Data Analysis is a powerful tool for gaining insights from data. We can upload various file types to ChatGPT, from images to PDFs, but Advanced Data Analysis is designed primarily to work with .txt and .csv files. Before you start uploading data, be aware of some caveats:

  • ChatGPT doesn’t know where the data comes from or what assumptions may have been used to produce it. Therefore, generated insights may be flawed for subtle reasons that require subject matter expertise to understand.
  • Because ChatGPT hides the steps of its analysis by default (though you can view them), it is easier to miss key mistakes. Checking ChatGPT’s analysis thoroughly can be a lengthy and involved process.
  • Uploading data to ChatGPT raises serious privacy concerns, because the data is sent to OpenAI’s servers.

You can mitigate these issues by doing the following:

  • Ensure column names accurately reflect the data.
  • Know the data thoroughly yourself: how it was collected, how it was cleaned, and where it is still messy.
  • Provide any key context within the prompt.
  • Analyze ChatGPT’s code and output like an adversary, searching for what’s wrong with it.
  • Never upload private or confidential data, especially data containing personally identifiable information (PII).
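One practical way to act on the last point is to strip sensitive columns from a local copy before uploading it. The sketch below uses only Python’s standard csv module; the function name drop_columns and the example column names (email, phone) are hypothetical, and dropping named columns is no substitute for knowing what else in your data could identify someone.

```python
import csv
import io


def drop_columns(source, destination, columns_to_drop):
    """Copy CSV rows from source to destination without the named columns.

    Returns the list of column names that were kept.
    """
    reader = csv.DictReader(source)
    # Preserve the original column order, minus the columns being removed
    kept = [name for name in reader.fieldnames if name not in columns_to_drop]
    # extrasaction="ignore" makes DictWriter skip keys (the dropped columns) not in fieldnames
    writer = csv.DictWriter(destination, fieldnames=kept, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return kept


# Example with in-memory data; the column names are made up for illustration
raw = io.StringIO("id,email,phone,score\n1,ada@example.com,555-0100,91\n")
sanitized = io.StringIO()
drop_columns(raw, sanitized, {"email", "phone"})
print(sanitized.getvalue())
```

With real files, open both with newline="" as the csv module documentation recommends, for example open("my_data.csv", newline="") for reading and open("my_data_sanitized.csv", "w", newline="") for writing (both file names hypothetical).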

To demonstrate Advanced Data Analysis, we’ll download the Google Play Store Apps dataset from Kaggle. (If you don’t have a Kaggle account, you’ll have to register. It’s free.) Download the archive.zip file and unzip it; it contains three files. The one we’ll be using is googleplaystore.csv. Once you have this file locally, you can upload it to ChatGPT using the paperclip icon in the chat box we saw earlier.

Paperclip icon next to the chat box

Once the file is uploaded, we can prompt for insights into the data:

Prompting for insights into uploaded data

By having ChatGPT describe the dataset, you can confirm it “understands” what the data means, as a check on the insights it provides. The insights provided for this dataset look like this:

ChatGPT's insights into uploaded data

As before, you can click on the “view analysis” link at the end of the response to see the Python code driving these insights so you can check on ChatGPT’s work.
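If you want a baseline to compare ChatGPT’s description against, you can profile the same file yourself. The sketch below assumes pandas is installed; the helper name profile is our own, and its checks (column type, missing values, distinct values) are a starting point rather than a copy of what ChatGPT runs.

```python
import pandas as pd


def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize each column: its dtype, how many values are missing, and how many are distinct."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "distinct": df.nunique(),  # NaN is not counted as a distinct value
    })


# Example usage once googleplaystore.csv is in your working directory:
# apps = pd.read_csv("googleplaystore.csv")
# print(apps.shape)
# print(profile(apps))
```

If a column you expect to be numeric, such as Reviews, shows up with a text (object) dtype, some of its values probably aren’t numbers, and any insight ChatGPT reports about that column deserves a closer look.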

You can also ask ChatGPT to help clean your data. For example, after uploading NASA’s Meteorite Landings dataset, we can ask ChatGPT the following:

Prompting ChatGPT for suggestions about cleaning uploaded data

Here is the data-cleaning part of ChatGPT’s response:

ChatGPT's suggestions about cleaning uploaded data

Besides spotting potential issues, ChatGPT can also clean the data for you.

ChatGPT being prompted to clean uploaded data

By clicking on the “view analysis” link you can see the Python code ChatGPT generated to clean our dataset. This is the code ChatGPT generated to clean the Meteorite Landings dataset:

from datetime import datetime
# meteorite_data is the DataFrame ChatGPT loaded from the uploaded file earlier in the chat
# Current year
current_year = datetime.now().year
# Removing rows based on specified conditions
cleaned_meteorite_data = meteorite_data.copy()
# Removing rows where year is prior to 2000, blank, or after the current year
cleaned_meteorite_data = cleaned_meteorite_data[
    cleaned_meteorite_data['year'].between(2000, current_year, inclusive='both')
]
# Removing rows where Longitude and Latitude are blank, both zero, or duplicate other entries
cleaned_meteorite_data = cleaned_meteorite_data.dropna(subset=['reclat', 'reclong'])
cleaned_meteorite_data = cleaned_meteorite_data[~((cleaned_meteorite_data['reclat'] == 0) & (cleaned_meteorite_data['reclong'] == 0))]
cleaned_meteorite_data = cleaned_meteorite_data.drop_duplicates(subset=['reclat', 'reclong'])
# Removing rows where mass is 0 or missing (NaN > 0 is False, so missing masses are dropped too)
cleaned_meteorite_data = cleaned_meteorite_data[cleaned_meteorite_data['mass (g)'] > 0]
# Labeling the dataset. The generated line `cleaned_meteorite_data.name = ...` would overwrite
# this dataset's existing 'name' column, so the label is kept in its own variable instead.
dataset_label = "Cleaned Meteorite Landings"
cleaned_meteorite_data.head(), cleaned_meteorite_data.shape
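Because several filters run in a row, it is worth checking how many rows each one removed before trusting the result. Here is a minimal sketch that replays the same conditions one at a time and records the row count after each. It assumes pandas and the column names used in the generated code (year, reclat, reclong, mass (g)); the function name clean_with_report is hypothetical.

```python
from datetime import datetime

import pandas as pd


def clean_with_report(df: pd.DataFrame, start_year: int = 2000):
    """Apply the cleaning steps one at a time; return the result and (step, row count) pairs."""
    current_year = datetime.now().year
    report = [("original", len(df))]

    # Missing years compare as False in between(), so they are dropped here as well
    df = df[df["year"].between(start_year, current_year)]
    report.append(("year in range", len(df)))

    df = df.dropna(subset=["reclat", "reclong"])
    report.append(("coordinates present", len(df)))

    df = df[~((df["reclat"] == 0) & (df["reclong"] == 0))]
    report.append(("not at (0, 0)", len(df)))

    df = df.drop_duplicates(subset=["reclat", "reclong"])
    report.append(("unique coordinates", len(df)))

    df = df[df["mass (g)"] > 0]
    report.append(("positive mass", len(df)))

    return df, report


# Example usage with the DataFrame from the chat:
# cleaned, report = clean_with_report(meteorite_data)
# for step, rows in report:
#     print(f"{step}: {rows} rows")
```

A step that removes far more rows than you expected (deduplicating on coordinates alone, for instance) is a sign to question the cleaning rule rather than accept it.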

You can even ask ChatGPT to provide a downloadable link of the cleaned dataset.

ChatGPT providing a link to cleaned data

Note: When using the Advanced Data Analysis feature, you may occasionally run into an “Error Analyzing” message. (Clicking on the message shows the code it was trying to run.) This can be caused by messy data, such as columns with inconsistent data types. If it happens repeatedly without an obvious data error, try moving your analysis to a fresh chat. If that doesn’t solve the problem, the cause may be heavy load on the ChatGPT servers; wait a little while and try again.
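Since inconsistent types within a column are one common trigger for these errors, you can scan a file for them before uploading. This sketch uses only the standard library and treats a column as mixed if it holds both values that parse as numbers and non-empty values that don’t. The function names and that rule of thumb are our own assumptions, not how ChatGPT decides.

```python
import csv
import io


def looks_numeric(text):
    """Return True if text parses as a float, e.g. "3", "4.1", or "-0.5"."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def find_mixed_columns(csv_file, max_examples=3):
    """Return {column: sample non-numeric values} for columns that mix numbers and text."""
    reader = csv.DictReader(csv_file)
    has_number = {name: False for name in reader.fieldnames}
    text_samples = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = (row.get(name) or "").strip()
            if not value:
                continue  # blank cells are missing data, not a type clash
            if looks_numeric(value):
                has_number[name] = True
            elif len(text_samples[name]) < max_examples:
                text_samples[name].append(value)
    # "Mixed" means at least one number and at least one non-numeric value
    return {name: samples for name, samples in text_samples.items()
            if has_number[name] and samples}


# Example with made-up rows; on a real file, pass open("your_file.csv", newline="") instead
sample = io.StringIO("app,rating,reviews\nA,4.1,159\nB,3.9,3.0M\nC,,967\n")
print(find_mixed_columns(sample))  # {'reviews': ['3.0M']}
```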

Visualizing Data

Because ChatGPT runs Python, it can use Python’s plotting libraries to create visualizations of the data. You just need to prompt it to do so. Let’s go back to the googleplaystore.csv dataset and ask for one.

Prompting ChatGPT for a visualization of uploaded data

ChatGPT produced a visualization based on the prompt and even did some data cleanup along the way. In real-world applications, it is best practice to spend time thoroughly cleaning your data (perhaps with ChatGPT’s help) before asking for analysis, but this is a nice illustration of how ChatGPT will try to accommodate you. It’s also a warning: ChatGPT will take it upon itself to try to “understand” the data, so keep an eye on what it is doing by clicking the “view analysis” link.
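If you would rather do that cleanup yourself so you know exactly what changed, one step googleplaystore.csv needs is converting its Installs column, which stores counts as text such as "10,000+", into numbers that can go on a plot axis. A minimal sketch, assuming pandas and that valid values follow the digits, commas, and trailing plus sign pattern; anything else becomes NaN. The function name is hypothetical.

```python
import pandas as pd


def installs_to_number(installs: pd.Series) -> pd.Series:
    """Convert strings like "10,000+" to numbers; values that can't be parsed become NaN."""
    # Remove the plus sign and thousands separators, then parse what is left
    stripped = installs.astype(str).str.replace(r"[+,]", "", regex=True).str.strip()
    return pd.to_numeric(stripped, errors="coerce")


# Example with values in the "10,000+" format plus one that can't be parsed
sample = pd.Series(["10,000+", "500,000+", "1,000,000+", "Free"])
print(installs_to_number(sample).tolist())
```

Note that "10,000+" means "at least 10,000", so the converted values are lower bounds, which is worth keeping in mind when reading any chart built on them.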

You can specify the type of visualization in the prompt as well:

Prompting ChatGPT for a visualization of uploaded data, specifying the visualization type

If you click on “view analysis” for this response, it will show you the specific code ChatGPT used to create the graph. In this case it looks like this:

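import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
# Assumes data_clean is the Play Store DataFrame ChatGPT cleaned in earlier steps,
# with 'Installs' already converted to numbers
# Filtering data for the 'Game' category (.copy() avoids a SettingWithCopyWarning on the next line)
game_category_data = data_clean[data_clean['Category'] == 'GAME'].copy()
# Converting 'Reviews' to numeric; values that can't be parsed become NaN
game_category_data['Reviews'] = pd.to_numeric(game_category_data['Reviews'], errors='coerce')
# Scatter plot
plt.figure(figsize=(12, 8))
sns.scatterplot(data=game_category_data, x='Reviews', y='Installs', alpha=0.6)
plt.title('Number of Installs vs Number of Reviews for Games Category')
plt.xlabel('Number of Reviews')
plt.ylabel('Number of Installs')
plt.xscale('log')  # Using logarithmic scale for better visibility
plt.yscale('log')
plt.grid(True)
plt.show()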

This is particularly useful for generating code for complex visualizations that we can then edit, tweaking the settings for our particular needs.
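For example, one way to make the generated plotting code reusable is to wrap it in a function whose settings are parameters. The sketch below is our own refactoring, not ChatGPT’s output: it uses matplotlib’s object-oriented Figure API instead of seaborn and pyplot, and it assumes a DataFrame whose Reviews and Installs columns are already numeric. The function and parameter names are hypothetical.

```python
import pandas as pd
from matplotlib.figure import Figure


def scatter_by_category(df: pd.DataFrame, category: str, x: str = "Reviews",
                        y: str = "Installs", log_scale: bool = True) -> Figure:
    """Scatter-plot column y against column x for one app category and return the figure."""
    subset = df[df["Category"] == category]
    fig = Figure(figsize=(12, 8))
    ax = fig.subplots()
    ax.scatter(subset[x], subset[y], alpha=0.6)
    ax.set_title(f"{y} vs {x} for the {category} category")
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    if log_scale:
        # Log axes keep apps with a handful of reviews and apps with millions visible together
        ax.set_xscale("log")
        ax.set_yscale("log")
    ax.grid(True)
    return fig


# Example usage with the data_clean DataFrame from the generated code:
# fig = scatter_by_category(data_clean, "FAMILY", log_scale=False)
# fig.savefig("family_scatter.png")
```

Returning the figure instead of calling show() lets you save it, compare categories side by side, or keep tweaking it without rerunning the whole analysis.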

It’s also possible to ask ChatGPT to visualize its insights into the data.

ChatGPT visualizing its own insights into uploaded data

For this example, we asked ChatGPT to clean the data first. As before, it had a little trouble initially before producing the visualization. Even so, it gave us an insight into the distribution of ratings in the Google Play Store.
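To check a chart like this against the data, it helps to compute the numbers behind it yourself. Here is a minimal standard-library sketch that counts ratings in half-point bins keyed by each bin’s lower edge. The bin width and function name are our own choices and may not match the bins ChatGPT used; values outside the 1 to 5 star range are treated as invalid and skipped.

```python
import math
from collections import Counter


def rating_histogram(ratings, bin_width=0.5):
    """Count ratings per bin, keyed by each bin's lower edge; skips missing or invalid values."""
    counts = Counter()
    for value in ratings:
        try:
            rating = float(value)
        except (TypeError, ValueError):
            continue  # e.g. None or a non-numeric string
        if math.isnan(rating) or not 1 <= rating <= 5:
            continue  # missing, or outside the 1-5 star range
        lower = math.floor(rating / bin_width) * bin_width
        counts[lower] += 1
    return dict(sorted(counts.items()))


# Works on a plain list or on a pandas column such as apps["Rating"]
print(rating_histogram([4.1, 4.4, 4.5, 3.9, 5.0, None, 19.0]))  # {3.5: 1, 4.0: 2, 4.5: 1, 5.0: 1}
```

With these bins, a rating of exactly 5.0 lands in its own bin; if the chart ChatGPT drew groups it differently, that difference alone can change how the distribution looks.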

Conclusion

In this tutorial, we covered the basics of using Advanced Data Analysis in ChatGPT. We covered using it to generate code and to solve mathematical problems. We showed you how to upload data and get ChatGPT to analyze it for us. We demonstrated gaining insights, as well as using ChatGPT to clean data for us. Finally, we illustrated the powerful visualization capabilities available through Advanced Data Analysis. This should give you the foundation you need to explore ChatGPT’s powerful data analysis tools.

If you want to learn more about ChatGPT (or generative AI in general), check out these articles: