
Prompt Engineering for Analytics

Prompt Engineering for Large Language Models

Prompts are what we send to a large language model, such as a request for the model to produce analytics code.

Prompt Engineering is the art and science of designing prompts to produce the most effective responses from AI large language models.

Application Programming Interface

An Application Programming Interface, or API, is a defined interface that lets developers access the data or functionality of another application from their own code.

API Keys

Some third-party APIs require an API key, a unique token issued to a developer to authenticate their requests to the API. API keys should be kept secret.
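One common way to keep an API key secret is to store it in an environment variable rather than pasting it into a notebook. The sketch below assumes the key is stored under OPENAI_API_KEY, the variable name the openai Python library checks by default; load_api_key() is a hypothetical helper, not part of any library.

```python
import os

def load_api_key(var_name="OPENAI_API_KEY"):
    """Read an API key from an environment variable (hypothetical helper)."""
    key = os.environ.get(var_name)
    if not key:
        # Fail loudly instead of sending requests without credentials
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key

# With the pre-1.0 openai library used in this cheatsheet, you could then write:
# openai.api_key = load_api_key()
```

Keeping the key out of the code also keeps it out of any notebook or script you share.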

Selecting Large Language Models Using the OpenAI API

OpenAI provides several different large language models that can be selected through the API by setting the model parameter of the openai.ChatCompletion.create() method.

As of 2023, OpenAI’s newest GPT models include gpt-4 and gpt-3.5-turbo. An updated list of their GPT models and their API pricing can be found here.

Prompting Large Language Models Using the OpenAI API

The OpenAI API can be used to prompt large language models and adjust parameters controlling the model behavior.

Three common pieces of information to send using the API are:

  1. a system prompt to set the context and behavior of the AI model.
  2. a message prompt that asks the AI model to generate a specific response.
  3. a temperature setting to control the randomness of GPT’s response. Lower values (closer to 0) produce more focused, consistent responses, while higher values (up to 2) produce more varied responses that can seem more creative.
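The three pieces above, together with the model name, can be bundled into a single set of request arguments. A minimal sketch: build_chat_request() is a hypothetical helper, and the commented-out call assumes the pre-1.0 openai library interface used elsewhere in this cheatsheet.

```python
def build_chat_request(system_prompt, user_prompt, temperature=1.0, model="gpt-3.5-turbo"):
    """Bundle the system prompt, user prompt, and temperature into request kwargs."""
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},  # context and behavior
            {"role": "user", "content": user_prompt},      # the specific request
        ],
        "temperature": temperature,  # randomness of the response
    }

request = build_chat_request(
    "You are a helpful assistant for data analysts.",
    "Write pandas code to compute the mean of each numeric column.",
    temperature=0.2,
)
# The dictionary could then be unpacked into the API call:
# openai.ChatCompletion.create(**request)
```

A low temperature such as 0.2 suits code generation, where consistent output is usually preferred.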

OpenAI API Helper Function

The get_response() helper function is used to send a prompt with the following settings to OpenAI’s GPT language model:

  1. A system prompt
  2. A user prompt
  3. An optional temperature value

The function returns the text of the model’s reply, which is stored in response['choices'][0]['message']['content'].

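import openai

def get_response(prompt, system='', temperature=1.0):
    # openai.ChatCompletion is the pre-1.0 openai library interface
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        # The system message sets behavior, so it goes first
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": prompt}],
        temperature=temperature
    )
    # Return only the text of the first generated reply
    return response['choices'][0]['message']['content']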
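The nested indexing in the return statement of get_response() can be tried out without calling the API. A minimal sketch, assuming a hand-written dictionary shaped like a chat completion response; only the keys the helper reads are included, and the content is made up for illustration.

```python
# Hypothetical, trimmed-down response containing only the keys get_response() reads
fake_response = {
    "choices": [
        {"message": {"role": "assistant", "content": "df.describe()"}}
    ]
}

# 'choices' is a list of generated replies; [0] takes the first one,
# and ['message']['content'] holds the generated text
text = fake_response['choices'][0]['message']['content']
print(text)  # df.describe()
```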

Display GPT Responses Using Markdown

GPT’s responses from the OpenAI API can be rendered as Markdown in a Jupyter Notebook using display and Markdown from IPython’s display module. Any Python code in the response is then shown in a clean format with syntax highlighting, preserved indentation, and easy copying and pasting.

from IPython.display import display, Markdown
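In a notebook, the two are typically combined as display(Markdown(text)). The sketch below wraps that call in a hypothetical show_markdown() helper that falls back to print() when IPython is not installed; outside a notebook, IPython’s display() prints a plain-text representation of the object rather than rendered Markdown.

```python
def show_markdown(text):
    """Render text as Markdown via IPython, or print it if IPython is missing."""
    try:
        from IPython.display import display, Markdown
    except ImportError:
        # IPython is not installed, so just print the raw text
        print(text)
        return False
    # Renders as formatted Markdown when run inside a Jupyter Notebook
    display(Markdown(text))
    return True

# In a notebook this would typically wrap a model reply: show_markdown(get_response(prompt))
show_markdown("Use `df.groupby('region')['sales'].sum()` to total sales by region.")
```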

Using the OpenAI API for Analytics

Data analysts can prompt OpenAI’s large language models to:

  1. produce code for statistical analysis, ML, and visualizations
  2. debug code
  3. brainstorm analytics ideas
  4. do much more!

Anatomy of an Effective Prompt

When prompting OpenAI’s large language models to produce code for analytics, we can use the following principles to engineer effective prompts:

  1. Set the overall goal of the prompt
  2. Establish the current code state (programming language and libraries)
  3. Describe the dataset and relevant columns
  4. Outline what the code should achieve in the final output
  5. Provide important details
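The five principles can be turned into a reusable prompt template. A minimal sketch: build_analytics_prompt() and its section labels ("Goal:", "Dataset:", and so on) are hypothetical choices, and the dataset described is invented for illustration.

```python
def build_analytics_prompt(goal, code_state, dataset, output, details=""):
    """Assemble a prompt from the five parts of an effective analytics prompt."""
    parts = [
        f"Goal: {goal}",                      # 1. overall goal
        f"Current code state: {code_state}",  # 2. language and libraries
        f"Dataset: {dataset}",                # 3. dataset and relevant columns
        f"Desired output: {output}",          # 4. what the final output should be
    ]
    if details:
        parts.append(f"Important details: {details}")  # 5. extra details
    return "\n".join(parts)

prompt = build_analytics_prompt(
    goal="Visualize how monthly sales changed over 2022.",
    code_state="Python with pandas and matplotlib; the data is in a DataFrame named df.",
    dataset="Columns: 'month' (datetime) and 'sales' (float, USD).",
    output="A line chart with months on the x-axis and sales on the y-axis.",
    details="Label both axes and add a title.",
)
print(prompt)
```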

Debug Coding Errors Using the OpenAI API

Using OpenAI’s large language models to identify and debug errors in your code involves setting a behavior-specific system prompt and a message prompt that

  1. Outlines our goal of debugging the error.
  2. Describes the dataset and other relevant information.
  3. Provides the code and the error message.

system_prompt = '''You are a helpful AI assistant for debugging Python data visualizations.
Given a code snippet that generates a visualization, you will:
1. Identify and fix any errors.
2. Provide bulleted code explanations of the changes made.
3. Suggest improvements to enhance the effectiveness of the visualization.'''
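The matching message prompt can be assembled from the three parts listed above. A minimal sketch: build_debug_prompt() is a hypothetical helper, and the buggy code and KeyError (a misspelled column name) are invented examples.

```python
def build_debug_prompt(dataset_description, code, error_message):
    """Combine the goal, dataset context, code, and error into one message prompt."""
    return (
        "Please help me debug the following error.\n\n"  # 1. the goal
        f"Dataset: {dataset_description}\n\n"             # 2. relevant context
        f"Code:\n{code}\n\n"                              # 3a. the code
        f"Error message:\n{error_message}"                # 3b. the error
    )

buggy_code = "plt.bar(df['region'], df['sale'])"
error = "KeyError: 'sale'"
debug_prompt = build_debug_prompt(
    "A DataFrame df with columns 'region' (str) and 'sales' (float).",
    buggy_code,
    error,
)
# It could then be sent along with the system prompt, e.g.
# get_response(debug_prompt, system=system_prompt, temperature=0)
```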

Using GPT to Brainstorm Ideas for Analytics

When using GPT to brainstorm ideas for analytics, it may be useful to change the following settings in our prompt setup:

  • system prompt: set a behavior we want to see from a brainstorming assistant for data analysis.
  • temperature: consider increasing the temperature to receive less deterministic responses.
  • iteration: send the same prompt more than once to obtain a wide variety of ideas.
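Iterating the same prompt at a higher temperature can be sketched as a simple loop. brainstorm() is hypothetical; it takes the request function as an argument so it can be tried with a stand-in (fake_ask below, which cycles through canned ideas) before spending API calls. The default temperature of 1.5 is an arbitrary choice within the 0 to 2 range.

```python
import itertools

def brainstorm(ask, prompt, system="", n=3, temperature=1.5):
    """Send the same prompt n times and return the distinct replies in order.

    `ask` should accept the same arguments as get_response().
    """
    ideas = []
    for _ in range(n):
        reply = ask(prompt, system=system, temperature=temperature)
        if reply not in ideas:  # skip exact repeats
            ideas.append(reply)
    return ideas

# Hypothetical stand-in for get_response() that cycles through canned ideas
_canned = itertools.cycle([
    "Cohort analysis of repeat customers",
    "Seasonality check on weekly sales",
])

def fake_ask(prompt, system="", temperature=1.0):
    return next(_canned)

ideas = brainstorm(
    fake_ask,
    "Suggest one analysis for a retail sales dataset.",
    system="You are a creative brainstorming assistant for data analysts.",
)
print(ideas)
```

With a configured API key, the same call would pass get_response in place of fake_ask.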

AI Hallucinations in Analytics

GPT and other AI language models use probability to generate responses to prompts. They don’t actually know or understand what they are saying, so it is common for them to “hallucinate,” or generate plausible-sounding but false information.

For example, GPT could make up a Python library that doesn’t actually exist.
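Before running AI-generated code, it can help to confirm that every library it imports actually exists in your environment. A minimal sketch using the standard library’s importlib.util.find_spec(); pandas_magic_stats is a deliberately made-up name standing in for a hallucinated library. This only checks what is installed locally: it cannot tell you whether a package exists on PyPI or whether it does what the AI claims.

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module with this name can be imported locally."""
    return importlib.util.find_spec(module_name) is not None

# Check every import an AI-generated snippet relies on before running it
for name in ["json", "statistics", "pandas_magic_stats"]:
    print(name, is_installed(name))
```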

AI Confirmation Bias in Analytics

If we run code provided by an AI and it confidently produces a result that “makes sense”, we are less likely to catch errors in the AI’s generated responses.

We need to be very stringent when using AI-generated code, double-checking everything the AI has created for statistical and programmatic validity.
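One way to guard against confirmation bias is to recompute a result independently and compare. In this hypothetical example, the AI-generated function divides by n (the population standard deviation) although the sample standard deviation was requested; its output looks reasonable, but comparing it against statistics.stdev() exposes the difference.

```python
import statistics

def ai_generated_std(values):
    # Hypothetical AI-generated code: divides by n, giving the population
    # standard deviation instead of the requested sample version (n - 1)
    mean = sum(values) / len(values)
    return (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Independent check with the standard library's sample standard deviation
expected = statistics.stdev(data)
result = ai_generated_std(data)
print(f"AI result: {result:.4f}, reference: {expected:.4f}")
if abs(result - expected) > 1e-9:
    print("Mismatch: review the AI-generated code before trusting its output.")
```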

Data Privacy and Security Risks When Using AI

The data we send to GPT may be used to train future iterations of the model, which can result in data leaks where our proprietary information becomes part of a publicly available large language model. This poses serious data privacy and security risks if the information is sensitive or confidential.

Note: some AI providers state that they do not use data submitted through their API for training. However, any time we send code and data to a third party, we risk exposing that code and data.
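One way to reduce exposure is to describe the dataset’s structure in a prompt instead of pasting raw rows. A minimal sketch, assuming the data is a list of dictionaries that all share the same keys; describe_schema() is hypothetical, and column names themselves can still be sensitive, so review the summary before sending it.

```python
def describe_schema(rows):
    """Summarize column names and value types without including any values."""
    if not rows:
        return "The dataset is empty."
    columns = []
    for name in rows[0]:
        # Collect the distinct Python type names seen in this column
        types = sorted({type(row[name]).__name__ for row in rows})
        columns.append(f"'{name}' ({'/'.join(types)})")
    return f"{len(rows)} rows with columns: " + ", ".join(columns)

# Invented example records
customers = [
    {"customer_id": 101, "email": "ana@example.com", "spend": 250.0},
    {"customer_id": 102, "email": "raj@example.com", "spend": 80.5},
]
summary = describe_schema(customers)
print(summary)
# 2 rows with columns: 'customer_id' (int), 'email' (str), 'spend' (float)
```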

Detailed vs Open-ended Prompts

Choosing between a more detailed or open-ended prompt depends on your specific task and constraints.

Use open-ended prompts for:

  • Brainstorming more creative, out-of-the-box ideas
  • Reducing the amount of time spent writing prompts
  • Exploring the data without constraints

Use detailed prompts for:

  • More precise, deterministic responses
  • Complex tasks requiring multiple steps
  • Reducing ambiguity
  • Controlling the output format, such as the type of visualization
