Prompts are what we send to a large language model, such as a request for the model to produce analytics code.
Prompt Engineering is the art and science of designing prompts to produce the most effective responses from AI large language models.
An Application Programming Interface, or API, is an interface that lets developers programmatically access functionality from another application.
Some third-party APIs require an API key, which is a special token given to a developer to interact with the API. API keys are unique and should be kept secret.
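For example, the openai Python library (in its 2023-era, pre-1.0 versions assumed throughout this section) reads a key assigned to openai.api_key. Loading it from an environment variable keeps the key out of source code; the variable name OPENAI_API_KEY here is a common convention, not a requirement:

import os
import openai

# Read the key from an environment variable so it never appears in the code
openai.api_key = os.environ["OPENAI_API_KEY"]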
OpenAI provides several different large language models that can be selected with their API using the model parameter in the openai.ChatCompletion.create() method. As of 2023, OpenAI’s newest GPT models include gpt-4 and gpt-3.5-turbo. An updated list of their GPT models and their API pricing can be found on OpenAI’s website.
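As a minimal sketch, switching models is just a matter of changing the model string (the prompt text is illustrative):

import openai

# The same request can target a different model by changing the model argument
response = openai.ChatCompletion.create(
    model="gpt-4",  # or "gpt-3.5-turbo"
    messages=[{"role": "user", "content": "Suggest three exploratory plots for a sales dataset."}]
)
print(response['choices'][0]['message']['content'])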
The OpenAI API can be used to prompt large language models and adjust parameters controlling the model behavior.
Three common pieces of information to send using the API are the model to use, the messages (such as a user prompt and an optional system prompt), and the temperature, which controls how random the response is.
The get_response() helper function below sends a prompt to OpenAI’s gpt-3.5-turbo model, along with an optional system prompt and an adjustable temperature (defaulting to 1.0). The language model’s generated response, found in response['choices'][0]['message']['content'], is returned.
import openai

def get_response(prompt, system='', temperature=1.0):
    # Send the user prompt (and optional system prompt) to gpt-3.5-turbo
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt},
                  {"role": "system", "content": system}],
        temperature=temperature
    )
    # Return just the text of the model's reply
    return response['choices'][0]['message']['content']
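For example, assuming openai.api_key has already been set, the helper can be called like this (the prompt text is purely illustrative):

code_prompt = "Write Python code that loads sales.csv with pandas and plots monthly revenue."
gpt_response = get_response(code_prompt, system="You are a helpful data analytics assistant.", temperature=0.5)
print(gpt_response)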
GPT’s responses from the OpenAI API can be rendered as Markdown in a Jupyter Notebook using two functions from IPython.display: display and Markdown. These present Python code in a clean format with syntax highlighting, indentation, and easy copying and pasting.
from IPython.display import display, Markdown

# gpt_response is a Markdown-formatted string returned by the model
display(Markdown(gpt_response))
Data analysts can prompt OpenAI’s large language models to perform tasks such as generating analytics code, debugging and explaining existing code, and brainstorming ideas for analyses.
When prompting OpenAI’s large language models to produce code for analytics, we can engineer effective prompts by providing context about the data, specifying the programming language and libraries to use, and clearly describing the desired output, as in the sketch below.
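For instance, a prompt following these principles might look like the following (the DataFrame and column names are illustrative):

prompt = '''Using Python with pandas and matplotlib, write code that takes
a DataFrame named df with columns 'date', 'region', and 'revenue' and
produces a line plot of total monthly revenue per region with labeled axes.'''

print(get_response(prompt))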
Using OpenAI’s large language models to identify and debug errors in your code involves setting a behavior-specific system prompt and a message prompt that contains the code to debug.
system_prompt = '''You are a helpful AI assistant for debugging Python data visualizations.
Given a code snippet that generates a visualization, you will:
1. Identify and fix any errors.
2. Provide bulleted code explanations of the changes made.
3. Suggest improvements to enhance the effectiveness of the visualization.'''
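A sketch of how this might be used with the get_response() helper from earlier, where flawed_code is a hypothetical buggy snippet:

flawed_code = '''import matplotlib.pyplot as plt
plt.plot(df['date'], df['revenue']
plt.show()'''  # note the missing closing parenthesis

display(Markdown(get_response(flawed_code, system=system_prompt)))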
When using GPT to brainstorm ideas for analytics, it may be useful to adjust our prompt setup by raising the temperature parameter, which produces more varied and creative responses, and by setting a system prompt that frames the model as a brainstorming assistant.
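As a sketch using the get_response() helper defined earlier (the temperature value and system text are illustrative):

brainstorm_system = "You are a creative assistant that brainstorms ideas for data analyses."
ideas = get_response(
    "Suggest five analyses we could run on customer churn data.",
    system=brainstorm_system,
    temperature=1.5  # higher temperature produces more varied output
)
print(ideas)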
GPT (like other AI language models) uses probability to respond to prompts. These models don’t actually know or understand what they are saying, so it is common for them to “hallucinate”, or generate false information.
For example, GPT could make up a Python library that doesn’t actually exist.
If we run code provided by an AI and it confidently produces a result that “makes sense”, we are less likely to catch errors in the AI’s generated responses.
We need to be very stringent when using AI-generated code, double-checking everything the AI has created for statistical and programmatic validity.
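One cheap programmatic check is to confirm that any library the model names actually exists before trusting the rest of its output; a minimal sketch:

import importlib

# 'pandas_profiler' is a plausible-sounding name a model might hallucinate
for library in ['pandas', 'pandas_profiler']:
    try:
        importlib.import_module(library)
        print(f"{library}: installed and importable")
    except ModuleNotFoundError:
        print(f"{library}: not found -- possibly hallucinated or not installed")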
The data we feed into GPT may be used to train future iterations of the AI model, which can result in data leaks where our proprietary information becomes part of a publicly available large language model. This poses serious data privacy and security risks if the information is sensitive or confidential.
Note: some AI models claim to not use information provided through the API for training. However, any time we are sending code and data to a third party we risk exposing that code and data.
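One way to reduce this risk is to send the model only the structure of a dataset, never its contents. A minimal sketch of that idea, assuming a pandas DataFrame named df:

import pandas as pd

df = pd.DataFrame({'customer_email': ['a@example.com'], 'revenue': [120.5]})

# Share only column names and dtypes with the model, not the rows themselves
schema = ', '.join(f"{col} ({dtype})" for col, dtype in df.dtypes.items())
prompt = f"Write pandas code to summarize a DataFrame with columns: {schema}."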
Choosing between a more detailed or open-ended prompt depends on your specific task and constraints.
Use open-ended prompts for exploratory work, such as brainstorming analyses or generating a wide range of candidate ideas.
Use detailed prompts for well-defined tasks, such as producing code that must use a specific dataset, library, or output format. Both styles are contrasted in the sketch below.
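For example, the same underlying request could be phrased either way (both prompt strings are illustrative):

open_ended_prompt = "What are some interesting ways to visualize website traffic data?"

detailed_prompt = '''Using Python with matplotlib, write code that plots daily page
views from a DataFrame df with columns 'date' and 'views' as a line chart
with a 7-day rolling average overlay.'''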