
Learn How to Use AI for Data Analysis

Prompt Engineering for Large Language Models

Prompts are what we send to a large language model, such as a request for the model to produce analytics code.

Prompt Engineering is the art and science of designing prompts to produce the most effective responses from AI large language models.

Using LLMs for Analytics

Data analysts can prompt large language models to:

  1. Produce code for statistical analysis
  2. Create visualizations
  3. Debug code
  4. Brainstorm analytics ideas

Anatomy of an Effective Prompt

When prompting large language models to produce code for analytics, we can use the following principles to engineer effective prompts:

  1. Set the overall goal of the prompt
  2. Establish the current code state (programming language and libraries)
  3. Describe the dataset and relevant columns
  4. Outline what the code should achieve in the final output
  5. Provide important details
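The five components can be combined into a single prompt string. The sketch below does this with plain Python; the function `build_analysis_prompt` and the `orders` dataset are hypothetical examples, not part of any library.

```python
def build_analysis_prompt(goal, code_state, dataset, output, details):
    """Assemble a prompt from the five components: goal, code state, dataset, output, details."""
    sections = [
        ("Goal", goal),
        ("Current code state", code_state),
        ("Dataset", dataset),
        ("Desired output", output),
        ("Important details", details),
    ]
    # One labeled line per component keeps each part easy to find and revise.
    return "\n".join(f"{label}: {text}" for label, text in sections)


# Hypothetical example values for an orders dataset.
prompt = build_analysis_prompt(
    goal="Compare average order value across regions.",
    code_state="Python 3 with pandas already imported as pd.",
    dataset="DataFrame `orders` with columns region (str) and order_value (float).",
    output="A table of mean order_value per region, sorted descending.",
    details="Ignore rows where order_value is missing.",
)
print(prompt)
```

Keeping each component on its own labeled line makes it easy to change one part, such as the important details, without rewriting the whole prompt.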

Debug Coding Errors Using LLMs

When prompting large language models to identify and debug errors in our code, we can include important context such as:

  • The goal of debugging the error
  • Descriptions of the dataset and other relevant information
  • The code and the error message
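One way to make sure the exact error message reaches the model is to capture it programmatically instead of retyping it. This minimal sketch assumes a small, hypothetical snippet that fails on a missing value; it runs the snippet with `exec` only to collect the traceback, which is reasonable for your own code but not for code you do not trust.

```python
import traceback

# Hypothetical buggy snippet: None in the list breaks sum().
buggy_code = """\
values = [12.5, 9.0, None, 14.25]
mean = sum(values) / len(values)
"""


def build_debug_prompt(goal, dataset_notes, code):
    """Run `code`, capture any error message, and package goal, data notes, code, and error."""
    try:
        exec(code, {})
        error_text = "No error was raised."
    except Exception:
        # format_exc() returns the full traceback of the current exception as a string.
        error_text = traceback.format_exc()
    return (
        f"Goal: {goal}\n"
        f"Dataset: {dataset_notes}\n"
        f"Code:\n{code}\n"
        f"Error message:\n{error_text}"
    )


prompt = build_debug_prompt(
    goal="Fix this code so it computes the mean of the non-missing values.",
    dataset_notes="`values` holds order amounts; missing amounts are stored as None.",
    code=buggy_code,
)
print(prompt)
```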

Brainstorm Ideas Using LLMs

When prompting large language models to brainstorm analytics ideas, we can use the following strategies:

  • Use a prompt with only a few details about the dataset for high-level analysis; this can produce more creative, out-of-the-box responses
  • Use a prompt with many specific details about the dataset, such as a list of the column names, to make the responses more relevant to our task
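Both strategies can come from the same template: leave out the column names for an open-ended prompt, or pass them in for a focused one. The function `brainstorm_prompt` and the column names below are illustrative assumptions.

```python
def brainstorm_prompt(topic, columns=None):
    """Return an open-ended prompt, or a focused one when column names are supplied."""
    prompt = f"Suggest analysis ideas for a dataset about {topic}."
    if columns:
        # Listing the columns steers the model toward ideas the data can actually support.
        prompt += " The columns are: " + ", ".join(columns) + "."
        prompt += " Only suggest analyses that use these columns."
    return prompt


# Hypothetical topic and column names.
open_ended = brainstorm_prompt("customer orders")
focused = brainstorm_prompt("customer orders", ["order_date", "region", "order_value"])
print(open_ended)
print(focused)
```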

Hallucinations in Analytics

Large language models generate responses by predicting likely next words based on patterns in their training data. They don’t actually know or understand what they are saying, so it is common for them to hallucinate, or generate false information.

Some examples of hallucinations when analyzing data include:

  • Making up a fake Python library
  • Referencing or interpreting the data incorrectly
  • Suggesting an analysis that violates statistical assumptions
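Some hallucinations can be caught with quick automated checks before a deeper review. The sketch below, using hypothetical names, flags modules that cannot be found in the current Python environment and columns that generated code references but the dataset lacks. It only catches obvious cases: a module missing locally may still be a real package, an installed module may not contain the function the model named, and it says nothing about statistical assumptions.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None when a top-level module cannot be located.
    return [name for name in names if importlib.util.find_spec(name) is None]


def unknown_columns(referenced, actual):
    """Return column names the generated code uses that the dataset does not contain."""
    return sorted(set(referenced) - set(actual))


# Hypothetical names pulled from an AI-generated snippet.
print(missing_modules(["statistics", "fastregressionz"]))
print(unknown_columns(["region", "order_total"], ["region", "order_value"]))
```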

Confirmation Bias in Analytics

If we run code provided by a large language model and it confidently produces a result that “makes sense”, we are less likely to catch errors in the AI’s generated responses.

We need to be very stringent when using AI-generated code, double-checking everything the AI has created for statistical and programmatic validity.
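A practical way to double-check AI-generated code is to compare its output with an independent, trusted calculation. In this sketch, `ai_sample_std` is a hypothetical example of generated code: it claims to compute a sample standard deviation but divides by n instead of n - 1. Both results look plausible, which is exactly the situation where confirmation bias lets an error slip through.

```python
import math
import statistics


# Hypothetical AI-generated helper: plausible-looking, but it uses the
# population formula (divide by n) instead of the sample formula (n - 1).
def ai_sample_std(values):
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


data = [4.0, 8.0, 6.0, 5.0, 7.0]
ai_result = ai_sample_std(data)
reference = statistics.stdev(data)  # sample standard deviation from the standard library

print(f"AI: {ai_result:.4f}  reference: {reference:.4f}")
if not math.isclose(ai_result, reference):
    print("Mismatch: review the generated code before trusting its output.")
```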

Data Privacy and Security Risks When Using LLMs

The data we feed into large language models may be used to train future versions of the model, which can result in data leaks where our proprietary information becomes part of a publicly available large language model. This poses serious data privacy and security risks if the information is sensitive or confidential.

Note: some AI providers state that they do not use the information we provide through prompts. However, any time we send code and data to a third party, we risk exposing that information.
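If some data must be shared, one way to reduce exposure is to redact sensitive fields before they go into a prompt. This sketch assumes `customer_name` and `email` are the sensitive columns in a hypothetical orders table; it lowers the risk described above but does not eliminate it, since the remaining fields may still be confidential.

```python
# Hypothetical set of columns treated as sensitive in this example.
SENSITIVE_COLUMNS = {"customer_name", "email"}


def redact_rows(rows, sensitive=SENSITIVE_COLUMNS):
    """Return new row dicts with sensitive values replaced; the input rows are not modified."""
    return [
        {key: "<redacted>" if key in sensitive else value for key, value in row.items()}
        for row in rows
    ]


# Hypothetical sample rows.
rows = [
    {"customer_name": "Ada Lovelace", "email": "ada@example.com", "region": "West", "order_value": 42.5},
    {"customer_name": "Alan Turing", "email": "alan@example.com", "region": "East", "order_value": 17.0},
]
safe_rows = redact_rows(rows)
print(safe_rows[0])
```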

Detailed vs Open-ended Prompts

Choosing between a more detailed or open-ended prompt depends on your specific task and constraints.

Use open-ended prompts for:

  • Brainstorming more creative, out-of-the-box ideas
  • Reducing the amount of time spent writing prompts
  • Exploring the data without constraints

Use detailed prompts for:

  • More precise, consistent responses
  • Complex tasks requiring multiple steps
  • Reducing ambiguity
  • Controlling the output format, such as a specific visualization
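A detailed prompt can pin down the output format, which also makes the response easy to check. This sketch assumes the model was asked for a JSON chart specification with four named keys; the reply is hard-coded (hypothetical) so the example runs without calling any model, and `parse_chart_spec` rejects replies that are not a JSON object with exactly those keys.

```python
import json

# The keys the detailed prompt asks the model to return.
REQUIRED_KEYS = {"chart_type", "x", "y", "title"}

prompt = (
    "Recommend one chart for monthly revenue in the DataFrame `sales` "
    "(columns: month, revenue). Respond with JSON only, using exactly "
    "the keys chart_type, x, y, and title."
)


def parse_chart_spec(response_text):
    """Parse the model's reply and confirm it is an object with exactly the requested keys."""
    spec = json.loads(response_text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(spec, dict):
        raise ValueError("Expected a JSON object.")
    if set(spec) != REQUIRED_KEYS:
        raise ValueError(f"Unexpected or missing keys: {sorted(set(spec) ^ REQUIRED_KEYS)}")
    return spec


# Hypothetical model reply, hard-coded so the example runs offline.
reply = '{"chart_type": "line", "x": "month", "y": "revenue", "title": "Monthly Revenue"}'
spec = parse_chart_spec(reply)
print(spec["chart_type"], spec["x"], spec["y"])
```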

Prompting LLMs with Multiple Iterations

If you are not satisfied with a response generated by a large language model, consider multiple prompting iterations by:

  • Experimenting with both open-ended and goal-oriented prompts to find the right balance
  • Adding more context at each iteration
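The loop below sketches this process, adding one piece of context per iteration until the reply passes a check. `fake_llm` is a hypothetical stand-in for a real model call so the example runs offline, and `looks_usable` stands in for your own review of the response.

```python
# Hypothetical stand-in for a real LLM call: it only returns code
# once the prompt mentions the relevant columns.
def fake_llm(prompt):
    if "columns:" in prompt:
        return "df.groupby('region')['order_value'].mean()"
    return "Could you tell me more about your data?"


def looks_usable(reply):
    """A simple acceptance check; in practice this is your own review of the reply."""
    return "groupby" in reply


base = "Write pandas code to compare order values by region."
extra_context = [
    "The DataFrame is named df.",
    "Relevant columns: region (str), order_value (float).",
]

prompt = base
reply = fake_llm(prompt)
iterations = 1
for context in extra_context:
    if looks_usable(reply):
        break
    prompt += " " + context  # add more context at each iteration
    reply = fake_llm(prompt)
    iterations += 1

print(iterations, reply)
```

With this stub, the reply becomes usable only on the third iteration, once the column names are included.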

Multiple iterations allow you to test out different prompts until you find a solution that satisfies your own style as a data analyst!
