Codecademy Logo

Visualizing Data for Impact: Analyzing Misleading Visualizations

Effective Chart Type

Data visualizations should be formatted with an effective chart type. The chart type should create clarity, not confusion. The visual format (bar, pie, line, scatter plot, etc.) should help convey the data insight.

In this example, a grouped bar chart effectively compares customer satisfaction ratings across categories. With this visual format, the ratings for “In-Flight Comfort” and “Booking Process” are next to each other, so we can quickly spot the differences. We can spot that most of the Booking Process ratings are 4s and 5s. By contrast, most of the In-Flight comfort ratings are 1s and 2s.

Effective use of a grouped bar chart. The chart is called “Best Deal Airlines: Booking vs. Comfort Ratings. The x-axis displays ratings from 1 to 5 and the y-axis shows the percentage of responses. For each rating, there are two columns: a blue column for Booking Process and an orange column for In-Flight comfort. It’s easy to compare the ratings for these categories. The Booking Process has a high percentage of 4 and 5 ratings. By contrast, In-Flight comfort has a high percentage of 1 and 2 ratings.

Data Omission

Data omission often results in misleading charts. Omitting data means leaving out key information, cherry-picking data points, or using a truncated scale. A chart must include enough data so viewers can draw an accurate conclusion.

For example, Best Deal Airlines made a line chart to demonstrate the change in flight crew salaries. It appears that salaries have steadily increased from 2015 to 2023.

However, essential data has been omitted! Flight crews consist of pilots and flight attendants. Unlike pilot salaries, flight attendant salaries have been stagnant for three years. Best Deal Airlines omitted flight attendant salaries to fabricate a more favorable picture of their payment practices.

Misleading line graph that omits data. The graph is called “Best Deal Airlines: Flight Crew Salaries Over Time.” The x-axis shows years from 2015-2023. The y-axis displays salary in dollars. A green line shows the change in salary, which steadily increases over the years. This is untruthful because the line only represents pilot salaries. Flight attendants are also part of the flight crew, but their salary data was intentionally left out.

Truncated Graph

Truncated graphs are generally misleading. Data points should not be left out to fit a desired narrative or present a skewed picture. As a general rule, axes should start at zero and numbers should not be skipped.

For example, Best Deal Airlines made a chart to capture ticket sale data. The line chart shows a dramatic increase in ticket sales, with 2024 showing a record high for the company.

Looking closely, we’ll notice that the graph is truncated: the first year on the x-axis is 2020. Due to travel restrictions during the Covid-19 pandemic, 2020 had extremely low sales. Thus, the years after 2020 look quite favorable by comparison. Truncating the scale exaggerates the success of sales in recent years.

Misleading line graph with a truncated scale. It’s called “Best Deal Airlines: Annual Ticket Sales.” The x-axis only displays the years from 2020 to 2024, and the y-axis shows sales in millions of dollars. In 2020, sales were less than 100 million dollars but increased dramatically. By 2024, sales were close to 250 million dollars.

Scale Manipulation

Scale manipulation is a common tactic for creating misleading data visualizations. A graph will be inaccurate if the scale is inconsistent, truncated, or unsuitable for the context.

For example, Best Deal Airlines is publishing a report about changes in airplane seat width. The graph shows a subtle downward slope, implying only a slight decrease over time.

You may have noticed something odd with the scale: seat width is measured in feet. For this context, the scale should use smaller units like inches or centimeters. Relatively small changes to seat size make a big difference to the passenger. The seats have shrunk nearly six inches, which is a significant loss of space.

Line graph with inappropriate scale. The graph is called “Best Deal Airlines: Plane Seat Width Over Time.” The x-axis shows years from 2016-2024 and the y-axis shows seat width in feet. With the scale in feet, it appears that there is only a slight decrease in seat width over time. Seats were two feet wide in 2016 and 1.5 feet wide in 2024, which is a significant loss of space.

Appropriate Color Palettes

Appropriate color palettes should be applied to data visualizations. Color should be used consistently across multiple visualizations and meet accessibility standards for colorblindness. Color choices should align with the target audience’s color associations without reinforcing biases or judgments.

The example heatmap ignores a common convention for applying gradients. In general, we associate light tones with “less” and saturated tones with “more.”

The deep blue squares appear to be the busiest months for air travel. However, the legend reveals that the opposite is true! Because this heat map’s colors subvert our expectations, viewers may be confused or draw the wrong conclusions.

Heatmap with a confusing color scheme. The chart is called “Best Deal Airlines: Monthly Sales in Major U.S. Hubs” The x-axis has months of the year, and the y-axis lists hubs like Atlanta, Detroit, and Orlando. The map shows the concentration of sales in shades of blue. The colors are confusing because the darkest shade represents the lowest sales while the lightest shade represents the highest sales.

Clear, Accurate, Unbiased Context

When adding context to data visualizations, consider what to say and how to say it. Use neutral language that is clear, unbiased, accurate, and accessible to your audience.

This graph from Best Deal Airlines shows customer comfort ratings based on ticket status and flight length.

Terms like “ruby” and “sapphire” are unconventional descriptions for airline ticket status. If this graph is for Best Deal’s employees, the labels are okay, because internal teams are likely familiar with these terms. However, if the chart is meant for a wider audience, the labels for ticket status may cause confusion. Additionally, the annotation uses the term “cheap seats” in a way that feels judgmental and unhelpful.

Line chart with confusing labels. The chart is called “Flight Duration vs. Average Flight Comfort.” Four lines represent the comfort ratings based on ticket status. The legend uses unfamiliar language to define the ticket types: Diamond, Emerald, Ruby, and Sapphire. The “Diamond” and “Ruby” lines show an increase in comfort with longer flights. The “Ruby” and “Sapphire” lines decrease in comfort with longer flights. There’s an annotation that says: For a comfortable flight, cheap seats are not the way to go.”

Graphical Complexity

Graphical complexity occurs when too much information is crammed into one chart. Viewers may be unable to process the takeaways, so an overly complex chart is ineffective.

For example, Best Deal Airlines wants to compare its prices with top competitors. This graph includes 14 airlines that are organized alphabetically rather than by price. The graph also color codes the airline as budget, standard, or luxury. With so many details on one graph, we can’t easily observe how Best Deal’s prices compare with other companies.

While it’s not necessarily wrong to include so many airlines in one graph, it distracts from the insight we’re trying to convey. To compare Best Deal with its top competitors, we should focus on budget airlines. This will cut down on the details and make it easier to process the information.

Ineffective bar chart that is overly complex. It’s called “Best Deal vs. Top Airlines: Avg. Ticket Price for Domestic Flights.” The x-axis shows price in dollars and the y-axis lists 14 airlines. There’s a horizontal bar for each airline: blue bars are budget airlines, pink bars are luxury airlines, and green bars are standard airlines. The airlines are displayed in alphabetical order, so it’s not easy to determine the most expensive vs. least expensive. It’s hard to see how Best Deal compares to other airlines.

Generative AI Tools for Data Visualization

Generative AI tools can be helpful for making data visualizations, but they often output confusing or misleading charts. Developers should know how to critically evaluate data visualizations output by Generative AI tools.

Developers can supercharge their coding abilities by leaning on Generative AI tools like LLMs Claude.ai or ChatGPT, but ultimately it is up to the developer to know whether a data visualization is accurate or not. Generative AI models can (and do) make mistakes and poor visual design choices, and the best solution for developers is to equip themselves with the knowledge to spot these when they happen.

OpenAI API Prompt Engineering

The process of prompt engineering involves creating input prompts specifically designed to generate the desired and optimal output from large language models. The effectiveness of prompt engineering relies on crafting input prompts that are both descriptive and token-efficient. This can be achieved by using various strategies, whether creating a single input prompt with either endpoint or employing few-shot prompting with the chat/completions endpoint. Here are a few recommended approaches for creating effective prompts:

  • Be Descriptive: Utilize adjectives and descriptive language in your prompts to provide the model with more contextual information, aiding it in generating the desired output.
  • Be Specific: Avoid using vague terms such as “a few” and instead provide precise details, such as specifying “three” to enhance the accuracy and clarity of the model’s output.
  • Define the Output: Request the output to be structured in a specific format, such as JSON, or provide clear instructions to ensure the model generates the output in the desired format.
  • Provide an Example: An example of the desired end result can help guide the model and provide a clearer understanding of your expected output.

By employing these prompt engineering strategies, users can enhance the performance of large language models and obtain more reliable and targeted outputs.

Learn more on Codecademy