Misleading and Confusing Graphs

Generative AI Tools for Data Visualization

Generative AI tools can be helpful for making data visualizations, but they often produce confusing or misleading charts. Developers should know how to critically evaluate the visualizations these tools generate.

Developers can supercharge their coding abilities by leaning on generative AI tools like Claude, ChatGPT, or other LLMs, but ultimately it is up to the developer to judge whether a data visualization is accurate. Generative AI models can (and do) make mistakes and poor visual design choices. The best solution for developers is to equip themselves with the knowledge to spot these problems when they happen.

Data Visualization Design

Data visualizations rely on effective, informed design choices to avoid being unintentionally misleading or confusing. Selecting the right chart type, including a clear title and thoughtful annotations, and making appropriate use of color all help charts communicate clearly and accurately.

Data Visualization Axes

Data visualizations need appropriate axes to be truthful and legible. This means avoiding axis breaks that strip values of their context and setting the right number of axis ticks: neither too few (numbers are hard to interpret) nor too many (axes are cluttered).

Two bar charts side by side. Both are titled "Event Attendance", with "Number of Attendees" on the y-axis and "Event Date" on the x-axis. Each chart has three bars (yellow, blue, orange) in increasing height from left to right. The difference is the y-scale on each chart. On the left-hand chart, the scale goes from 0 to 150 in intervals of 50. The bars show 100, 105, and 110, so they are all clustered near the 100 line and look relatively similar in height. On the right-hand chart, the scale starts at 0 but has a break (shown by a zigzag in the axis). The numbers pick back up at 100 and increase by 5s, so the axis ticks are 0, 100, 105, 110. As a result, the bars representing those numbers (100, 105, 110) stretch over the whole vertical space of the graph, and their heights look far more different than in the left-hand chart.
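One way to put a number on this distortion is to compare how tall the bars are drawn relative to each other. The following is a minimal sketch, not a standard metric: the helper `apparent_ratio` and the cut-off baseline of 95 are assumptions made for illustration, while the attendance values come from the chart described above.

```python
def apparent_ratio(values, baseline):
    """Ratio of the tallest drawn bar to the shortest when bars start at `baseline`."""
    heights = [v - baseline for v in values]
    if min(heights) <= 0:
        raise ValueError("baseline must be below every value")
    return max(heights) / min(heights)


attendance = [100, 105, 110]  # values from the Event Attendance example

# Bars drawn from zero: the visual ratio matches the real ratio of the data.
true_ratio = apparent_ratio(attendance, baseline=0)

# Hypothetical baseline of 95 (an assumption): the same data looks far more spread out.
cut_ratio = apparent_ratio(attendance, baseline=95)

print(f"zero baseline: tallest bar is {true_ratio:.2f}x the shortest")
print(f"baseline at 95: tallest bar is {cut_ratio:.2f}x the shortest")
```

When the second number is much larger than the first, the axis is exaggerating differences that the data does not support.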

Data Visualization Scaling

Data visualizations need appropriate scaling to be truthful and legible. A linear scale (where tick values increase by a constant amount, such as 0, 20, 40) is almost always the best choice. Logarithmic scales (where tick values increase by a constant factor, such as 1, 10, 100) often cause confusion and should only be used with audiences who are very familiar with reading them.

Recall the example of the pharmaceutical company Purdue Pharma, which used a misleading logarithmic scale to downplay the addiction risk of opioid painkillers.

Two line graphs side-by-side. The lefthand graph is titled "Painkiller prescribing information, Linear y-axis." The x-axis shows "Hours from dosing" from 0 to 12. The y-axis shows "Concentration of painkiller in bloodstream" from 0 to 140, at evenly-spaced intervals of 20. There are 5 lines representing different doses of the drug. The three lowest doses (10, 20, and 40 mg) are relatively similar and never reach above a concentration of 40 in the bloodstream. The two highest doses (80 and 160 mg) show significant spikes in concentration, reaching up to concentrations of 90 and 120 (2 or 3 times more than the lower doses). The righthand graph is titled "Painkiller prescribing information, Log y-axis." The x-axis shows "Hours from dosing" from 0 to 12. The y-axis shows "Concentration of painkiller in bloodstream" from 0 to 100 on a log scale. This means the axis runs from 0 to 100, with 10 about halfway between those two numbers. As such, all the lines appear more or less flattened out, and all of them are clustered nearer to the center of the graph. It's still obvious that higher doses result in a higher concentration in the bloodstream, but the big spike that is clear in the linear graph is completely invisible on the log scale.
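To see why the spike disappears, it can help to compute where a value lands on each kind of axis. The sketch below is illustrative only: the peak values of 40 and 120 are approximate readings from the chart description, the helper names are hypothetical, and the log axis range of 1 to 1000 is an assumption (a log axis cannot start at zero).

```python
import math


def linear_position(value, lo, hi):
    """Fraction of the axis height at which `value` is drawn on a linear axis."""
    return (value - lo) / (hi - lo)


def log_position(value, lo, hi):
    """Fraction of the axis height at which `value` is drawn on a log10 axis."""
    if lo <= 0 or value <= 0:
        raise ValueError("a log axis cannot include zero or negative values")
    return (math.log10(value) - math.log10(lo)) / (math.log10(hi) - math.log10(lo))


low_peak, high_peak = 40, 120  # approximate peak concentrations from the example

# Linear axis from 0 to 140, as in the left-hand chart.
linear_gap = linear_position(high_peak, 0, 140) - linear_position(low_peak, 0, 140)

# Log axis from 1 to 1000 (assumed range for this sketch).
log_gap = log_position(high_peak, 1, 1000) - log_position(low_peak, 1, 1000)

print(f"gap on linear axis: {linear_gap:.0%} of the chart height")
print(f"gap on log axis:    {log_gap:.0%} of the chart height")
```

A threefold difference in concentration takes up more than half the chart on the linear axis, but only a small slice of it on the log axis.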

Color Associations

In data visualizations, color associations draw on both helpful prior knowledge and harmful stereotypes. We tend to view darker colors as “more” and lighter colors as “less.” Color associations can also be culturally specific (for instance, red means “bad” or “stop” in some cultures and “lucky” or “prosperous” in others) or influenced by the norms of a particular field (in finance, red means “negative balance”).

Color Palettes

When creating data visualizations, it’s essential to choose the right color palettes to ensure truthfulness, legibility, and accessibility. This involves correctly implementing sequential, diverging, or categorical color palettes and ensuring that there is proper color contrast in your visualizations.

Sequential color scale: light blue, medium blue, dark blue. Diverging color scale: orange, light warm gray, medium blue. Categorical color scale: orange, deep purple, light green.
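Color contrast can be checked numerically rather than by eye. The sketch below uses the contrast ratio formula from the WCAG 2.x accessibility guidelines, which is one common way to measure contrast and is brought in here as an assumption rather than something this cheatsheet prescribes. The example colors are hypothetical picks, not values from the palettes above.

```python
def _linearize(channel):
    """Convert an 8-bit sRGB channel (0-255) to linear light, per WCAG 2.x."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (r, g, b) color with 0-255 channels."""
    r, g, b = (_linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(rgb_a, rgb_b):
    """WCAG contrast ratio between two colors, from 1 (identical) up to 21."""
    la, lb = relative_luminance(rgb_a), relative_luminance(rgb_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


WHITE = (255, 255, 255)
LIGHT_BLUE = (173, 216, 230)  # hypothetical bar color
NAVY = (0, 0, 128)            # hypothetical bar color

for name, color in [("light blue", LIGHT_BLUE), ("navy", NAVY)]:
    print(f"{name} on white: {contrast_ratio(color, WHITE):.2f}:1")
```

WCAG suggests a ratio of at least 3:1 for graphical elements, so a pale bar on a white background is likely to fail while a dark one passes.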

Data Visualization Labels

Titles, labels, and annotations are essential for clear and accessible data visualizations. They provide context, making it easier for viewers to understand the chart’s contents and purpose.
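When reviewing a chart produced by a generative AI tool, a quick automated check can flag missing text before a human looks at it. This is a minimal sketch under stated assumptions: the chart is represented as a plain dictionary, and the field names `title`, `x_label`, and `y_label` are hypothetical, not part of any particular charting library.

```python
REQUIRED_TEXT = ("title", "x_label", "y_label")  # hypothetical field names


def missing_text(chart_spec):
    """Return the required text fields that are absent or blank in `chart_spec`."""
    return [
        field
        for field in REQUIRED_TEXT
        if not str(chart_spec.get(field) or "").strip()
    ]


# Example spec with an empty y-axis label.
chart = {"title": "Event Attendance", "x_label": "Event Date", "y_label": ""}

print(missing_text(chart))
```

A check like this only confirms that labels exist. Whether they are accurate and helpful still needs a human reader.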

Bias in Data Visualizations

Misleading charts often arise from conscious or unconscious bias. Following sound design principles in data visualization reduces the potential for bias. Clear labeling and unbiased data representation are key to maintaining integrity. A well-designed chart not only informs but also builds trust with the audience.
