All design choices impact how a viewer will understand a data visualization. Even the simplest visualizations have an argument, a thesis, or a central point — and the design choices we make (or ignore) can have a positive or negative effect on getting that point across.
Matplotlib gives us some simple, effective tools for creating more readable and understandable visualizations. Here are six strategies we’ll learn for making a strong, clear visual argument:
- choose the right chart
- use subplots to compare multiple graphs
- remove distracting lines (i.e., chartjunk)
- use color for emphasis
- add annotations to the graph
- present the graph with context
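To make a few of these strategies concrete, here is a minimal sketch of what they can look like in matplotlib. It does not use the lesson's dataset: the group names, values, and highlighted group are made up purely for illustration, and the styling choices are one option among many.

```python
import matplotlib.pyplot as plt

# Made-up values for illustration only; not the lesson's dataset.
groups = ["A", "B", "C", "D"]
values = [12, 30, 18, 9]
highlight = "B"  # hypothetical group we want the viewer to notice

# Subplots: two panels side by side so the styles can be compared.
fig, (ax_plain, ax_clean) = plt.subplots(1, 2, figsize=(9, 3.5), sharey=True)

# Left panel: default styling, every bar the same color.
ax_plain.bar(groups, values)
ax_plain.set_title("Default styling")

# Right panel: color for emphasis, with gray for everything else.
colors = ["tab:orange" if g == highlight else "lightgray" for g in groups]
bars = ax_clean.bar(groups, values, color=colors)
ax_clean.set_title("Group B stands out")

# Remove distracting lines: hide the top and right borders (spines).
for side in ("top", "right"):
    ax_clean.spines[side].set_visible(False)

# Annotation: label the highlighted bar with its value.
# Categorical bars sit at x positions 0, 1, 2, ... in order.
idx = groups.index(highlight)
ax_clean.annotate(
    str(values[idx]),
    xy=(idx, values[idx]),
    xytext=(0, 4),
    textcoords="offset points",
    ha="center",
)

fig.tight_layout()
```

In a Jupyter notebook the figure should render when the cell finishes running. The right-hand panel shows the same data as the left, but the viewer's eye goes straight to the one bar that carries the point.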
In this lesson, we’ll work with a dataset that catalogs trees around the Tapajós River, a tributary of the Amazon River that runs through the Amazon Rainforest. Some preliminary data manipulation has been done for you to aggregate and organize the data for our purposes. (This is a crucial step in most data visualization processes, and a great reason to become familiar with the pandas library! You can check out the other notebook in this folder if you want to see how we organized the data using pandas.) Use the Jupyter notebook to the right to explore the data, and then we’ll dive into making some visualizations in the next exercise!
Instructions
Run the Setup cells above to import the necessary packages and load our datasets. Then, in the cell below, type `data.head()` and run the cell to preview the first five rows of the full dataset.
Type in `avg_heights` and run the cell to see the whole `avg_heights` dataset, then compare the two datasets. What do they have in common, and how are they different?
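If you want to try these steps outside the lesson notebook, the sketch below mimics them on a tiny stand-in table. Everything in it is an assumption for illustration: the column names (`plot`, `species`, `height_m`), the values, and the grouping used to build `avg_heights` are invented and may not match how the lesson's datasets were actually prepared.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's `data`; columns and values are invented.
data = pd.DataFrame({
    "plot": [1, 1, 2, 2, 3, 3],
    "species": ["a", "b", "a", "c", "b", "c"],
    "height_m": [20.0, 30.0, 25.0, 35.0, 40.0, 50.0],
})

# head() returns the first five rows by default.
preview = data.head()
print(preview)

# One plausible way an aggregated table could be built (an assumption,
# not necessarily the lesson's method): average the heights within each plot.
avg_heights = data.groupby("plot", as_index=False)["height_m"].mean()
print(avg_heights)

# Comparing shapes is a quick way to see how the two tables differ:
# the full table has one row per tree, the aggregate has one row per group.
print(data.shape, avg_heights.shape)
```

When comparing the two tables, look at which columns they share and how many rows each one has. An aggregated table typically keeps the grouping column and replaces the individual measurements with a summary value.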