Column charts and pie charts are excellent for comparing the sizes of categories. But they won’t help us understand more general numeric columns. For example, suppose we wanted to use the following table of housing price data (in thousands of US dollars) to analyze the impact of wealth inequality on housing.
We’ll look at two methods for visualizing numeric data in this exercise: histograms and scatterplots. We’ll use histograms to visualize one column at a time, and scatterplots to understand the connection between the two columns.
Histograms are used to understand the “shape” of a single column of numeric data. A histogram breaks the data-points into bins and then counts the number of data points in each bin.
To better understand what a histogram shows, let’s visualize the median household income data for just the first 9 rows of data shown above:
The smallest and largest household incomes are 11.8 and 50.1. To create this chart, Excel broke that range into five equal pieces. The range of values for each piece is labelled along the x-axis. The column above each piece represents the number of data points that fall within that range. For example, the largest column has a height of 3, indicating that there are 3 data points in the table between 27.1 and 34.7 (not including 27.1). Check the table above to confirm that Excel got that right!
Scatterplots are used for visualizing correlation between two columns. As one column increases, does the other also increase? Decrease? Fluctuate? For example, in our dataset, we might want to know if higher-income households own more expensive houses: as income increases, does housing price also increase? It’s important to remember that the existence of a correlation like this does not mean the one column is directly influencing the other.
Here is what the scatterplot looks like using the first 9 rows of data shown above, with household income on the x-axis and housing price on the y-axis:
To create this chart, Excel has placed a dot for each row of the table. The dot is positioned horizontally by the income value in that row and vertically by the housing price for that row. For example, the first dot in the bottom left corresponds to the ninth row with an income of 11.8 and housing price of 561.
We can see a general trend from this plot: as incomes increase (moving to the right in the plot), so do housing prices (moving up in the plot.) We refer to this as a positive correlation when the variables trend in the same direction. If one variable increases while another decreases, this is referred to as a negative correlation.
We’ve placed a slideshow illustrating how to create these plots in Excel! When you’re ready to try it yourself, move on to the next set of instructions.
When you’re ready to practice what you’ve learned, download and work through our exercise spreadsheet. A couple of important points to keep in mind:
- Unlike formulas and pivot tables, we can’t automatically assess your visualizations. We have placed solutions to each exercise within the same spreadsheet. Feel free to compare your work to ours if you get stuck or to check if your solution is correct. Your visualization may not look identical to ours and that’s okay!
- There are lots of options we haven’t covered — feel free to play around and see how different options work!
- You can always re-download the spreadsheet if you want to start fresh.
Once you’ve finished, think about the next question.
Question: What can you say about the general shape of the household income histogram? What about the relationship between household income and housing price, and where the points are more densely scattered?