Heatmaps let us visualize frequencies along two variables. A heatmap looks like a scatterplot, but uses color-coded squares rather than individual points to indicate how many cases occurred at the intersection of x and y value ranges. As with histograms, we can specify bin widths to control which ranges of values get counted together.
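To make the idea of counting cases in two-dimensional bins concrete, here is a minimal base R sketch that tabulates R's built-in airquality data by hand. The bin width of 25 and the names `aq`, `ozone_bin`, `solar_bin`, and `bin_counts` are illustrative assumptions, not part of the lesson. A heatmap layer does comparable counting for us and maps each count to a color, though its exact bin boundaries may differ from the ones chosen here.

```r
# Keep only rows where both variables are observed.
aq <- airquality[complete.cases(airquality[, c("Ozone", "Solar.R")]), ]

# Assumed bin width of 25 units on each axis (illustrative choice).
width <- 25
ozone_breaks <- seq(0, ceiling(max(aq$Ozone) / width) * width, by = width)
solar_breaks <- seq(0, ceiling(max(aq$Solar.R) / width) * width, by = width)

# Assign each observation to a value range on each axis.
ozone_bin <- cut(aq$Ozone, breaks = ozone_breaks, include.lowest = TRUE)
solar_bin <- cut(aq$Solar.R, breaks = solar_breaks, include.lowest = TRUE)

# Cross-tabulate: each cell is the number of cases in that pair of ranges.
bin_counts <- table(solar_bin, ozone_bin)
print(bin_counts)
```

Each cell of the printed table corresponds to one region of a heatmap; larger counts would be drawn with a stronger color.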
Here’s an example heatmap using our airquality dataset, mapping Ozone values (ozone pollution) on the x axis and Solar.R values (solar radiation) on the y axis. Notice how each region in the heatmap is color-coded to the number of cases with values in the relevant bin ranges. In this dataset, there are many occurrences where both solar radiation levels and ozone levels are low.
We can create heatmaps using the geom_bin2d() layer. This geom takes many of the same arguments as geom_histogram(), with slight differences given that a heatmap represents two variables. As with histograms, heatmaps default to 30 equally sized bins along each axis, which may not make sense for many datasets. To specify bin widths for each variable, we can pass a vector of widths instead of a single value, e.g. geom_bin2d(binwidth = c(1, 5)) to set x axis bin widths to 1 and y axis bin widths to 5.
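As a rough illustration of how the binwidth vector changes the result, the sketch below builds three versions of the same plot from the airquality data. The object names (`base_plot`, `default_heatmap`, `narrow_heatmap`, `wide_heatmap`) and the second pair of widths are assumptions chosen for the example, not values from the lesson.

```r
library(ggplot2)

# Shared data and aesthetic mappings for all three plots.
base_plot <- ggplot(airquality, aes(x = Ozone, y = Solar.R))

# Default binning: 30 bins along each axis.
default_heatmap <- base_plot + geom_bin2d()

# Ozone (x) bins of width 1 and Solar.R (y) bins of width 5.
narrow_heatmap <- base_plot + geom_bin2d(binwidth = c(1, 5))

# Wider bins group more cases into each colored region.
wide_heatmap <- base_plot + geom_bin2d(binwidth = c(20, 50))

# Typing a plot's name at the console draws it, e.g.:
# narrow_heatmap
```

Because airquality contains missing values, ggplot2 warns about removed rows when any of these plots is drawn.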
The code below constructs the heatmap shown at the start of this exercise. We set the binwidth of both axes using a vector, specifying that each axis should use bins of width 25.
airquality_heatmap <- ggplot(airquality, aes(x = Ozone, y = Solar.R)) +
  labs(title = "Air Quality: Ozone and Solar Radiation") +
  geom_bin2d(binwidth = c(25, 25))
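To check what the colors in this heatmap represent, one option is to look at the per-bin data ggplot2 computes. The sketch below rebuilds the same plot and uses `layer_data()` from ggplot2 to retrieve it. The name `bin_data` is an illustrative choice, and the exact set of columns returned can vary between ggplot2 versions, so only the `count` column is relied on here.

```r
library(ggplot2)

airquality_heatmap <- ggplot(airquality, aes(x = Ozone, y = Solar.R)) +
  labs(title = "Air Quality: Ozone and Solar Radiation") +
  geom_bin2d(binwidth = c(25, 25))

# layer_data() returns the data ggplot2 computed for a layer;
# for geom_bin2d() that is one row per drawn bin.
bin_data <- layer_data(airquality_heatmap, 1)

# Sort so the bins holding the most cases come first.
bin_data <- bin_data[order(-bin_data$count), ]
head(bin_data)
```

The rows with the highest `count` are the regions drawn with the strongest color in the plot.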
The rideshare_df dataset contains information on the longitude (east-west) and latitude (north-south) where each ride began. We can use a heatmap to visualize the locations where people most frequently began their trips. Create a heatmap named rideshare_heatmap, mapping Pickup.Centroid.Longitude to the x axis and Pickup.Centroid.Latitude to the y axis.
Display rideshare_heatmap to see what it looks like. Notice how the plot looks: how can we improve the way this data is being communicated?
To visualize this information more clearly, let’s create a similar plot called rideshare_heatmap_binwidth with more appropriate bin sizes. Use the same data and the same x and y variable mappings. This time, set the binwidth to 0.01 for both axes. You should see a denser heatmap with more bins filled in.
Display rideshare_heatmap_binwidth to see what it looks like with our specified bin sizes. The region with no values on the right side of the map is Lake Michigan, where it is impossible to pick up a ride!
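For readers without access to rideshare_df, the two steps can be sketched on simulated data. Everything about `fake_rideshare_df` below is an assumption made for illustration: the points are random draws around an arbitrary center, not real pickups, and only the column names are borrowed from the lesson. The object names are likewise hypothetical.

```r
library(ggplot2)

# Simulated stand-in for rideshare_df (NOT the lesson's data): random points
# around an arbitrary center, reusing the lesson's column names.
set.seed(42)
fake_rideshare_df <- data.frame(
  Pickup.Centroid.Longitude = rnorm(5000, mean = -87.65, sd = 0.03),
  Pickup.Centroid.Latitude  = rnorm(5000, mean = 41.90, sd = 0.04)
)

pickup_plot <- ggplot(
  fake_rideshare_df,
  aes(x = Pickup.Centroid.Longitude, y = Pickup.Centroid.Latitude)
)

# Default binning: 30 bins per axis, whatever the units of the data.
fake_heatmap <- pickup_plot + geom_bin2d()

# Bins of 0.01 degrees on both axes.
fake_heatmap_binwidth <- pickup_plot + geom_bin2d(binwidth = c(0.01, 0.01))
```

Choosing the width in the data's own units (here, degrees) ties each colored region to a fixed-size area on the map, rather than to whatever 30 divisions of the observed range happen to produce.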