Heatmaps let us visualize frequencies across two variables. A heatmap looks like a scatterplot, but it uses color-coded rectangles (tiles) instead of individual points to show how many cases fall at the intersection of an `x` value range and a `y` value range. As with histograms, we can specify bin widths to control which ranges of values are counted together.
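To make the counting concrete, here is a minimal base R sketch (no ggplot2) of the tallying a heatmap performs: each value is assigned to a bin on its own axis, and the two sets of bins are cross-tabulated. The vectors `x` and `y` are hypothetical toy values, and the bin width of 10 is an arbitrary choice for illustration. `geom_bin2d()` picks its own bin boundaries, so its counts will not necessarily match these breaks exactly.

```r
# Hypothetical toy values, for illustration only
x <- c(1, 2, 3, 12, 14, 25)
y <- c(5, 7, 9, 6, 30, 31)

# Assign each value to a bin of width 10: [0,10), [10,20), [20,30), [30,40)
x_bin <- cut(x, breaks = seq(0, 40, by = 10), right = FALSE)
y_bin <- cut(y, breaks = seq(0, 40, by = 10), right = FALSE)

# Cross-tabulate: each cell holds the number of cases whose x and y values
# fall in that pair of ranges, which is what a heatmap encodes as color
counts <- table(x_bin, y_bin)
counts
```

Each nonzero cell here would become one colored tile, with darker or lighter shading according to its count.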
Here’s an example heatmap using our `airquality` dataset, mapping `Ozone` values (ozone pollution) to the x-axis and `Solar.R` values (solar radiation) to the y-axis. Notice that each region of the heatmap is colored according to the number of cases whose values fall within that region's bin ranges. In this dataset, many observations have both low solar radiation and low ozone levels.
We can create heatmaps using the `geom_bin2d()` layer. This geom takes many of the same arguments as `geom_histogram()`, with slight differences because a heatmap represents two variables. Like histograms, heatmaps are split into 30 equally sized bins by default (here, 30 along each axis), which may not make sense for many datasets. To set a bin width for each variable, we can pass a vector of two widths instead of a single value. For example, `geom_bin2d(binwidth = c(1, 5))` sets the x-axis bin width to 1 and the y-axis bin width to 5.
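One way to confirm how a `binwidth` vector is applied is to inspect the bins ggplot2 computes. This sketch assumes ggplot2 is installed and uses its `layer_data()` helper, which returns the computed data for a layer, including each tile's boundaries (`xmin`, `xmax`, `ymin`, `ymax`) and its `count`. The names `p` and `bins` are illustrative.

```r
library(ggplot2)

# x (Ozone) bins are 1 unit wide; y (Solar.R) bins are 5 units wide
p <- ggplot(airquality, aes(x = Ozone, y = Solar.R)) +
  geom_bin2d(binwidth = c(1, 5))

# Computed tiles: boundaries and the number of cases in each tile.
# Rows with missing Ozone or Solar.R are dropped (ggplot2 warns about this).
bins <- layer_data(p)
head(bins[, c("xmin", "xmax", "ymin", "ymax", "count")])

# Tile widths should be about 1 and heights about 5. ggplot2 may nudge the
# outermost bin edges very slightly, so compare with a small tolerance.
all(abs((bins$xmax - bins$xmin) - 1) < 1e-3)
all(abs((bins$ymax - bins$ymin) - 5) < 1e-3)
```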
The code below constructs the heatmap shown at the start of this exercise. We set the `binwidth` of both axes using a vector, specifying that each axis should use bins of width 25.
```r
airquality_heatmap <- ggplot(airquality, aes(x = Ozone, y = Solar.R)) +
  labs(title = "Air Quality: Ozone and Solar Radiation") + geom_bin2d(binwidth = c(25, 25))
```
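A self-contained version of the example above, assuming ggplot2 is installed. Because `airquality` has missing values in both `Ozone` and `Solar.R`, ggplot2 drops those rows and prints a warning when the plot is built. The extra lines use `layer_data()` to pull out the computed tiles and report the one with the highest count; `tiles` and `busiest` are illustrative names.

```r
library(ggplot2)

airquality_heatmap <- ggplot(airquality, aes(x = Ozone, y = Solar.R)) +
  labs(title = "Air Quality: Ozone and Solar Radiation") +
  geom_bin2d(binwidth = c(25, 25))

# Draw the plot (expect a warning about removed rows with missing values)
print(airquality_heatmap)

# Find the 25 x 25 tile that contains the most observations
tiles <- layer_data(airquality_heatmap)
busiest <- tiles[which.max(tiles$count), c("xmin", "xmax", "ymin", "ymax", "count")]
busiest
```

Reading the boundaries of `busiest` is a quick way to check what the color scale is telling you without relying on judging shades by eye.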
Instructions
Our `rideshare_df` dataset contains the longitude (east-west) and latitude (north-south) where each ride began. We can use a heatmap to visualize where people most frequently began their trips. Create a heatmap named `rideshare_heatmap`, mapping `Pickup.Centroid.Longitude` to the x-axis and `Pickup.Centroid.Latitude` to the y-axis.
Print `rideshare_heatmap` to see what it looks like. Look closely at the plot: how could we improve the way this data is communicated?
To visualize this information more clearly, let’s create a similar plot called `rideshare_heatmap_binwidth` with more appropriate bin sizes. Use the same data and the same x and y variable mappings. This time, set the `binwidth` to 0.01 for both axes. You should see a denser heatmap with more bins filled in.
Print `rideshare_heatmap_binwidth` to see the result with our specified bin sizes. The empty region on the right side of the map is Lake Michigan, where it is impossible to pick up a ride!
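The exercise's data file is not reproduced in this text, so the sketch below builds a tiny hypothetical stand-in for `rideshare_df` with invented coordinates; only the two column names come from the exercise. With the real dataset, the two plotting calls apply unchanged: the first uses the default 30 bins per axis, and the second fixes the bin width at 0.01 degrees on both axes.

```r
library(ggplot2)

# HYPOTHETICAL stand-in for rideshare_df: five made-up pickup points.
# Only the column names match the exercise; the values are invented.
rideshare_df <- data.frame(
  Pickup.Centroid.Longitude = c(-87.63, -87.631, -87.65, -87.70, -87.62),
  Pickup.Centroid.Latitude  = c(41.88, 41.881, 41.90, 41.95, 41.87)
)

# Default binning: 30 bins along each axis across the data range
rideshare_heatmap <- ggplot(rideshare_df,
    aes(x = Pickup.Centroid.Longitude, y = Pickup.Centroid.Latitude)) +
  geom_bin2d()

# Fixed bin width of 0.01 degrees on both axes
rideshare_heatmap_binwidth <- ggplot(rideshare_df,
    aes(x = Pickup.Centroid.Longitude, y = Pickup.Centroid.Latitude)) +
  geom_bin2d(binwidth = c(0.01, 0.01))

print(rideshare_heatmap)
print(rideshare_heatmap_binwidth)
```

With only five invented points both plots will look sparse; the contrast described in the instructions only appears with the full dataset.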