Heatmaps let us visualize frequencies along two variables. A heatmap looks like a scatterplot, but uses color-coded squares rather than individual points to indicate how many cases occurred at the intersection of x and y value ranges. As with histograms, we can specify bin widths to control which ranges of values get counted together.
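
To make the idea of counting cases per pair of value ranges concrete, here is a minimal base R sketch that tallies the built-in airquality data into a two-way table. The break points (steps of 25) and the names aq, ozone_bin, solar_bin, and bin_counts are assumptions chosen for illustration, and the bin edges will not necessarily line up exactly with those a plotting function picks.

```r
# Keep only rows where both variables were recorded.
aq <- na.omit(airquality[, c("Ozone", "Solar.R")])

# Assign each observation to a range (bin) on each variable.
# Break points are illustrative: steps of 25 that cover the observed values.
ozone_bin <- cut(aq$Ozone, breaks = seq(0, 175, by = 25), include.lowest = TRUE)
solar_bin <- cut(aq$Solar.R, breaks = seq(0, 350, by = 25), include.lowest = TRUE)

# Each cell of this table is the number of cases in one (Ozone, Solar.R) bin pair.
# A heatmap draws one colored square per cell, colored by that number.
bin_counts <- table(ozone_bin, solar_bin)
print(bin_counts)
```

Every observation lands in exactly one cell, so the cell counts add up to the number of complete rows.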

Here’s an example heatmap using our airquality dataset, mapping Ozone values (ozone pollution) to the x axis and Solar.R values (solar radiation) to the y axis. Notice how each region in the heatmap is color-coded by the number of cases whose values fall in the relevant bin ranges. In this dataset, there are many occurrences where both solar radiation levels and ozone levels are low.

[Figure: heatmap titled "Air Quality: Ozone and Solar Radiation"]

We can create heatmaps using the geom_bin2d() layer. This geom takes many of the same arguments as geom_histogram(), with slight differences given that a heatmap represents two variables. As with histograms, the default is 30 equally sized bins, here applied along each axis, which may not make sense for many datasets. To specify bin widths for each variable, we can pass a vector of widths instead of a single value, e.g. geom_bin2d(binwidth = c(1, 5)) to set x axis bin widths to 1 and y axis bin widths to 5.
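
As a quick illustration of passing a different width for each axis, the sketch below bins Ozone in steps of 10 and Solar.R in steps of 50. The widths, the plot title, and the names aq and uneven_heatmap are assumptions made for this example, not values from the lesson.

```r
library(ggplot2)

# Drop rows with missing values so every row can be placed in a bin.
aq <- na.omit(airquality[, c("Ozone", "Solar.R")])

# binwidth = c(x_width, y_width): Ozone bins are 10 wide, Solar.R bins are 50 wide.
uneven_heatmap <- ggplot(aq, aes(x = Ozone, y = Solar.R)) +
  geom_bin2d(binwidth = c(10, 50)) +
  labs(title = "Ozone (bins of 10) and Solar Radiation (bins of 50)")

print(uneven_heatmap)
```

Because the two variables are measured on different scales, unequal widths like these can be a reasonable starting point, though the right choice depends on the data.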

The code below constructs the heatmap shown at the start of this exercise. We set the binwidth of both axes using a vector, specifying that each axis should use bins of width 25.

library(ggplot2)

# Heatmap of Ozone (x) against Solar.R (y), with bins of width 25 on both axes
airquality_heatmap <- ggplot(airquality, aes(x = Ozone, y = Solar.R)) +
  geom_bin2d(binwidth = c(25, 25)) +
  labs(title = "Air Quality: Ozone and Solar Radiation")
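
To get a rough feel for what a width of 25 means for this data, one option is to divide each variable's observed range by the bin width. This is only a back-of-the-envelope sketch: the helper approx_bin_count is a hypothetical name introduced here, and the number of bins actually drawn can differ by one depending on where the bin edges fall.

```r
# Hypothetical helper: estimate how many bins of a given width span a variable.
approx_bin_count <- function(values, width) {
  value_range <- range(values, na.rm = TRUE)  # smallest and largest observed value
  ceiling(diff(value_range) / width)          # widths needed to cover that span
}

approx_bins <- c(
  Ozone   = approx_bin_count(airquality$Ozone, 25),
  Solar.R = approx_bin_count(airquality$Solar.R, 25)
)
print(approx_bins)
```

If the estimate comes out very small (only a few squares) or very large (mostly empty squares), that suggests trying a different width.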



Our rideshare_df dataset contains information on the longitude (east-west) and latitude (north-south) where each ride began. We can use a heatmap to visualize the locations where people most frequently began their trips. Create a heatmap named rideshare_heatmap, mapping Pickup.Centroid.Longitude to the x axis and Pickup.Centroid.Latitude to the y axis.

Print rideshare_heatmap to see what it looks like. Notice how the plot looks – how can we improve the way this data is being communicated?


To visualize this information more clearly, let’s create a similar plot called rideshare_heatmap_binwidth with more appropriate bin sizes. Use the same data and x and y variable mappings. This time, set the binwidth to 0.01 for both axes. You should see a denser heatmap with more bins filled in.

Print rideshare_heatmap_binwidth to see what it looks like with our specified bin sizes. The region with no values to the right side of the map is Lake Michigan, where it is impossible to pick up a ride!
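
The effect of the bin width on coordinate data can be sketched without the real dataset. The example below uses made-up pickup points in a data frame called fake_rides (a hypothetical stand-in, not rideshare_df). Only the column names are taken from the exercise, and the coordinates are random values chosen purely for illustration.

```r
library(ggplot2)

set.seed(1)

# Made-up pickup locations, NOT the real rideshare_df.
fake_rides <- data.frame(
  Pickup.Centroid.Longitude = rnorm(2000, mean = -87.65, sd = 0.04),
  Pickup.Centroid.Latitude  = rnorm(2000, mean = 41.90, sd = 0.06)
)

base_plot <- ggplot(
  fake_rides,
  aes(x = Pickup.Centroid.Longitude, y = Pickup.Centroid.Latitude)
)

# Default binning: 30 bins along each axis, whatever the units are.
default_bins <- base_plot + geom_bin2d()

# Explicit binning: each square covers 0.01 degrees in both directions.
fine_bins <- base_plot + geom_bin2d(binwidth = c(0.01, 0.01))

print(default_bins)
print(fine_bins)
```

Setting the width in the data's own units (degrees here) keeps the squares a fixed size, whereas the default bin count rescales them whenever the range of the data changes.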
