
Histograms let us visualize the distribution of a continuous variable, in contrast to bar plots, which show counts or other values for discrete and categorical variables. A histogram divides the values of a variable into bins, which are ranges of values that get counted together, and draws one bar per bin whose height is the number of values that fall into it. For example, if a variable took the whole-number values `1` through `100` and we asked for 5 bins, each bin would cover `100 / 5 = 20` values: the first bin would count the frequency of values `1` to `20`, the second bin would count the frequency of values `21` to `40`, and so on.
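The bin arithmetic from the example can be checked with base R's `cut()`, which assigns each value to an interval. This is only an illustration of the counting; `ggplot2` computes its own bins internally and does not use this code.

```r
# Split the whole numbers 1 to 100 into 5 bins of width 20.
# cut() closes intervals on the right: (0,20] holds 1..20, (20,40] holds 21..40, etc.
values <- 1:100
bins <- cut(values, breaks = seq(0, 100, by = 20))

# Frequency of values in each bin: every bin holds 20 values
table(bins)
```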

We can construct a histogram using `geom_histogram()`. The code below creates a histogram from R’s built-in `airquality` dataset, which contains daily air quality measurements from New York City (May to September 1973). This histogram shows the frequencies of `Ozone` values, the mean ozone concentration in parts per billion measured each day between 1 pm and 3 pm.

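```r
library(ggplot2)

# Map Ozone to the x axis; geom_histogram() counts the values in each bin for the y axis
airquality_histogram <-
  ggplot(airquality, aes(x = Ozone)) +
  labs(title = "Air Quality: Ozone Distribution") +
  geom_histogram()
```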
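To see how `ggplot2` binned the data behind this plot, one option is `layer_data()`, which returns the data computed for a layer: one row per bin, with the bin edges (`xmin`, `xmax`) and the frequency (`count`). The sketch below rebuilds the plot so it runs on its own. It also counts the missing `Ozone` readings; `ggplot2` drops those rows from the histogram and prints a warning about them.

```r
library(ggplot2)

# Rebuild the default ozone histogram
airquality_histogram <-
  ggplot(airquality, aes(x = Ozone)) +
  labs(title = "Air Quality: Ozone Distribution") +
  geom_histogram()

# Binned data for the first (and only) layer
bin_data <- layer_data(airquality_histogram)

nrow(bin_data)                 # number of bins: 30 by default
sum(is.na(airquality$Ozone))   # missing Ozone readings, dropped from the plot
sum(bin_data$count)            # readings that were actually counted into bins
```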

Printing `airquality_histogram` draws the plot. We see that ozone levels are clustered toward the lower end of the range (a good thing!), though some days had much higher ozone levels as well. By default, `ggplot2` divides the range of the data into 30 equally sized bins and prints a message suggesting that we pick a better value with `binwidth`. Frequently we’ll want a bin width that better fits our data; for example, if we wanted to examine the distribution of weight in pounds for a population of house cats, it would make sense for each bin to represent one pound rather than some arbitrary decimal amount. We can set the width of bins using the `binwidth` argument. The code below creates the same plot as before, now with a `binwidth` of `10`.

```r
# Same plot as before, but every bin now spans 10 units of Ozone
airquality_histogram_binwidth <-
  ggplot(airquality, aes(x = Ozone)) +
  labs(title = "Air Quality: Ozone Distribution") +
  geom_histogram(binwidth = 10)
```
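A quick way to confirm what `binwidth = 10` changed is to inspect the computed bins with `layer_data()` again. This sketch rebuilds the plot (title omitted for brevity) and checks that every bin is 10 units wide and that there are fewer bins than the default 30.

```r
library(ggplot2)

# Same histogram as airquality_histogram_binwidth, without the title
p10 <- ggplot(airquality, aes(x = Ozone)) + geom_histogram(binwidth = 10)

bin_data_10 <- layer_data(p10)

# Width of each bin (xmax - xmin): a single value, 10
unique(round(bin_data_10$xmax - bin_data_10$xmin, 8))

# Number of bins: fewer than the default 30, so each bar covers a wider range
nrow(bin_data_10)
```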

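Take a look at the new plot with `binwidth` set to `10`. The histogram is now smoother, with fewer local peaks: each bin spans 10 units, wider than each of the 30 default bins, so small ups and downs between neighboring values fall into the same bar.

### Instructions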

1. Our workspace contains a dataset called `rideshare_df` describing rideshare trips in the city of Chicago. Examine this dataset by calling the `head()` function with `rideshare_df` as an argument. Click through the arrows in the table header to see all of the columns in this data frame.

2. Let’s visualize the distribution of total trip cost across the `rideshare_df` dataset. Construct a histogram called `rideshare_histogram` with the `Trip.Total` variable on the `x` axis. Note that we only supply the `x` variable in our `aes()` mapping because the `y` axis automatically shows frequency counts.

   Print your plot after creating it to see what it looks like!

3. Create a similar plot called `rideshare_histogram_binwidth`, this time setting the `binwidth` to `5` to count trip totals in intervals of \$5.

   Print your plot again to see how it looks with our custom bin width.
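With only `binwidth = 5`, `ggplot2` places the bin edges half a bin width away from the multiples of 5 (at 2.5, 7.5, 12.5, and so on), so each bar is centered on a whole multiple of \$5 rather than starting at one. If you would rather have bins that run from \$0 to \$5, \$5 to \$10, and so on, `geom_histogram()` also accepts a `boundary` argument. The sketch below wraps this in a hypothetical helper, `trip_total_histogram()`, which is not part of the lesson; it only assumes a data frame with a numeric `Trip.Total` column.

```r
library(ggplot2)

# Hypothetical helper (not from the lesson): histogram of Trip.Total with bins
# `dollars` wide whose edges sit on whole multiples of `dollars`
trip_total_histogram <- function(df, dollars = 5) {
  ggplot(df, aes(x = Trip.Total)) +
    # boundary = 0 places a bin edge at 0, so all edges are multiples of the width
    geom_histogram(binwidth = dollars, boundary = 0)
}
```

In the lesson workspace this could be called as `trip_total_histogram(rideshare_df)`. It is an optional variation: the exercise itself only asks for `binwidth = 5`.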