
Box plots, also known as box-and-whisker plots, show the distribution of data by quartiles. They are useful for showing how much one variable varies across the values of another: are most cases similar in value, or is there a wide range between the highest and lowest values?

The box plot in this example shows the distribution of temperatures for different months within a subset of the `airquality` dataset. As we would expect for New York City, the summer months have the highest temperatures. The line inside each box represents the median temperature. The upper and lower bounds of the box show the 75th and 25th percentiles, respectively. Each whisker extends to the most extreme value that lies within 1.5 times the interquartile range (the distance between the 75th and 25th percentiles) of the box edge. Beyond the whiskers, outliers are shown as points.

We can create a box plot using the `geom_boxplot()` layer. The code below creates that box plot, visualizing temperature by month in the `airquality` data.

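```r
library(ggplot2)

# Assumes Month is stored as a discrete value (e.g. a factor) in the lesson's airquality subset
airquality_boxplot <-
  ggplot(airquality,
         aes(x = Month, y = Temp)) +
  labs(title = "Air Quality: Temperature by Month") +
  geom_boxplot()
```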
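The quartile and whisker rules described above can be checked by hand. The following is a minimal sketch that assumes R's built-in `airquality` data (not necessarily the lesson's subset) and picks July as an arbitrary example month. The variable names are illustrative only.

```r
# Assumption: R's built-in airquality data stands in for the lesson's subset.
july_temps <- airquality$Temp[airquality$Month == 7]

# Box edges (25th and 75th percentiles) and the median line
q <- quantile(july_temps, probs = c(0.25, 0.5, 0.75))
iqr <- q[["75%"]] - q[["25%"]]

# Values farther than 1.5 * IQR from the box edges count as outliers
upper_limit <- q[["75%"]] + 1.5 * iqr
lower_limit <- q[["25%"]] - 1.5 * iqr

# Each whisker ends at the most extreme observation inside those limits
upper_whisker <- max(july_temps[july_temps <= upper_limit])
lower_whisker <- min(july_temps[july_temps >= lower_limit])
outliers <- july_temps[july_temps > upper_limit | july_temps < lower_limit]

print(q)
print(c(lower_whisker = lower_whisker, upper_whisker = upper_whisker))
print(outliers)
```

Because a whisker stops at an actual observation, it is often shorter than the full 1.5 × IQR distance.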

Note that box plots show medians, not means. We’ll cover how to display mean values using bar plots later in this lesson.
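To see how the two summaries can differ, here is a small sketch that computes both per month. It assumes R's built-in `airquality` data rather than the lesson's subset, and `summary_table` is a name chosen only for illustration.

```r
# Assumption: R's built-in airquality data stands in for the lesson's subset.
monthly_median <- tapply(airquality$Temp, airquality$Month, median)
monthly_mean <- tapply(airquality$Temp, airquality$Month, mean)

# Put both summaries side by side, one row per month
summary_table <- data.frame(
  month = names(monthly_median),
  median = as.vector(monthly_median),
  mean = as.vector(monthly_mean)
)

print(summary_table)
```

The `median` column corresponds to the line drawn inside each box. The `mean` column is not shown on a box plot at all.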

### Instructions

1. Construct a box plot object called `rideshare_boxplot` visualizing the cost of trips in `rideshare_df` by month, using the `Trip.Total` and `Month` variables. In your `aes()` mapping, transform `Month` to a factor (`x = factor(Month)`) so that `ggplot` knows to treat each month as a discrete value, rather than a continuous number.

   Print the `rideshare_boxplot` object to see what it looks like. Notice what information is depicted in a box plot, compared to what would be included in a bar plot depicting the same data.
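The effect of the `factor()` conversion can be illustrated without `rideshare_df`. This is a rough sketch that assumes R's built-in `airquality` data, where `Month` is stored as a number. The names `month_levels`, `factor_plot`, and `box_stats` are hypothetical.

```r
library(ggplot2)

# Assumption: R's built-in airquality data stands in for the exercise data.
# Month is stored as a number here, so ggplot2 would treat it as continuous.
print(class(airquality$Month))

# factor() turns each distinct month into its own discrete level
month_levels <- levels(factor(airquality$Month))
print(month_levels)

# With a factor on the x axis, geom_boxplot() draws one box per level
factor_plot <- ggplot(airquality, aes(x = factor(Month), y = Temp)) +
  geom_boxplot()

# layer_data() returns the statistics ggplot2 computed, one row per box
box_stats <- layer_data(factor_plot)
print(nrow(box_stats))
```

The same pattern applies to the exercise: wrapping the month column in `factor()` inside `aes()` gives one box per month.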