Box plots, also known as box-and-whisker plots, show the distribution of data by quartiles. They are useful for showing how the spread of one variable differs across the values of another variable: are most cases similar in value, or is there a wide range between the highest and lowest values?
In the box plot below, we see the distribution of temperatures for different months within a subset of the `airquality` dataset. As we would expect for New York City, the summer months have the highest temperatures. The line inside each box marks the median temperature, and the upper and lower edges of the box mark the 75th and 25th percentiles, respectively. The whiskers extend from the box to the most extreme data points that lie within 1.5 times the interquartile range (the distance between the 75th and 25th percentiles) of the box edges. Points beyond the whiskers are drawn individually as outliers.
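To make those definitions concrete, here is a minimal sketch that computes the box-plot statistics for a single month by hand in base R, assuming the built-in `airquality` data frame. It uses `quantile()` with its default method, which is also what ggplot2's `stat_boxplot()` uses for the box; base R's `boxplot()` uses Tukey's hinges instead, so its boxes can differ slightly.

```r
# Box-plot statistics for July temperatures, computed by hand
july <- airquality$Temp[airquality$Month == 7]

q <- quantile(july, c(0.25, 0.5, 0.75))  # 25th, 50th and 75th percentiles
iqr <- q[[3]] - q[[1]]                    # interquartile range = height of the box

# Points more than 1.5 * IQR beyond the box edges are drawn as outliers
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[3]] + 1.5 * iqr
is_outlier <- july < lower_fence | july > upper_fence

# Whiskers stop at the most extreme values that are not outliers
whiskers <- range(july[!is_outlier])

box_stats <- list(
  lower = q[[1]], median = q[[2]], upper = q[[3]],
  whisker_low = whiskers[1], whisker_high = whiskers[2],
  outliers = july[is_outlier]
)
str(box_stats)
```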
We can create a box plot using the `geom_boxplot()` layer. The code below creates the box plot shown above, visualizing temperature by month in the `airquality` dataset:

```r
library(ggplot2)
# factor(Month) makes each month a separate, discrete box
airquality_boxplot <- ggplot(airquality, aes(x = factor(Month), y = Temp)) +
  labs(title = "Air Quality: Temperature by Month") +
  geom_boxplot()
```
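Printing the object draws the plot. If you also want the numbers behind each box, ggplot2's `layer_data()` returns the statistics that `geom_boxplot()` computed for a layer. This sketch assumes ggplot2 is installed and rebuilds the plot so it runs on its own.

```r
library(ggplot2)

# Rebuild the month-by-month box plot (Month as a factor gives one box per month)
airquality_boxplot <- ggplot(airquality, aes(x = factor(Month), y = Temp)) +
  labs(title = "Air Quality: Temperature by Month") +
  geom_boxplot()

print(airquality_boxplot)  # draw the plot

# layer_data() returns the statistics computed for the first layer:
# ymin/ymax are the whisker ends; lower/middle/upper are the box edges and median.
box_layer <- layer_data(airquality_boxplot)
box_layer[, c("x", "ymin", "lower", "middle", "upper", "ymax")]
```

Each row corresponds to one box, so here there is one row per month.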
Note that box plots show medians, not means. We’ll cover how to display mean values using bar plots later in this lesson.
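Because the box marks the median, a few unusually hot or cold days can pull a month's mean away from the line you see in the plot. A quick base-R way to put both numbers side by side, assuming the built-in `airquality` data:

```r
# Median (what the box plot draws) versus mean, month by month
medians <- tapply(airquality$Temp, airquality$Month, median)
means   <- tapply(airquality$Temp, airquality$Month, mean)

round(cbind(median = medians, mean = means), 1)
```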
Construct a box plot object called `rideshare_boxplot` that visualizes the cost of trips in `rideshare_df` by month, with `Month` on the x-axis and the trip-cost variable on the y-axis. In your `aes()` mapping, convert `Month` to a factor (`x = factor(Month)`) so that ggplot treats each month as a discrete value rather than a continuous number.
Print the `rideshare_boxplot` object to see what it looks like. Notice what information a box plot depicts, compared to what a bar plot of the same data would show.
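To see why the `factor()` conversion matters, the sketch below builds the same kind of plot twice on the built-in `airquality` data (rather than on `rideshare_df`, whose columns aren't reproduced here) and counts how many boxes each version produces using ggplot2's `layer_data()`.

```r
library(ggplot2)

# Month stored as a number: ggplot2 treats x as continuous, so all rows
# fall into a single group and only one box is computed.
numeric_month <- ggplot(airquality, aes(x = Month, y = Temp)) + geom_boxplot()

# Month converted to a factor: each month is a discrete group with its own box.
factor_month <- ggplot(airquality, aes(x = factor(Month), y = Temp)) + geom_boxplot()

nrow(layer_data(numeric_month))  # number of boxes computed
nrow(layer_data(factor_month))
```

The same reasoning applies to `rideshare_df`: without `factor(Month)`, ggplot would summarize all trips in one box instead of one box per month.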