Sometimes, we want to group by more than one column. We can do this by passing multiple column names as arguments to the
Imagine that we run a chain of stores and have data about the number of sales at different locations on different days:
|West Village||February 1||W||400|
|West Village||February 2||Th||450|
We suspect that sales are different at different locations on different days of the week. In order to test this hypothesis, we could calculate the average sales for each store on each day of the week across multiple months. The code would look like this:
df %>% group_by(location,day_of_week) %>% summarize(mean_total_sales = mean(total_sales))
And the results might look something like this:
At ShoeFly.com, our Purchasing team thinks that certain
shoe_color combinations are particularly popular this year (for example, blue ballet flats are all the rage in Paris).
Find the total number of shoes of each
shoe_color combination purchased using
n(). Name the aggregate count column
count. Save your result to the variable
shoe_counts, and view it.
The Marketing team wants to better understand the different price levels of the kinds of shoes that have been sold on the website, in particular looking at
Find the mean price of each
shoe_material combination purchased using
mean(). Assign the name
mean_price to the calculated aggregate. Save your result to the variable
shoe_prices, and view it.
Don’t forget to include
na.rm = TRUE as an argument in the summary function that you call!