Sometimes, we want to group by more than one column. We can do this by passing multiple column names as arguments to the group_by
function.
Imagine that we run a chain of stores and have data about the number of sales at different locations on different days:
location | date | day_of_week | total_sales |
---|---|---|---|
West Village | February 1 | W | 400 |
West Village | February 2 | Th | 450 |
Chelsea | February 1 | W | 375 |
Chelsea | February 2 | Th | 390 |
… | … | … | … |
We suspect that sales are different at different locations on different days of the week. In order to test this hypothesis, we could calculate the average sales for each store on each day of the week across multiple months. The code would look like this:
df %>% group_by(location,day_of_week) %>% summarize(mean_total_sales = mean(total_sales))
And the results might look something like this:
location | day_of_week | mean_total_sales |
---|---|---|
Chelsea | M | 402.50 |
Chelsea | Tu | 422.75 |
Chelsea | W | 452.00 |
… | … | … |
West Village | M | 390 |
West Village | Tu | 400 |
… | … | … |
Instructions
At ShoeFly.com, our Purchasing team thinks that certain shoe_type
/shoe_color
combinations are particularly popular this year (for example, blue ballet flats are all the rage in Paris).
Find the total number of shoes of each shoe_type
/shoe_color
combination purchased using group_by
, summarize()
and n()
. Name the aggregate count column count
. Save your result to the variable shoe_counts
, and view it.
The Marketing team wants to better understand the different price levels of the kinds of shoes that have been sold on the website, in particular looking at shoe_type
/shoe_material
combinations.
Find the mean price of each shoe_type
/shoe_material
combination purchased using group_by
, summarize()
and mean()
. Assign the name mean_price
to the calculated aggregate. Save your result to the variable shoe_prices
, and view it.
Don’t forget to include na.rm = TRUE
as an argument in the summary function that you call!