Sometimes, we want to group by more than one column. We can do this by passing multiple column names as arguments to the group_by function.

Imagine that we run a chain of stores and have data about the number of sales at different locations on different days:

location date day_of_week total_sales
West Village February 1 W 400
West Village February 2 Th 450
Chelsea February 1 W 375
Chelsea February 2 Th 390

We suspect that sales are different at different locations on different days of the week. In order to test this hypothesis, we could calculate the average sales for each store on each day of the week across multiple months. The code would look like this:

df %>% group_by(location,day_of_week) %>% summarize(mean_total_sales = mean(total_sales))

And the results might look something like this:

location day_of_week mean_total_sales
Chelsea M 402.50
Chelsea Tu 422.75
Chelsea W 452.00
West Village M 390
West Village Tu 400



At ShoeFly.com, our Purchasing team thinks that certain shoe_type/shoe_color combinations are particularly popular this year (for example, blue ballet flats are all the rage in Paris).

Find the total number of shoes of each shoe_type/shoe_color combination purchased using group_by, summarize() and n(). Name the aggregate count column count. Save your result to the variable shoe_counts, and view it.


The Marketing team wants to better understand the different price levels of the kinds of shoes that have been sold on the website, in particular looking at shoe_type/shoe_material combinations.

Find the mean price of each shoe_type/shoe_material combination purchased using group_by, summarize() and mean(). Assign the name mean_price to the calculated aggregate. Save your result to the variable shoe_prices, and view it.

Don’t forget to include na.rm = TRUE as an argument in the summary function that you call!

Take this course for free

Mini Info Outline Icon
By signing up for Codecademy, you agree to Codecademy's Terms of Service & Privacy Policy.

Or sign up using:

Already have an account?