group_by() can also be used with the dplyr function mutate() to add columns to a data frame that involve per-group metrics.
Consider the same educational technology company’s enrollments table from the previous exercise:
user_id | course | quiz_score |
---|---|---|
1234 | learn_r | 80 |
1234 | learn_python | 95 |
4567 | learn_r | 90 |
4567 | learn_python | 55 |
You want to add a new column to the data frame that stores the difference between a row’s quiz_score and the average quiz_score for that row’s course. To add the column:
enrollments %>% group_by(course) %>% mutate(diff_from_course_mean = quiz_score - mean(quiz_score))
- group_by() groups the data frame by course into two groups: learn_r and learn_python
- mutate() adds a new column, diff_from_course_mean, calculated as the difference between a row’s individual quiz_score and the mean(quiz_score) for that row’s group (course)
The resulting data frame would look like this:
user_id | course | quiz_score | diff_from_course_mean |
---|---|---|---|
1234 | learn_r | 80 | -5 |
1234 | learn_python | 95 | 20 |
4567 | learn_r | 90 | 5 |
4567 | learn_python | 55 | -20 |
- The average quiz_score for the learn_r course is 85, so diff_from_course_mean is calculated as quiz_score - 85 for all the rows of enrollments with a value of learn_r in the course column.
- The average quiz_score for the learn_python course is 75, so diff_from_course_mean is calculated as quiz_score - 75 for all the rows of enrollments with a value of learn_python in the course column.
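To check the numbers above by hand, the example can be reproduced in a few lines. This is a minimal sketch that assumes dplyr is installed and rebuilds the enrollments table with data.frame(); the name enrollments_with_diff is chosen here for illustration and does not come from the lesson.

```r
library(dplyr)

# Rebuild the enrollments table from the example above
enrollments <- data.frame(
  user_id = c(1234, 1234, 4567, 4567),
  course = c("learn_r", "learn_python", "learn_r", "learn_python"),
  quiz_score = c(80, 95, 90, 55)
)

# Group by course, then subtract each course's mean score from every row's score.
# Inside mutate(), mean(quiz_score) is evaluated once per group, not over the whole table.
enrollments_with_diff <- enrollments %>%
  group_by(course) %>%
  mutate(diff_from_course_mean = quiz_score - mean(quiz_score))

enrollments_with_diff
```

The result keeps its grouping by course; calling ungroup() afterwards returns a data frame with no grouping if later steps should operate on all rows at once.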
Instructions
You want to be able to tell how expensive each order is compared to the average price of orders with the same shoe_type.
Group orders by shoe_type and create a new column named diff_from_shoe_type_mean that stores the difference in price between an order’s price and the average price of orders with the same shoe_type. Save the result to diff_from_mean, and view it.
Don’t forget to include na.rm = TRUE as an argument in the summary function you call!
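As a hint on why na.rm = TRUE matters, the sketch below uses a hypothetical variation of the enrollments table in which one quiz_score is missing. The data frame enrollments_na, the user_id 8910, and both new column names are assumptions made up for this illustration. Without na.rm = TRUE, a single NA makes the group mean NA, so every difference in that group becomes NA.

```r
library(dplyr)

# Hypothetical data: three learn_r enrollments, one with a missing score
enrollments_na <- data.frame(
  user_id = c(1234, 4567, 8910),
  course = c("learn_r", "learn_r", "learn_r"),
  quiz_score = c(80, 90, NA)
)

diffs <- enrollments_na %>%
  group_by(course) %>%
  mutate(
    # mean() returns NA when any value in the group is NA
    diff_without_na_rm = quiz_score - mean(quiz_score),
    # na.rm = TRUE drops missing values before averaging (mean of 80 and 90)
    diff_with_na_rm = quiz_score - mean(quiz_score, na.rm = TRUE)
  )

diffs
```

The row with the missing score still has NA in diff_with_na_rm, since NA minus a number is NA, but the other rows now get usable values.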