This lesson introduced you to aggregates in R using dplyr. You learned:
- How to calculate summary statistics with summarize()
- How to calculate aggregate statistics over groups of rows that share the same value or values using group_by()
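As a quick reminder of how the two functions fit together, here is a minimal sketch. The sales data frame and its category and amount columns are made up for illustration and are not part of the lesson's data.

```r
library(dplyr)

# Hypothetical toy data, not the lesson's data set
sales <- data.frame(
  category = c("a", "a", "b", "b", "b"),
  amount   = c(10, 20, 5, NA, 15)
)

# summarize() alone collapses the whole data frame to one row
overall <- sales %>%
  summarize(mean_amount = mean(amount, na.rm = TRUE))

# group_by() first gives one summary row per distinct category
by_category <- sales %>%
  group_by(category) %>%
  summarize(mean_amount = mean(amount, na.rm = TRUE))

overall
by_category
```

The only difference between the two pipelines is the group_by() step, which changes the unit that summarize() works on from the whole table to each group.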
Instructions
Let’s examine some more data from ShoeFly.com. This time, in addition to the orders data, we’ll be looking at data about user visits to the website, stored in the page_visits data frame. Inspect the columns of the data frames using the rendered notebook.
Find the average price of an order in the orders data frame using summarize() and the mean() summary function. Save the resulting data frame to a variable named average_price and view it.
Don’t forget to include na.rm = TRUE as an argument in the call to mean()!
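One possible shape for this step is sketched below. The small orders data frame here is a hypothetical stand-in so the example runs on its own, and the output column name mean_price is an assumption; the exercise may expect a different name.

```r
library(dplyr)

# Hypothetical stand-in for the lesson's orders data frame
orders <- data.frame(
  id    = 1:4,
  price = c(20, 30, NA, 40)
)

# na.rm = TRUE drops missing prices before averaging;
# without it, a single NA would make the whole result NA
average_price <- orders %>%
  summarize(mean_price = mean(price, na.rm = TRUE))

average_price
```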
In the page_visits data frame, the column utm_source contains information about how users got to ShoeFly’s homepage. For instance, if utm_source = Facebook, then the user came to ShoeFly by clicking on an ad on Facebook.com.
Use group_by() together with summarize() to calculate how many visits came from each of the different sources. Save your answer to the variable click_source, and view it.
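Counting rows per group can be sketched as follows, using n() inside summarize(). The page_visits values below are invented for illustration, and the output column name count is an assumption.

```r
library(dplyr)

# Hypothetical stand-in for the lesson's page_visits data frame
page_visits <- data.frame(
  id         = 1:5,
  utm_source = c("Facebook", "Google", "Facebook", "Email", "Google")
)

# n() returns the number of rows in the current group
click_source <- page_visits %>%
  group_by(utm_source) %>%
  summarize(count = n())

click_source
```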
Our Marketing department thinks that the traffic to our site has been changing over the past few months. Use group_by() to calculate the number of visits to our site from each utm_source for each month. Save your answer to the variable click_source_by_month, and view it.
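Grouping by more than one column follows the same pattern: pass both columns to group_by(). The sketch below assumes page_visits has a column named month, uses invented values, and assumes dplyr 1.0.0 or later for the .groups argument.

```r
library(dplyr)

# Hypothetical stand-in for page_visits; the month column name is assumed
page_visits <- data.frame(
  utm_source = c("Facebook", "Facebook", "Google", "Google", "Google"),
  month      = c("January", "February", "January", "January", "February")
)

# One row per (utm_source, month) combination;
# .groups = "drop" returns an ungrouped result
click_source_by_month <- page_visits %>%
  group_by(utm_source, month) %>%
  summarize(count = n(), .groups = "drop")

click_source_by_month
```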