There you have it! With the power of readr and dplyr in your hands, you can now:
- load data from a CSV into a data frame
- inspect the data frame with `head()` and `summary()`
- `select()` the columns you want to analyze
- `filter()` the rows with comparison and logical operators
- `arrange()` rows in ascending or descending order
You’ve also been exposed to the pipe `%>%`, a powerful tool for chaining function calls, as well as the general principles of data manipulation.
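As a quick reminder of what chaining means, here is a minimal sketch on a tiny, invented data frame (`scores` and its values are made up for illustration and are not part of the lesson data). It compares a nested call with the equivalent piped version:

```r
library(dplyr)

# Invented values, for illustration only
scores <- data.frame(name = c("a", "b", "c"), points = c(3, 1, 2))

# Nested calls are read from the inside out
nested <- head(arrange(scores, points), 2)

# Piped calls are read top to bottom: x %>% f(y) is evaluated as f(x, y)
piped <- scores %>%
  arrange(points) %>%
  head(2)

# Both approaches produce the same data frame
identical(nested, piped)
```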
Now that you are well on your way to being a dplyr master, let’s put what you have learned together in a single analysis and see the true power of the pipe!
Instructions
The code in `notebook.Rmd` completes a sequence of steps:
- columns are selected from `artists` and saved to `chosen_cols`
- `chosen_cols` is filtered and saved to `popular_not_hip_hop`
- `popular_not_hip_hop` is arranged and saved to `youtube_desc`
Notice that to arrive at this result, two intermediate variables, `chosen_cols` and `popular_not_hip_hop`, were created.
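The contents of `notebook.Rmd` are not reproduced here, so the following is only a sketch of what such step-by-step code might look like. It assumes the same columns and conditions described in the instructions below, and it uses a small invented stand-in for `artists` (the `group` column and every value are hypothetical):

```r
library(dplyr)

# Hypothetical stand-in for the lesson's artists data; all values are invented
artists <- data.frame(
  group = c("A", "B", "C", "D"),
  country = c("US", "UK", "US", "CA"),
  genre = c("Rock", "Hip Hop", "Pop", "Rock"),
  spotify_monthly_listeners = c(25000000, 30000000, 22000000, 5000000),
  youtube_subscribers = c(1000000, 9000000, 4000000, 2000000),
  year_founded = c(1990, 2005, 2010, 1985),
  albums = c(10, 4, 3, 12)
)

# Step 1: drop the columns that are not needed
chosen_cols <- select(artists, -country, -year_founded, -albums)

# Step 2: keep popular artists outside the Hip Hop genre
popular_not_hip_hop <- filter(
  chosen_cols,
  spotify_monthly_listeners > 20000000,
  genre != 'Hip Hop'
)

# Step 3: sort by YouTube subscribers, largest first
youtube_desc <- arrange(popular_not_hip_hop, desc(youtube_subscribers))
head(youtube_desc)
```

Each step needs its own named variable only so that the next step has something to read from.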
With the power of the pipe, we can clean up this code!
In the last code block, `select()` all columns except `country`, `year_founded`, and `albums` from `artists` using the pipe `%>%`. Save the result to `artists` and view the `head()`.
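One possible way to write this step is sketched below on an invented stand-in for `artists` (the `group` column and all values are hypothetical). Prefixing a column name with `-` tells `select()` to exclude it:

```r
library(dplyr)

# Hypothetical stand-in for the lesson's artists data; all values are invented
artists <- data.frame(
  group = c("A", "B"),
  country = c("US", "UK"),
  genre = c("Rock", "Pop"),
  year_founded = c(1990, 2005),
  albums = c(10, 4)
)

# Pipe artists into select() and drop the three unwanted columns
artists <- artists %>%
  select(-country, -year_founded, -albums)

head(artists)
```

Writing `select(-c(country, year_founded, albums))` is an equivalent way to exclude the same columns.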
Place a pipe `%>%` after the call to `select()`. This will pipe your selection to the next line, where you should `filter()` all rows where `spotify_monthly_listeners` is greater than `20000000` and `genre` is not equal to `'Hip Hop'`. Keep this data frame saved to `artists`.
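The filtering condition on its own can be sketched as follows, again on an invented stand-in for `artists` (the `group` column and all values are hypothetical). The two comparisons are combined with the logical operator `&`:

```r
library(dplyr)

# Hypothetical stand-in for the lesson's artists data; all values are invented
artists <- data.frame(
  group = c("A", "B", "C", "D"),
  genre = c("Rock", "Hip Hop", "Pop", "Rock"),
  spotify_monthly_listeners = c(25000000, 30000000, 22000000, 5000000)
)

# Keep rows that satisfy both conditions
artists <- artists %>%
  filter(spotify_monthly_listeners > 20000000 & genre != 'Hip Hop')

head(artists)
```

Passing the two conditions to `filter()` as separate, comma-separated arguments has the same effect, because `filter()` keeps only the rows where every condition holds.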
Place a pipe `%>%` after the call to `filter()`. This will pipe your filtered data frame to the next line, where you should `arrange()` the rows in descending order by `youtube_subscribers`. Keep this data frame saved to `artists`.
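Put together, the three steps might look like the sketch below. It is one possible solution rather than the definitive one, and it runs on an invented stand-in for `artists` (the `group` column and all values are hypothetical):

```r
library(dplyr)

# Hypothetical stand-in for the lesson's artists data; all values are invented
artists <- data.frame(
  group = c("A", "B", "C", "D"),
  country = c("US", "UK", "US", "CA"),
  genre = c("Rock", "Hip Hop", "Pop", "Rock"),
  spotify_monthly_listeners = c(25000000, 30000000, 22000000, 5000000),
  youtube_subscribers = c(1000000, 9000000, 4000000, 2000000),
  year_founded = c(1990, 2005, 2010, 1985),
  albums = c(10, 4, 3, 12)
)

# One chain, no intermediate variables: select, then filter, then arrange
artists <- artists %>%
  select(-country, -year_founded, -albums) %>%
  filter(spotify_monthly_listeners > 20000000, genre != 'Hip Hop') %>%
  arrange(desc(youtube_subscribers))

head(artists)
```

Wrapping the column in `desc()` is what makes `arrange()` sort from largest to smallest; without it, the rows are sorted in ascending order.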
Did you get the same result as the previous code block?
Make sure you’ve called `head(artists)` to see the resulting data frame!