Suppose you have a data frame called customers, which contains the names, ages, and genders of your business's customers:

name             age  gender
Rebecca Erikson  35   F
Thomas Roberson  28   M
Diane Ochoa      42   NA

For your analysis, you only care about the age and gender of your customers, not their names. The data frame you want looks like this:

age  gender
35   F
28   M
42   NA

You can keep only the columns your analysis needs with dplyr's select() function:

  • select() takes a data frame as its first argument
  • all additional arguments are the desired columns to select
  • select() returns a new data frame containing only the desired columns
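A minimal sketch of select() called without the pipe. It assumes the customers data frame is rebuilt by hand from the sample rows above (storing age as numeric and gender as character is an assumption), and customer_ages is a hypothetical name for the result:

```r
library(dplyr)

# Hypothetical reconstruction of the customers data frame from the sample rows
customers <- data.frame(
  name = c("Rebecca Erikson", "Thomas Roberson", "Diane Ochoa"),
  age = c(35, 28, 42),
  gender = c("F", "M", NA)
)

# The data frame is the first argument; the remaining arguments name the columns to keep
customer_ages <- select(customers, age, gender)
print(customer_ages)

# select() returns a new data frame, so customers still has all three columns
print(names(customers))
```

Because select() does not modify its input, you need to assign the result (here to customer_ages) if you want to keep it.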

But what about the pipe %>%, you ask? Great question. The pipe passes the data frame on its left as the first argument to the function on its right, which makes your code easier to read:

customers %>% select(age,gender)

When using the pipe, you can read the code as: from the customers table, select() the age and gender columns. From now on we will use the pipe symbol where appropriate to simplify our code.
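A short sketch confirming that the piped and non-piped calls produce the same data frame. It again assumes a hand-built customers data frame, and the names piped and nested are hypothetical:

```r
library(dplyr)  # dplyr re-exports the %>% pipe from magrittr

# Hypothetical reconstruction of the customers data frame
customers <- data.frame(
  name = c("Rebecca Erikson", "Thomas Roberson", "Diane Ochoa"),
  age = c(35, 28, 42),
  gender = c("F", "M", NA)
)

# The pipe inserts customers as the first argument of select()
piped <- customers %>% select(age, gender)
nested <- select(customers, age, gender)

# Both forms yield the same result
print(identical(piped, nested))
```

The two forms are interchangeable. The pipe matters more once several steps are chained, because each step reads left to right instead of being nested inside the previous one.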



Select the group column of artists using select() and save the result to artist_groups. View artist_groups.


Select the group, spotify_monthly_listeners, and year_founded columns of artists using select() and save the result to group_info. View group_info.
