Often, you have the same data separated out into multiple files.
Let’s say that you have a ton of files following the filename structure: 'file_1.csv', 'file_2.csv', 'file_3.csv', and so on. The power of dplyr and tidyr lies mainly in manipulating large amounts of structured data, so you want to get all of the relevant information into one table so that you can analyze the aggregate data.
You can combine the base R function
lapply() with readr and dplyr to organize this data better, as shown below:
library(readr)
library(dplyr)

files <- list.files(pattern = "file_.*csv")
df_list <- lapply(files, read_csv)
df <- bind_rows(df_list)
- The first line uses list.files() and a regular expression (a sequence of characters describing a pattern of text that should be matched) to find any file in the current directory that starts with 'file_' and has an extension of csv, storing the name of each file in the vector files
- The second line uses lapply() to read each file in files into a data frame with read_csv(), storing the data frames in df_list
- The third line then concatenates all of those data frames together with dplyr’s bind_rows()
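You can try this pattern end to end with a minimal, self-contained sketch; the file names and the id and score columns below are invented for illustration, and the files are written to a temporary directory so nothing in your working directory is touched:

```r
library(readr)
library(dplyr)

# Create two small CSV files to stand in for the real data
dir <- tempdir()
write_csv(data.frame(id = 1:2, score = c(90, 85)), file.path(dir, "file_1.csv"))
write_csv(data.frame(id = 3:4, score = c(70, 95)), file.path(dir, "file_2.csv"))

# Find the matching files, read each into a data frame, and combine them
files <- list.files(path = dir, pattern = "file_.*csv", full.names = TRUE)
df_list <- lapply(files, read_csv)
df <- bind_rows(df_list)

nrow(df)  # 4 rows: 2 from each file
```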
You have 10 different files containing 100 students each. These files follow the naming structure:
- 'file_1.csv'
- … up to 'file_10.csv'
You are going to read each file into an individual data frame and then combine all of the entries into one data frame.
First, create a variable called
student_files and set it equal to the result of calling
list.files() on all of the CSV files you want to import.
Read each file in
student_files into a data frame using
lapply() and save the result to
df_list.
Concatenate all of the data frames in
df_list into one data frame called
students. Save the number of rows in
students to
nrow_students. Did you get all of them?
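Assuming the student files match the same 'file_' pattern used earlier in the lesson (the exact pattern in your working directory may differ), the three steps above can be sketched as:

```r
library(readr)
library(dplyr)

# Step 1: collect the matching file names (the pattern is an assumption)
student_files <- list.files(pattern = "file_.*csv")

# Step 2: read each file into its own data frame
df_list <- lapply(student_files, read_csv)

# Step 3: combine the data frames and count the rows
students <- bind_rows(df_list)
nrow_students <- nrow(students)  # 10 files of 100 students each should give 1000
```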