Let’s say we have a column called "type" with data entries in the format "user_Kenya", as shown in the table below.
Just like we saw before, this column actually contains two types of data. One seems to be the user type (with values like “admin” or “user”) and one seems to be the country this user is in (with values like “US” or “Kenya”).
We can no longer just split along the first 4 characters because "admin" and "user" are of different lengths. Instead, we know that we want to split along the "_". We can thus use the tidyr function separate() to split this column into two separate columns:
# Create the 'user_type' and 'country' columns
df %>% separate(type, c('user_type', 'country'), '_')
type is the column to split
c('user_type', 'country') is a vector with the names of the two new columns
'_' is the character to split on
This would transform the table above into a table like:
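For instance, the call can be tried on a small data frame. This is a minimal sketch, assuming a hypothetical `df` with an `id` column and three made-up `type` values; none of this data comes from the lesson's own table.

```r
library(dplyr)  # provides the %>% pipe
library(tidyr)  # provides separate()

# Hypothetical sample data (illustrative values only)
df <- data.frame(
  id = c(1, 2, 3),
  type = c("user_Kenya", "admin_US", "user_US"),
  stringsAsFactors = FALSE
)

# Split 'type' on "_" into 'user_type' and 'country'.
# By default, separate() drops the original 'type' column.
df_split <- df %>%
  separate(type, c("user_type", "country"), "_")

# Resulting columns: id, user_type, country
print(df_split)
```

Because "admin" and "user" differ in length, splitting on the delimiter rather than on a fixed character position is what keeps the two pieces aligned.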
Inspect the data frame students. Notice that the students’ names are stored in a column called full_name.
Separate the full_name column into two new columns, first_name and last_name, by splitting on the ' ' character.
Provide extra = 'merge' as an extra argument to separate(). This will ensure that middle names or two-word last names all end up in the last_name column.
Save the result to students, and view it.
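One way this exercise might look is sketched below. The `students` data here is hypothetical (the lesson's actual data is not shown), and the new column name `first_name` and the use of `head()` to view the result are assumptions made for illustration.

```r
library(dplyr)  # provides the %>% pipe
library(tidyr)  # provides separate()

# Hypothetical stand-in for the lesson's 'students' data frame
students <- data.frame(
  full_name = c("Ada Lovelace", "Ludwig van Beethoven", "Mary Ann Evans"),
  grade = c("A", "B", "A"),
  stringsAsFactors = FALSE
)

# Split on the first space only: with extra = "merge", any remaining
# pieces are kept together in the last new column instead of being dropped.
students <- students %>%
  separate(full_name, c("first_name", "last_name"), " ", extra = "merge")

head(students)
```

Without `extra = "merge"`, a name with more than two words would produce more pieces than there are new columns, and `separate()` would by default warn and discard the surplus pieces.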