Sometimes we need to modify the strings in our data frames to transform them into more meaningful metrics. For example, consider the fruit table from before.
We can see that the 'price' column is actually made up of character strings representing dollar amounts. This column would be much more useful as a numeric type, so that we could take its mean, calculate other aggregate statistics, or compare fruits to one another by price.
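To make the problem concrete, here is a minimal sketch using a few made-up dollar-amount strings (the values are hypothetical, not taken from the fruit table). Base R's mean() does not parse character data, so it gives up and returns NA.

```r
# Hypothetical dollar-amount strings, made up for illustration
prices <- c("$1.50", "$0.75", "$3.25")

class(prices)  # "character"

# mean() cannot average character data: it returns NA with the warning
# "argument is not numeric or logical: returning NA"
mean(prices)
```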
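First, we can use a regular expression to remove all of the dollar signs. A regular expression is a sequence of characters that describes a pattern of text to be matched. The base R function gsub() replaces every match of a pattern. Here it replaces each $ in the price column with an empty string. On its own, $ is a special regex character that matches the end of a string, so the pattern escapes it as '\\$' to match only the literal dollar sign: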
```r
library(dplyr)
fruit <- fruit %>% mutate(price = gsub('\\$', '', price))  # '\\$' matches a literal $
```
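The escaping matters. This short sketch runs gsub() on a vector of hypothetical price strings. It compares an unescaped pattern, an escaped pattern, and the fixed = TRUE argument, which tells gsub() to treat the pattern as plain text rather than as a regular expression.

```r
# Hypothetical price strings, made up for illustration
prices <- c("$1.50", "$0.75", "$3.25")

# Unescaped, '$' is the end-of-string anchor: it matches the empty position
# at the end of each string, so nothing visible is removed
gsub('$', '', prices)               # "$1.50" "$0.75" "$3.25"

# Escaped as '\\$', the pattern matches the literal dollar sign
gsub('\\$', '', prices)             # "1.50" "0.75" "3.25"

# fixed = TRUE matches the pattern as plain text, so no escaping is needed
gsub('$', '', prices, fixed = TRUE) # "1.50" "0.75" "3.25"
```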
Then, we can use the base R function as.numeric() to convert the character strings, which now contain only numerical values, to numbers:
```r
fruit <- fruit %>% mutate(price = as.numeric(price))  # character -> numeric
```
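The order of the two steps matters. This sketch applies as.numeric() to hypothetical strings that have already had their dollar signs removed, so they convert cleanly and can be averaged. A string that still contains a $ cannot be parsed and becomes NA.

```r
# Hypothetical price strings after the dollar signs have been removed
cleaned <- c("1.50", "0.75", "3.25")

price_num <- as.numeric(cleaned)
class(price_num)  # "numeric"
mean(price_num)   # 1.833333

# A string that still contains a non-numeric character cannot be parsed:
# as.numeric() returns NA and warns "NAs introduced by coercion"
as.numeric("$1.50")
```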
Now the price column of fruit is numeric, so we can compute aggregate statistics such as mean(fruit$price).
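The two steps can also be combined into a single mutate() call. The sketch below assumes dplyr is installed. It builds a small stand-in for the fruit table, and both the item column and the prices are hypothetical, not the lesson's data.

```r
library(dplyr)

# Hypothetical stand-in for the fruit table; names and prices are made up
fruit <- data.frame(
  item = c("apple", "banana", "cherry"),
  price = c("$1.50", "$0.75", "$3.25"),
  stringsAsFactors = FALSE
)

# Strip the dollar sign, then convert to numeric, in one mutate() call
fruit <- fruit %>% mutate(price = as.numeric(gsub('\\$', '', price)))

# price is now numeric, so aggregates work
fruit %>% summarise(mean_price = mean(price))
```

Nesting the calls keeps the intermediate character column out of the data frame. The two-step version above is easier to inspect one stage at a time.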
We saw in the last exercise that finding the mean of the score column is hard to do when the data is stored as characters rather than numbers.
1. Inspect students to take a look at the values in the score column.
2. Remove the '%' symbol from the score column, and save the resulting data frame to students.
3. Convert the score column to a numerical type using the as.numeric() function. Save this new data frame to students, and view it.
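The following sketch shows one possible way to work through these steps with dplyr. It uses a small hypothetical students data frame, and the full_name column and the scores are invented because the exercise's real data is not shown here. Unlike $, the '%' character has no special meaning in R regular expressions, so it does not need to be escaped.

```r
library(dplyr)

# Hypothetical stand-in for students; names and scores are made up
students <- data.frame(
  full_name = c("Ana", "Ben", "Cy"),
  score = c("82%", "91%", "77%"),
  stringsAsFactors = FALSE
)

# Step 2: remove the percent sign and save back to students
students <- students %>% mutate(score = gsub('%', '', score))

# Step 3: convert score to numeric, save back to students, and view it
students <- students %>% mutate(score = as.numeric(score))
students

# The mean is now straightforward to compute
mean(students$score)  # 83.33333
```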