Sometimes we need to modify strings in our data frames to help us transform them into more meaningful metrics. For example, in our fruits table from before:
item | price | calories |
---|---|---|
“banana” | “$1” | 105 |
“apple” | “$0.75” | 95 |
“peach” | “$3” | 55 |
“peach” | “$4” | 55 |
“clementine” | “$2.5” | 35 |
We can see that the price column is actually composed of character strings representing dollar amounts. This column would be much better represented as numeric, so that we could take the mean, calculate other aggregate statistics, or compare different fruits to one another in terms of price.
First, we can use a regular expression, a sequence of characters that describes a pattern of text to be matched, to remove all of the dollar signs. The base R function gsub() will remove the $ from the price column, replacing the symbol with an empty string '':
fruit <- fruit %>% mutate(price = gsub('\\$', '', price))
Then, we can use the base R function as.numeric() to convert character strings containing numerical values to numeric:
fruit <- fruit %>% mutate(price = as.numeric(price))
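The following is a minimal sketch, using base R only, of why the dollar sign is escaped and why the symbol has to be removed before converting. In a regular expression an unescaped $ is the end-of-string anchor, so it matches a position rather than the character itself. The vector prices is made up for illustration and simply mirrors the price column of the fruit table.

```r
# Hypothetical price strings, mirroring the price column of the fruit table
prices <- c("$1", "$0.75", "$3", "$4", "$2.5")

# Unescaped: `$` is the end-of-string anchor, so the dollar signs stay in place
unescaped <- gsub("$", "", prices)

# Escaped: `\\$` matches a literal dollar sign and removes it
escaped <- gsub("\\$", "", prices)

# fixed = TRUE treats the pattern as plain text, so no escaping is needed
literal <- gsub("$", "", prices, fixed = TRUE)

# Converting before cleaning gives NA for every value
# (R normally warns about this; the warning is silenced here)
too_early <- suppressWarnings(as.numeric(prices))

# Converting after cleaning gives usable numbers
converted <- as.numeric(escaped)

print(unescaped)
print(escaped)
print(too_early)
print(converted)
```

Applied to the price column in this order, the two steps leave only numeric values behind.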
Now, we have a data frame that looks like:
item | price | calories |
---|---|---|
“banana” | 1 | 105 |
“apple” | 0.75 | 95 |
“peach” | 3 | 55 |
“peach” | 4 | 55 |
“clementine” | 2.5 | 35 |
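To see the whole transformation end to end, here is a short sketch that rebuilds the fruit table by hand and applies both steps. It assumes the dplyr package is installed; the data frame is typed in from the table above rather than loaded from any file, and the name mean_price is only illustrative.

```r
library(dplyr)

# Rebuild the fruit table from the lesson by hand (prices start as strings)
fruit <- data.frame(
  item = c("banana", "apple", "peach", "peach", "clementine"),
  price = c("$1", "$0.75", "$3", "$4", "$2.5"),
  calories = c(105, 95, 55, 55, 35),
  stringsAsFactors = FALSE
)

# Step 1: strip the literal dollar sign from every price
fruit <- fruit %>% mutate(price = gsub("\\$", "", price))

# Step 2: convert the cleaned strings to numbers
fruit <- fruit %>% mutate(price = as.numeric(price))

# The column now supports aggregate statistics such as the mean
mean_price <- mean(fruit$price)

print(fruit)
print(mean_price)
```

Once price is numeric, summaries like the mean above, or comparisons between fruits by price, work without any further string handling.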
Instructions
We saw in the last exercise that finding the mean of the score column is hard to do when the data is stored as characters and not numbers.
View the head() of students to take a look at the values in the score column.
Remove the '%' symbol from the score column, and save the resulting data frame to students. View students.
Convert the score column to a numerical type using the as.numeric() function. Save this new data frame to students, and view it.