
Sometimes we need to modify strings in our data frames to help us transform them into more meaningful metrics. For example, in our fruits table from before:

| item | price | calories |
|---|---|---|
| “banana” | “\$1” | 105 |
| “apple” | “\$0.75” | 95 |
| “peach” | “\$3” | 55 |
| “peach” | “\$4” | 55 |
| “clementine” | “\$2.5” | 35 |

We can see that the `'price'` column is actually composed of character strings representing dollar amounts. This column could be much better represented as numeric, so that we could take the mean, calculate other aggregate statistics, or compare different fruits to one another in terms of price.

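First, we can use a regular expression, a sequence of characters that describes a pattern of text to be matched, to remove all of the dollar signs. The base R function `gsub()` will remove the `$` from the `price` column, replacing the symbol with an empty string `''`. Because `$` is a special character in regular expressions (it marks the end of a string), it must be escaped with a double backslash in the pattern: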

```r
# '\\$' matches a literal dollar sign
fruit %>%
  mutate(price = gsub('\\$', '', price))
```
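To see why the escape matters, here is a minimal base R sketch on a plain character vector. The `price` vector is a hypothetical stand-in for the column in the table above. It compares an unescaped pattern, the escaped pattern, and the `fixed = TRUE` alternative, which treats the pattern as plain text:

```r
# Hypothetical price strings, mirroring the fruit table above
price <- c("$1", "$0.75", "$3", "$4", "$2.5")

# Unescaped, "$" is the regex anchor for end-of-string, so no dollar sign is removed
unescaped <- gsub("$", "", price)

# Escaped with a double backslash, "\\$" matches a literal dollar sign
escaped <- gsub("\\$", "", price)

# fixed = TRUE treats the pattern as plain text, so no escaping is needed
fixed <- gsub("$", "", price, fixed = TRUE)

print(unescaped)
print(escaped)
print(fixed)
```

The unescaped call leaves the strings unchanged, while the other two return the same cleaned values. Either working form could be used inside `mutate()`.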

Then, we can use the base R function `as.numeric()` to convert character strings containing numerical values to numeric:

```r
fruit %>%
  mutate(price = as.numeric(price))
```
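The order of the two steps matters. The sketch below uses hypothetical vectors rather than the lesson's data, and shows what `as.numeric()` returns for cleaned strings compared with strings that still contain a dollar sign:

```r
# Hypothetical values: cleaned strings, and strings that still have a dollar sign
cleaned <- c("1", "0.75", "2.5")
uncleaned <- c("$1", "$0.75", "$2.5")

# Strings that look like numbers convert cleanly
converted <- as.numeric(cleaned)

# Strings that cannot be parsed become NA. R also emits a coercion warning,
# which is suppressed here only to keep the output tidy.
failed <- suppressWarnings(as.numeric(uncleaned))

print(converted)
print(failed)
```

If a converted column comes back full of `NA` values, leftover non-numeric characters are a likely cause.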

Now, we have a data frame that looks like:

| item | price | calories |
|---|---|---|
| “banana” | 1 | 105 |
| “apple” | 0.75 | 95 |
| “peach” | 3 | 55 |
| “peach” | 4 | 55 |
| “clementine” | 2.5 | 35 |
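As a rough end-to-end sketch, the two steps can also be combined in a single `mutate()` call. The `fruit` data frame below is a hypothetical reconstruction of the table above, and `fruit_clean` and `mean_price` are illustrative names, not part of the lesson:

```r
library(dplyr)

# Hypothetical reconstruction of the fruit table shown above
fruit <- data.frame(
  item = c("banana", "apple", "peach", "peach", "clementine"),
  price = c("$1", "$0.75", "$3", "$4", "$2.5"),
  calories = c(105, 95, 55, 55, 35),
  stringsAsFactors = FALSE
)

# Strip the dollar sign, then convert to numeric, in one mutate() call
fruit_clean <- fruit %>%
  mutate(price = as.numeric(gsub("\\$", "", price)))

# With a numeric column, aggregate statistics such as the mean are available
mean_price <- fruit_clean %>%
  summarize(mean_price = mean(price)) %>%
  pull(mean_price)

print(fruit_clean)
print(mean_price)
```

Nesting `gsub()` inside `as.numeric()` avoids an intermediate data frame. Keeping the two steps separate, as in the lesson, makes each transformation easier to inspect.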

### Instructions

1. We saw in the last exercise that finding the mean of the `score` column is hard to do when the data is stored as `character`s and not numbers.

   View the `head()` of `students` to take a look at the values in the `score` column.

2. Remove the `'%'` symbol from the `score` column, and save the resulting data frame to `students`. View `students`.

3. Convert the `score` column to a numerical type using the `as.numeric()` function. Save this new data frame to `students`, and view it.