Each column of a data frame holds items of a single data type. The data types that R uses are: character, numeric (real or decimal), integer, logical, or complex. Often, we want to convert between types so that we can do better analysis. If a numerical column like "num_users" is stored as a vector of characters instead of numerics, for example, it becomes harder to do something like make a line graph of users over time.
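As a minimal sketch of the types listed above, the base R function class() reports the type of a value. The sample values and the num_users_chr vector below are made up for illustration and are not part of the lesson's data.

```r
# One sample value per data type named above (values are illustrative).
values <- list(
  character = "banana",
  numeric   = 1.5,
  integer   = 2L,       # the L suffix marks an integer literal
  logical   = TRUE,
  complex   = 1 + 2i
)

# class() reports the type of each value as a string.
types <- vapply(values, class, character(1))
print(types)

# A hypothetical num_users column that was read in as text.
num_users_chr <- c("120", "135", "150")
class(num_users_chr)   # character: arithmetic and plotting will not work as expected

# Converting with as.numeric() gives a vector we can compute on.
num_users_num <- as.numeric(num_users_chr)
class(num_users_num)   # numeric
```

The conversion here works because every string contains only digits; strings with extra symbols need cleaning first.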
To see the types of each column of a data frame, we can use:
str(df)
str() displays the internal structure of an R object. Calling str() with a data frame as an argument prints a variety of information, including the data type of each column. For a data frame like this:
| item | price | calories |
|---|---|---|
| “banana” | “$1” | 105 |
| “apple” | “$0.75” | 95 |
| “peach” | “$3” | 55 |
| “clementine” | “$2.5” | 35 |
the data types would be:
#> $ item: chr
#> $ price: chr
#> $ calories: num
We can see that the price column is made up of characters, which will probably make our analysis of price more difficult. We’ll look at how to convert columns into numeric values in the next few exercises.
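The fruit example above can be reproduced with a short sketch. The data frame name foods is an assumption chosen for illustration, and stringsAsFactors = FALSE is set explicitly so that older versions of R also keep the text columns as character rather than factor.

```r
# Rebuild the example table; the name `foods` is hypothetical.
foods <- data.frame(
  item     = c("banana", "apple", "peach", "clementine"),
  price    = c("$1", "$0.75", "$3", "$2.5"),   # stored as text because of the $ sign
  calories = c(105, 95, 55, 35),
  stringsAsFactors = FALSE
)

# str() prints one line per column, including its type (chr, num, ...).
str(foods)

# sapply() with class() collects just the column types in a named vector.
col_types <- sapply(foods, class)
print(col_types)
```

Using sapply() with class() is handy when only the column types are needed, for instance to check them in a script rather than read them off the printed structure.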
Instructions
Let’s inspect the data types in the students table.
Print out the structure of students.
If we wanted to make a scatterplot of age vs average exam score, would we be able to do it with this type of data?
Paste the following code in the last code block to try to print out the mean of the score column of students.
students %>% summarise(mean_score = mean(score))
What warning do you see?