Since we want:
- Each variable as a separate column
- Each row as a separate observation

we would reshape a table like this:
Account | Checking | Savings |
---|---|---|
“12456543” | 8500 | 8900 |
“12283942” | 6410 | 8020 |
“12839485” | 78000 | 92000 |
into a table that looks more like this:
Account | Account Type | Amount |
---|---|---|
“12456543” | “Checking” | 8500 |
“12456543” | “Savings” | 8900 |
“12283942” | “Checking” | 6410 |
“12283942” | “Savings” | 8020 |
“12839485” | “Checking” | 78000 |
“12839485” | “Savings” | 92000 |
We can use tidyr’s gather() function to do this transformation. gather() takes a data frame and the columns to unpack:
df %>% gather('Checking', 'Savings', key = 'Account Type', value = 'Amount')
The arguments you provide are:
- df: the data frame you want to gather, which can be piped into gather()
- Checking and Savings: the columns of the old data frame that you want to gather; their names become the values of the key column
- key: what to call the column of the new data frame that stores the old column names (the variables)
- value: what to call the column of the new data frame that stores the values
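As a minimal sketch of the call described above, the following rebuilds the accounts table from the example and gathers it. The names df and long_df are only illustrative, and the sketch assumes the tidyr package is installed (tidyr re-exports the %>% pipe).

```r
library(tidyr)

# The wide accounts table from the example above
df <- data.frame(
  Account = c("12456543", "12283942", "12839485"),
  Checking = c(8500, 6410, 78000),
  Savings = c(8900, 8020, 92000),
  stringsAsFactors = FALSE
)

# Stack the Checking and Savings columns into key/value pairs:
# the old column names go into 'Account Type', the numbers into 'Amount'
long_df <- df %>%
  gather('Checking', 'Savings', key = 'Account Type', value = 'Amount')

print(long_df)
```

Note that gather() stacks the gathered columns one after another, so all the Checking rows come before the Savings rows rather than being interleaved by account as in the table above. In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(), although gather() still works.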
Instructions
The students data frame from the previous exercise has been loaded into the notebook for you. Save the column names to original_col_names and print it.
There is a column for the scores on the fractions exam, and a column for the scores on the probability exam.
We want to make each row an observation, so we want to transform this table to look like:
full_name | exam | score | gender_age | grade |
---|---|---|---|---|
“First Student” | “fractions” | score% | … | … |
“First Student” | “probability” | score% | … | … |
“Second Student” | “fractions” | score% | … | … |
“Second Student” | “probability” | score% | … | … |
… | … | … | … | … |
Use gather() to create a new table (still called students) that follows this structure. Then view the head() of students.
Save the column names of the updated students data frame to gathered_col_names and print it.
The dplyr function count() takes a data frame and a column as arguments and returns a table with counts of the unique values in the named column.
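To illustrate count() without using the exercise data, here is a small sketch on a hypothetical long-format accounts data frame. The names accounts, Account_Type, and type_counts are made up for this example, and the sketch assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical long-format table (not the students data from the exercise)
accounts <- data.frame(
  Account = c("12456543", "12456543", "12283942",
              "12283942", "12839485", "12839485"),
  Account_Type = c("Checking", "Savings", "Checking",
                   "Savings", "Checking", "Savings"),
  Amount = c(8500, 8900, 6410, 8020, 78000, 92000),
  stringsAsFactors = FALSE
)

# One row per unique Account_Type, with the number of occurrences in column n
type_counts <- accounts %>%
  count(Account_Type)

print(type_counts)
```

The result has one row per unique value and a count column named n, which is what count() produces by default.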
Find the count of each unique value in the exam column. Save the result to exam_counts and view exam_counts.