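A tidy dataset follows three fundamental rules:
1. Each variable forms a column.
2. Each observation forms a row.
3. Each type of observational unit forms a table.

Below is an example of a tidy dataset: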
ID# | Student | Year | Class | Grade |
---|---|---|---|---|
1 | Brown | 2020 | Chem | F |
1 | Brown | 2021 | Chem | B |
1 | Brown | 2021 | Math | A |
2 | Smith | 2020 | Bio | C |
2 | Smith | 2021 | CompSci | B |
3 | Saito | 2020 | Chem | A |
3 | Saito | 2021 | Math | B |
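
One way to see the "each observation forms a row" rule in the table above is to check that some combination of columns identifies every row exactly once. The sketch below is illustrative only: it assumes that (`ID`, `Year`, `Class`) is the key for an observation, which the table suggests but does not state, and `is_unique_key` is a hypothetical helper, not part of any library. The `ID#` column is written as `ID` in the code.

```python
# Rows of the tidy example table, one tuple per observation.
COLUMNS = ("ID", "Student", "Year", "Class", "Grade")
ROWS = [
    (1, "Brown", 2020, "Chem", "F"),
    (1, "Brown", 2021, "Chem", "B"),
    (1, "Brown", 2021, "Math", "A"),
    (2, "Smith", 2020, "Bio", "C"),
    (2, "Smith", 2021, "CompSci", "B"),
    (3, "Saito", 2020, "Chem", "A"),
    (3, "Saito", 2021, "Math", "B"),
]


def is_unique_key(rows, columns, key):
    """Return True if the columns named in `key` never repeat across rows."""
    positions = [columns.index(name) for name in key]
    seen = set()
    for row in rows:
        value = tuple(row[i] for i in positions)
        if value in seen:
            return False
        seen.add(value)
    return True


# Each row as a dict, so every value is labelled by its variable (column) name.
records = [dict(zip(COLUMNS, row)) for row in ROWS]

# Assumed key: one grade per student, per year, per class.
print(is_unique_key(ROWS, COLUMNS, ("ID", "Year", "Class")))
# Dropping Class from the key is not enough: Brown has two rows for 2021.
print(is_unique_key(ROWS, COLUMNS, ("ID", "Year")))
```

The first check succeeds and the second fails, which is what one would expect if each row records a single grade for one student in one class in one year.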
Messy data is any data that violates at least one of the tidy dataset rules (1. each variable forms a column; 2. each observation forms a row; 3. each type of observational unit forms a table).
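Below is an example of messy data. Its column headers encode two variables (class and year) instead of naming one, the header spellings are inconsistent, and the grades mix letters with numbers: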
ID# | Name | ChemGrade2020 | MathGrade2020 | BioGrade2020 | CHemGrade2021 | MathGrad2021 | BioGrade21 |
---|---|---|---|---|---|---|---|
1 | Brown | F | B | B | C | ||
B | smith | 100 | 95 | 65 | |||
3 | Saito, K | A | 90 | B | 85 |
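
A wide layout like the one above can usually be reshaped into tidy form by splitting each header into its variables and emitting one row per grade. The following is a minimal sketch, not a general cleaning routine. It assumes the headers have already been normalized to the pattern `<Class>Grade<Year>`, and it uses hypothetical input values copied from the tidy example rather than the messy rows themselves, since those would first need their own cleanup. `to_tidy` and `HEADER` are illustrative names.

```python
import re

# Hypothetical wide records: headers normalized to "<Class>Grade<Year>",
# values copied from the tidy example (not from the messy table).
wide_rows = [
    {"ID": 1, "Student": "Brown", "ChemGrade2020": "F",
     "ChemGrade2021": "B", "MathGrade2021": "A"},
    {"ID": 2, "Student": "Smith", "BioGrade2020": "C"},
    {"ID": 3, "Student": "Saito", "ChemGrade2020": "A", "MathGrade2021": "B"},
]

# Splits a header such as "ChemGrade2020" into class "Chem" and year "2020".
HEADER = re.compile(r"^(?P<cls>[A-Za-z]+)Grade(?P<year>\d{4})$")


def to_tidy(rows, id_fields=("ID", "Student")):
    """Turn one wide record per student into one tidy record per grade."""
    tidy = []
    for row in rows:
        ids = {name: row[name] for name in id_fields}
        for name, grade in row.items():
            if name in id_fields:
                continue
            match = HEADER.match(name)
            # Skip headers that do not fit the pattern and empty cells.
            if match is None or grade in (None, ""):
                continue
            tidy.append({
                **ids,
                "Year": int(match.group("year")),
                "Class": match.group("cls"),
                "Grade": grade,
            })
    return sorted(tidy, key=lambda r: (r["ID"], r["Year"], r["Class"]))


tidy = to_tidy(wide_rows)
for record in tidy:
    print(record)
```

Each output record has exactly one value per variable (`ID`, `Student`, `Year`, `Class`, `Grade`), so class and year become ordinary columns instead of being hidden in the header names.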