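A tidy dataset follows three fundamental rules:
1. Each variable forms a column.
2. Each observation forms a row.
3. Each type of observational unit forms a table.

Below is an example of a tidy dataset: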
| ID# | Student | Year | Class | Grade |
|---|---|---|---|---|
| 1 | Brown | 2020 | Chem | F |
| 1 | Brown | 2021 | Chem | B |
| 1 | Brown | 2021 | Math | A |
| 2 | Smith | 2020 | Bio | C |
| 2 | Smith | 2021 | CompSci | B |
| 3 | Saito | 2020 | Chem | A |
| 3 | Saito | 2021 | Math | B |
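
One practical payoff of this layout is that each row can be treated as a single observation, so common questions reduce to a plain filter or group-by. The following is a minimal sketch in standard-library Python, assuming the table is held as a list of dictionaries; the variable names `tidy_rows`, `grades_2021`, and `classes_by_student` are illustrative and not part of the original material.

```python
from collections import defaultdict

# The tidy table, one dict per observation (student, year, class -> grade).
tidy_rows = [
    {"ID#": 1, "Student": "Brown", "Year": 2020, "Class": "Chem", "Grade": "F"},
    {"ID#": 1, "Student": "Brown", "Year": 2021, "Class": "Chem", "Grade": "B"},
    {"ID#": 1, "Student": "Brown", "Year": 2021, "Class": "Math", "Grade": "A"},
    {"ID#": 2, "Student": "Smith", "Year": 2020, "Class": "Bio", "Grade": "C"},
    {"ID#": 2, "Student": "Smith", "Year": 2021, "Class": "CompSci", "Grade": "B"},
    {"ID#": 3, "Student": "Saito", "Year": 2020, "Class": "Chem", "Grade": "A"},
    {"ID#": 3, "Student": "Saito", "Year": 2021, "Class": "Math", "Grade": "B"},
]

# Filter: every grade recorded in 2021 is a single condition on one column.
grades_2021 = [
    (row["Student"], row["Class"], row["Grade"])
    for row in tidy_rows
    if row["Year"] == 2021
]

# Group: the set of classes each student has taken.
classes_by_student = defaultdict(set)
for row in tidy_rows:
    classes_by_student[row["Student"]].add(row["Class"])

print(grades_2021)
print(dict(classes_by_student))
```

Because the year and the class are ordinary columns, neither query needs to know anything about how the column headers are named.
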
Messy data is any data that violates at least one of the tidy dataset rules (each variable forms a column; each observation forms a row; each type of observational unit forms a table).
Below is an example of messy data:
| ID# | Name | ChemGrade2020 | MathGrade2020 | BioGrade2020 | CHemGrade2021 | MathGrad2021 | BioGrade21 |
|---|---|---|---|---|---|---|---|
| 1 | Brown | F | B | B | C | | |
| B | smith | 100 | 95 | 65 | | | |
| 3 | Saito, K | A | 90 | B | 85 | | |
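
To illustrate how a wide layout like this relates to the tidy one, the sketch below reshapes a cleaned-up wide table into one row per observation. It is only a sketch under stated assumptions: the input `wide_rows` is hypothetical, it assumes every grade column follows a consistent `<Class>Grade<YYYY>` naming pattern, and it assumes grades are already on a single scale. The messy table above meets none of these assumptions (inconsistent headers, mixed letter and numeric grades, a malformed ID), so it would need cleaning before a step like this could run. The helper name `to_tidy` is illustrative.

```python
import re

# Hypothetical wide-format input: one row per student, one column per class/year pair.
wide_rows = [
    {"ID#": 1, "Student": "Brown", "ChemGrade2020": "F", "ChemGrade2021": "B", "MathGrade2021": "A"},
    {"ID#": 2, "Student": "Smith", "BioGrade2020": "C", "CompSciGrade2021": "B"},
    {"ID#": 3, "Student": "Saito", "ChemGrade2020": "A", "MathGrade2021": "B"},
]

# Assumed header pattern: <Class>Grade<4-digit year>, e.g. "ChemGrade2020".
COLUMN_PATTERN = re.compile(r"^(?P<cls>[A-Za-z]+)Grade(?P<year>\d{4})$")


def to_tidy(rows):
    """Return one dict per (student, year, class) observation."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            match = COLUMN_PATTERN.match(column)
            if match is None or value in (None, ""):
                continue  # skip identifier columns and empty cells
            tidy.append({
                "ID#": row["ID#"],
                "Student": row["Student"],
                "Year": int(match.group("year")),   # variable recovered from the header
                "Class": match.group("cls"),        # variable recovered from the header
                "Grade": value,
            })
    # Order by student, then year, then class for readability.
    return sorted(tidy, key=lambda r: (r["ID#"], r["Year"], r["Class"]))


tidy_rows = to_tidy(wide_rows)
for row in tidy_rows:
    print(row)
```

The key move is that `Year` and `Class`, which the wide layout hides inside column names, become ordinary columns, so each output row holds exactly one observation.
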