The formal definition for the median of a dataset is:
The value that falls in the middle of the dataset once it is ordered from smallest to largest. If the dataset has an even number of values, you report either both of the middle two values or their average.
There are always two steps to finding the median of a dataset:
- Order the values in the dataset from smallest to largest
- Identify the number(s) that fall(s) in the middle
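The two steps above can be sketched in R. This is a minimal illustration, not part of the lesson's notebook: the helper name middle_values and the sample numbers are made up for the example, and the function assumes a non-empty numeric vector.

```r
# Hypothetical helper: returns the middle value(s) of a non-empty numeric vector.
middle_values <- function(x) {
  sorted <- sort(x)                # Step 1: order from smallest to largest
  n <- length(sorted)
  if (n %% 2 == 1) {
    sorted[(n + 1) / 2]            # Step 2, odd count: one middle value
  } else {
    sorted[c(n / 2, n / 2 + 1)]    # Step 2, even count: two middle values
  }
}

example_values <- c(7, 1, 5, 3)        # made-up data, not from the lesson
middle_values(example_values)          # 3 5
mean(middle_values(example_values))    # 4
```

With an even number of values the helper returns both middle values, so averaging them is left as a separate, explicit choice.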
Example One: Even Number of Values
Say we have a dataset with the following ten numbers:
The first step is to order these numbers from smallest to largest:
Because this dataset has an even number of values, there are two medians: 16, which has four datapoints to its left, and 24, which has four datapoints to its right.
Although you can report both values as the median, people often average them. If you averaged 16 and 24, you could report the median as 20.
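As a quick check of that arithmetic, here is a small R sketch that averages the two middle values from this example. The variable names are illustrative only. R's built-in median() also averages the two middle values when the count is even, so it gives the same result for this pair.

```r
middle_pair <- c(16, 24)          # the two middle values from Example One
averaged <- mean(middle_pair)     # (16 + 24) / 2
averaged                          # 20
median(middle_pair)               # 20, the built-in averages the middle pair
```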
Example Two: Odd Number of Values
If we added another value (say, 24) to the dataset and sorted it, we would have:
The new median is equal to 24, because there are 5 values to the left of it and 5 values to the right of it.
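The position of the single middle value can be worked out from the count alone. The sketch below assumes only what the example states: ten original values plus the one added, for eleven in total. The variable names are illustrative.

```r
n <- 11                           # ten original values plus the one added
middle_position <- (n + 1) / 2    # position of the median in the sorted data
middle_position                   # 6
middle_position - 1               # 5 values to the left
n - middle_position               # 5 values to the right
```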
In the next two steps, you will manually sort an array, and then determine which value in the array is the median.
In notebook.Rmd, we have a vector with the ages of the first five authors from Le Monde’s survey:
Alongside five_author_ages there is a variable called sorted_author_ages. Change sorted_author_ages so that it holds the values from five_author_ages in ascending order.
Then set median_value equal to the median of the array.