In this lesson, you learned how to find the median of a dataset in two steps:
- Sort the dataset
- Identify the one or two numbers that fall in the middle of the sorted dataset
You also learned how to calculate the median using NumPy:
np.median(my_array)
Discussion
Take a look at the histogram to the right. It displays the author age distribution with vertical lines for the mean (red) and median (yellow).
Do you feel like the median of our dataset, 40.5, provides us enough information to claim when authors publish their greatest work?
We argue it does not.
Although the median is a good measure of the dataset’s center, we cannot make a definitive claim about when authors publish their greatest work — the youngest author published at 18 and the oldest at 76. It would be irresponsible to say anything but, “it seems to be possible at almost any age.”
Notice that the mean and the median are nearly equal. This is not a surprising result, as both statistics are a measure of the dataset’s center. However, it’s worth noting that these results will not always be so close.
In the instructions below, we’ve written a brief explanation that puts median in the context of our problem.
Instructions
The median age of authors, when they publish their best work, from Le Monde’s 100 greatest books is 41.
While this does not tell us much about which year is an author’s greatest year, it does indicate that half of the authors from the survey find their greatest success before the age of 41 and half find their greatest success after the age of 41.