A common way to communicate a high-level overview of a dataset is to find the values that split the data into four groups of equal size.
By doing this, we can then say whether a new datapoint falls in the first, second, third, or fourth quarter of the data.
The values that split the data into fourths are the quartiles.
These values are called the first quartile (Q1), the second quartile (Q2), and the third quartile (Q3).
In the image above, Q1 is 10, Q2 is 13, and Q3 is 22. Those three values split the data into four groups that each contain five datapoints.
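Once the three quartiles are known, finding which quarter a new datapoint falls in comes down to comparing it against Q1, Q2, and Q3 in order. The function below is a minimal sketch using the quartiles from the image. `which_quarter` is a hypothetical helper name. The lesson does not say where a value exactly equal to a quartile belongs, so placing it in the lower quarter is our own assumption.

```python
def which_quarter(value, q1, q2, q3):
    """Return 1, 2, 3, or 4: the quarter of the data that `value` falls in.

    Assumption: a value exactly equal to a quartile goes in the lower quarter.
    """
    if value <= q1:
        return 1
    elif value <= q2:
        return 2
    elif value <= q3:
        return 3
    return 4


# Quartiles from the image: Q1 = 10, Q2 = 13, Q3 = 22
print(which_quarter(8, 10, 13, 22))   # 1
print(which_quarter(12, 10, 13, 22))  # 2
print(which_quarter(20, 10, 13, 22))  # 3
print(which_quarter(30, 10, 13, 22))  # 4
```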
In this lesson, you will learn to calculate the quartiles by hand and with Python’s NumPy library.
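As a preview of the NumPy approach, here is a minimal sketch that uses `np.quantile` on a small made-up dataset (not the one from the image). `np.percentile` does the same job but takes percentages instead of fractions. NumPy's default method interpolates linearly between datapoints, so its results can differ from a by-hand calculation. For example, taking the median of each half of this data gives Q1 = 3.5 and Q3 = 10.5 instead.

```python
import numpy as np

# A small made-up dataset, sorted only so it is easy to check by eye
data = np.array([1, 3, 4, 7, 8, 9, 12, 15])

# Q1, Q2 (the median), and Q3 are the 0.25, 0.5, and 0.75 quantiles
q1, q2, q3 = np.quantile(data, [0.25, 0.5, 0.75])

# Equivalent call that takes percentages instead of fractions
p25, p50, p75 = np.percentile(data, [25, 50, 75])

print(q1, q2, q3)  # 3.75 7.5 9.75
```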
In this lesson, we’ll look at a dataset about music. We’ve plotted a histogram of the lengths (in seconds) of 9,975 randomly selected songs.
Look up the length of one of your favorite songs. Do you think that song falls in the first, second, third, or fourth quarter of the data?
For example, we’ve picked one of our favorite songs, Chicago by Sufjan Stevens. Chicago is 364 seconds long; we’ve plotted it as a red vertical line. It looks like Chicago is in either the third or fourth quarter of the data, but it’s hard to say for sure. Let’s find the quartiles of the dataset!
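The sketch below shows how that question could be answered in code. The 9,975-song dataset is not reproduced here, so a dozen made-up song lengths stand in for it, and the result says nothing about where Chicago falls in the real data. `np.searchsorted` with `side="left"` counts how many quartiles lie strictly below a value, so adding 1 gives the quarter number. As before, a value equal to a quartile lands in the lower quarter, which is our own assumption.

```python
import numpy as np

# Hypothetical song lengths in seconds, standing in for the lesson's
# 9,975-song dataset (which is not included here)
song_lengths = np.array([180, 195, 210, 222, 240, 255, 263, 280, 301, 330, 372, 410])

# Q1, Q2, Q3 using NumPy's default (linear interpolation) method
quartiles = np.quantile(song_lengths, [0.25, 0.5, 0.75])

chicago = 364  # length of "Chicago" in seconds, from the text

# side="left" returns the number of quartiles strictly less than `chicago`;
# adding 1 turns that count into a quarter number from 1 to 4
quarter = int(np.searchsorted(quartiles, chicago, side="left")) + 1

print(quartiles)
print(quarter)  # 4
```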