We’ve spent this lesson building a boxplot by hand. Let’s now look at how Python’s Matplotlib library does it!
The matplotlib.pyplot module has a function named boxplot(). boxplot() takes a dataset as a parameter. This dataset could be something like a list of numbers, or a Pandas DataFrame.
    import matplotlib.pyplot as plt

    data = [1, 2, 3, 4, 5]
    plt.boxplot(data)
    plt.show()
One of the strengths of Matplotlib is the ease of plotting two boxplots side by side. If you pass boxplot() a list of datasets, Matplotlib will make a boxplot for each, allowing you to compare their spread and central tendencies:
    import matplotlib.pyplot as plt

    dataset_one = [1, 2, 3, 4, 5]
    dataset_two = [3, 4, 5, 6, 7]
    plt.boxplot([dataset_one, dataset_two])
    plt.show()
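To see in numbers what the two side-by-side boxes are comparing, here is a minimal sketch that computes the five-number summary of each dataset using only Python's standard library. The helper name five_number_summary is ours, not part of Matplotlib, and we are assuming that the "inclusive" quartile method (linear interpolation between data points) matches what Matplotlib draws by default.

```python
from statistics import quantiles

dataset_one = [1, 2, 3, 4, 5]
dataset_two = [3, 4, 5, 6, 7]


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers.

    Hypothetical helper for illustration; not a Matplotlib function.
    """
    # method="inclusive" interpolates linearly between the data points
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


for name, data in [("dataset_one", dataset_one), ("dataset_two", dataset_two)]:
    print(name, five_number_summary(data))
# dataset_one (1, 2.0, 3.0, 4.0, 5)
# dataset_two (3, 4.0, 5.0, 6.0, 7)
```

Both datasets have the same interquartile range (2), so the boxes should be the same height. The second box simply sits two units higher. Neither dataset has outliers here, so the whisker ends coincide with the minimum and maximum.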
We’ve imported the dataset of song lengths, but this time, we’ve split the data into three groups: songs that were released in the year 2000 (two_thousand), songs that were released in the year 2001 (two_thousand_one), and songs that were released in the year 2002.
Plot all three datasets as three separate boxplots in the order described above.
Make sure to call plt.show() after calling the boxplot() function.
Let’s add labels to our graph so we know which boxplot is which.
Add the parameter labels = ["2000 Songs", "2001 Songs", "2002 Songs"] to your call to the boxplot() function.
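If you want to experiment with labeled boxplots outside this exercise, here is a rough sketch. The song lengths and the variable names group_2000, group_2001, and group_2002 are made up for illustration. They are not the lesson's dataset. Instead of the labels parameter, this sketch sets the tick labels after plotting with set_xticklabels(), a swapped-in alternative that does not depend on the parameter's name (to our understanding, recent Matplotlib releases, 3.9 onward, rename labels to tick_labels).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up song lengths in seconds, standing in for the lesson's three groups
group_2000 = [180, 200, 215, 230, 260]
group_2001 = [170, 195, 210, 240, 300]
group_2002 = [190, 205, 220, 225, 245]
names = ["2000 Songs", "2001 Songs", "2002 Songs"]

fig, ax = plt.subplots()
ax.boxplot([group_2000, group_2001, group_2002])

# By default the boxes are drawn at x = 1, 2, 3, so the names map onto them in order
ax.set_xticklabels(names)

# Read the labels back to confirm each box got the intended name
tick_text = [tick.get_text() for tick in ax.get_xticklabels()]
print(tick_text)
```

To view the figure on your own machine, remove the matplotlib.use("Agg") line and call plt.show() at the end.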
Let’s think about what the boxplot is showing us. What can you say about this data that would be hard to know without a boxplot?
Look at the hint to see our thoughts. Hit the “Run” button when you’re ready to move on.