We can graph histograms using a Python module known as Matplotlib. We’re not going to go into detail about Matplotlib’s plotting functions, but if you’re interested in learning more, take our course Learn Data Visualization with Python.
For now, familiarize yourself with the following syntax to draw a histogram:
# This imports the plotting package. We only need to do this once. from matplotlib import pyplot as plt # This plots a histogram plt.hist(data) # This displays the histogram plt.show()
When we enter plt.hist
with no keyword arguments, matplotlib will automatically make a histogram with 10 bins of equal width that span the entire range of our data.
If you want a different number of bins, use the keyword bins
. For instance, the following code would give us 5 bins, instead of 10:
plt.hist(data, bins=5)
If you want a different range, you can pass in the minimum and maximum values that you want to histogram using the keyword range
. We pass in a tuple of two numbers. The first number is the minimum value that we want to plot and the second value is the number that we want to plot up to, but not including.
For instance, if our dataset contained values between 0 and 100, but we only wanted to histogram numbers between 20 and 50, we could use this command:
# We pass 51 so that our range includes 50 plt.hist(data, range=(20, 51))
Here’s a complete example:
from matplotlib import pyplot as plt d = np.array([1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5]) plt.hist(d, bins=5, range=(1, 6)) plt.show()
Instructions
At the top of script.py, we’ve included a set of data that contains information about commute times (in minutes, rounded to the nearest integer) on the New York subway system.
Start by importing matplotlib.
Plot a histogram of the data using the default number of bins and range (i.e., no keyword arguments).
Display your histogram using plt.show()
.
Now, let’s say we wanted to zoom in and see how many people have commutes ranging between 20 and 50 minutes.
Change your histogram command so that you get a histogram with a range of 20 to 50.
Modify your histogram one more time so that it has 6 bins.