In the previous exercise, we simulated 10000 different samples of 500 visitors, where each visitor had a 10% chance of making a purchase, and calculated the number of purchases per sample. Upon further inspection, we saw that those numbers ranged from around 25 to 75. This is useful information, but we can learn even more from inspecting the full distribution.
For example, recall our 10000 coin flip experiments: for each experiment, we flipped a fair coin 10 times and recorded the number of heads in a list named outcomes. We can plot a histogram of outcomes using matplotlib.pyplot.hist(). We can also add a vertical line at any x-value using matplotlib.pyplot.axvline():
import matplotlib.pyplot as plt
# Plot the distribution of heads counts across all experiments
plt.hist(outcomes)
# Mark the x-value 2 with a vertical red line
plt.axvline(2, color = 'r')
plt.show()
This histogram shows us that, over 10000 experiments, we observed as few as 0 and as many as 10 heads out of 10 flips. However, we were most likely to observe around 4-6 heads. It would be unlikely to observe only 2 heads (where the vertical red line is).
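To make the reading of the histogram concrete, here is a minimal sketch that rebuilds a list like outcomes and prints a text tally of the same distribution. It assumes each experiment is 10 independent fair flips and uses only the standard random module; the code that originally produced outcomes is not shown in this lesson, so treat this as an illustration rather than the original simulation.

```python
import random
from collections import Counter

# Assumption: each experiment is 10 independent flips of a fair coin,
# repeated 10000 times; the original simulation code is not shown here.
outcomes = [
    sum(random.random() < 0.5 for _ in range(10))
    for _ in range(10000)
]

# Tally how many experiments produced each number of heads (0 through 10).
counts = Counter(outcomes)
for heads in range(11):
    share = counts[heads] / len(outcomes)
    # Print the share of experiments plus a rough text bar for comparison.
    print(f"{heads:2d} heads: {share:6.1%} {'#' * round(share * 100)}")
```

The rows for 4, 5, and 6 heads should carry the longest bars, while the row for 2 heads should be short, which matches what the histogram and the red line show.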
The code from the previous exercise is provided for you in script.py. The list null_outcomes contains numbers of purchases simulated under the null hypothesis.
Add code to plot a histogram of null_outcomes and inspect the plot. What range of values occurs most frequently?
Note that, because we are using simulation, if you press “Run” a few times, the histogram will change slightly each time — but the basic shape and range covered on the x-axis will stay the same.
In the month we’re investigating, we calculated that there were 41 purchases. Add a vertical line to your histogram at 41. Make this line red using color = 'r' so that you can see it.
Where does 41 fall in this distribution? Is it relatively likely or unlikely?
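One way to put a number on "likely or unlikely" is to count how many simulated samples had 41 or fewer purchases. The sketch below is an illustration under stated assumptions: it rebuilds null_outcomes with the standard random module (500 visitors, a 10% purchase probability, 10000 samples), because the contents of script.py are not shown here. The names visitors, purchase_prob, num_samples, and observed are hypothetical and chosen only for this example.

```python
import random

# Assumed simulation settings, taken from the lesson text; script.py may
# build null_outcomes differently.
visitors = 500          # visitors per simulated sample
purchase_prob = 0.10    # purchase probability under the null hypothesis
num_samples = 10000     # number of simulated samples
observed = 41           # purchases observed in the month under investigation

# Simulate the number of purchases in each sample under the null hypothesis.
null_outcomes = [
    sum(random.random() < purchase_prob for _ in range(visitors))
    for _ in range(num_samples)
]

# Share of simulated samples with a purchase count at or below the observed one.
at_or_below = sum(outcome <= observed for outcome in null_outcomes)
proportion = at_or_below / num_samples
print(f"Share of samples with {observed} or fewer purchases: {proportion:.3f}")
```

A small share means that 41 sits in the left tail of the distribution, and a share near one half would put it close to the center. Because this is a simulation, the printed value will vary slightly from run to run.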