It turns out that, when we run a hypothesis test with a significance threshold, the significance threshold is equal to the type I error (false positive) rate for the test. To see this, we can use a simulation.
Recall our quiz question example: the null hypothesis is that the probability of getting a quiz question correct is equal to 70%. We make a type I error if the null hypothesis is true (the true probability of a correct answer is 70%) but we get a significant p-value anyway.
Now, consider the following simulation code:
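    import numpy as np
    # binom_test was removed in SciPy 1.12; newer versions provide binomtest(...).pvalue
    from scipy.stats import binom_test

    false_positives = 0
    sig_threshold = 0.05

    for i in range(1000):
        # Simulate 100 learners, each with a 70% chance of a correct answer
        sim_sample = np.random.choice(['correct', 'incorrect'], size=100, p=[0.7, 0.3])
        num_correct = np.sum(sim_sample == 'correct')
        # The null hypothesis (probability 0.7) is true for this simulated data
        p_val = binom_test(num_correct, 100, 0.7)
        if p_val < sig_threshold:
            false_positives += 1

    print(false_positives/1000)
    # Output varies from run to run; expect a value close to 0.05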
This code does the following:
- Set the significance threshold equal to 0.05 and a counter for false positives equal to zero.
- Repeat these steps 1000 times:
  - Simulate 100 learners, where each learner has a 70% chance of answering a quiz question correctly.
  - Calculate the number of simulated learners who answered the question correctly. Note that, because each learner has a 70% chance of answering correctly, this number will likely be around 70, but will vary every time by random chance.
  - Run a binomial test for the simulated sample where the null hypothesis is that the probability of a correct answer is 70% (0.7). Note that, every time we run this test, the null hypothesis is true because we simulated our data so that the probability of a correct answer is 70%.
  - Add 1 to our false positives counter every time we make a type I error (the p-value is significant).
- Print the proportion of our 1000 tests (on simulated samples) that resulted in a false positive.
Note that the proportion of false positive tests is very similar to the value of the significance threshold (0.05).
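The simulation only estimates the false positive rate, so the result wobbles from run to run. As a rough cross-check, the sketch below computes the rate exactly by looping over every possible number of correct answers (0 to 100) instead of sampling. It uses only the Python standard library. The helper names (`binom_pmf`, `two_sided_p_values`, `exact_type_one_rate`) are made up for this illustration, and the p-value is a hand-written stand-in that assumes the usual rule for an exact two-sided binomial test (add up the probability of every outcome that is no more likely than the observed one), which is, as far as we know, also what SciPy's binomial test does.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_sided_p_values(n, null_p):
    """Hand-rolled two-sided exact p-value for every possible count 0..n.

    Assumption: the p-value of a count is the total probability, under the
    null hypothesis, of all counts that are no more likely than that count.
    """
    pmf = [binom_pmf(k, n, null_p) for k in range(n + 1)]
    p_values = []
    for k in range(n + 1):
        cutoff = pmf[k] * (1 + 1e-7)  # small tolerance for floating-point ties
        p_values.append(min(1.0, sum(q for q in pmf if q <= cutoff)))
    return pmf, p_values

def exact_type_one_rate(n, null_p, sig_threshold):
    """Total probability, when the null is true, of a significant p-value."""
    pmf, p_values = two_sided_p_values(n, null_p)
    return sum(prob for prob, p_val in zip(pmf, p_values) if p_val < sig_threshold)

rate_05 = exact_type_one_rate(100, 0.7, 0.05)
rate_01 = exact_type_one_rate(100, 0.7, 0.01)
print(rate_05, rate_01)
```

Because the number of correct answers can only take whole-number values, the exact rate under this rule never exceeds the threshold and usually sits a little below it. That is why the simulated proportion is close to, rather than exactly equal to, the significance threshold.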
The code from the narrative has been provided for you in script.py with one small change: the code to create sim_sample has been altered so that the simulated learners each have an 80% chance of answering the question correctly. Change the call to binom_test() so that, for each simulated dataset, you’re running a binomial test where the null hypothesis is true. Press “Run”.
If you’ve done this correctly, you should see that the proportion of tests resulting in a false positive is close to the significance threshold (0.05).
Now, change the significance threshold to 0.01 and press “Run”.
Note that the proportion of simulations that result in a type I error should now be close to 0.01.
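To see both thresholds side by side, here is a minimal sketch that reuses the same simulated samples for each threshold. It is not the code in script.py: it relies only on the standard library, the function names (`two_sided_p_values`, `rejection_rates`) and the fixed seed are assumptions made for this illustration, and the p-value is a hand-written approximation of an exact two-sided binomial test rather than a call to binom_test().

```python
import random
from math import comb

def two_sided_p_values(n, null_p):
    """Hand-rolled two-sided exact p-value for every possible count 0..n.

    Assumption: sum the null probabilities of all counts that are no more
    likely than the observed count.
    """
    pmf = [comb(n, k) * null_p**k * (1 - null_p)**(n - k) for k in range(n + 1)]
    return [
        min(1.0, sum(q for q in pmf if q <= pmf[k] * (1 + 1e-7)))
        for k in range(n + 1)
    ]

def rejection_rates(true_p, null_p, thresholds, n_learners=100, n_sims=1000, seed=0):
    """Proportion of simulated samples with a significant p-value, per threshold."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    p_values = two_sided_p_values(n_learners, null_p)
    counts = {t: 0 for t in thresholds}
    for _ in range(n_sims):
        # Each simulated learner answers correctly with probability true_p
        num_correct = sum(rng.random() < true_p for _ in range(n_learners))
        for t in thresholds:
            if p_values[num_correct] < t:
                counts[t] += 1
    return {t: c / n_sims for t, c in counts.items()}

# The null hypothesis is true here: data and test both use a probability of 0.8
rates = rejection_rates(true_p=0.8, null_p=0.8, thresholds=(0.05, 0.01))
print(rates)
```

Since every sample that is significant at 0.01 is also significant at 0.05, the proportion for the stricter threshold can never be larger than the proportion for the looser one.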