Learn

It turns out that, when we run a hypothesis test with a significance threshold, the significance threshold is equal to the type I error (false positive) rate for the test. To see this, we can use a simulation.

Recall our quiz question example: the null hypothesis is that the probability of getting a quiz question correct is equal to 70%. We’ll make a type I error if the null hypothesis is correct (the true probability of a correct answer is 70%), but we get a significant p-value anyway.

Now, consider the following simulation code:

```python
import numpy as np
from scipy.stats import binom_test

false_positives = 0
sig_threshold = 0.05

for i in range(1000):
    sim_sample = np.random.choice(['correct', 'incorrect'], size=100, p=[0.7, 0.3])
    num_correct = np.sum(sim_sample == 'correct')
    p_val = binom_test(num_correct, 100, 0.7)
    if p_val < sig_threshold:
        false_positives += 1

print(false_positives/1000) # Output: 0.0512
```

This code does the following:

• Set the significance threshold equal to 0.05 and a counter for false positives equal to zero.
• Repeat these steps 1000 times:
  • Simulate 100 learners, where each learner has a 70% chance of answering a quiz question correctly.
  • Calculate the number of simulated learners who answered the question correctly. Note that, because each learner has a 70% chance of answering correctly, this number will likely be around 70, but will vary every time by random chance.
  • Run a binomial test for the simulated sample where the null hypothesis is that the probability of a correct answer is 70% (0.7). Note that, every time we run this test, the null hypothesis is true because we simulated our data so that the probability of a correct answer is 70%.
  • Add `1` to our false positives counter every time we make a type I error (the p-value is significant).
• Print the proportion of our 1000 tests (on simulated samples) that resulted in a false positive.

Note that the proportion of false positive tests is very similar to the value of the significance threshold (0.05).
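In fact, this holds for any threshold we might pick. As a sketch (standard library only, with a hand-rolled exact two-sided binomial test standing in for `binom_test()`), we can simulate one batch of p-values under a true null and check several thresholds against it:

```python
import math
import random

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binom_test_two_sided(k, n, p):
    """Exact two-sided binomial p-value: the total probability of every
    outcome at most as likely as the observed count."""
    pmf = [binom_pmf(i, n, p) for i in range(n + 1)]
    observed = pmf[k]
    return sum(q for q in pmf if q <= observed * (1 + 1e-9))

random.seed(0)
p_values = []
for _ in range(1000):
    # The null hypothesis is true: each answer is correct with probability 0.7.
    num_correct = sum(random.random() < 0.7 for _ in range(100))
    p_values.append(binom_test_two_sided(num_correct, 100, 0.7))

rates = {}
for sig_threshold in (0.05, 0.01):
    rates[sig_threshold] = sum(p < sig_threshold for p in p_values) / len(p_values)
    print(sig_threshold, rates[sig_threshold])
```

Because we reuse the same 1000 p-values for both thresholds, we can see directly that lowering the threshold lowers the false positive rate by the same amount: each rate lands near its own threshold.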

### Instructions

1.

The code from the narrative has been provided for you in script.py with one small change: the code to create `sim_sample` has been altered so that the simulated learners each have an 80% chance of answering the question correctly. Change the call to `binom_test()` so that, for each simulated dataset, you’re running a binomial test where the null hypothesis is true. Press “Run”.

If you’ve done this correctly, you should see that the proportion of tests resulting in a false positive is close to the significance threshold (0.05).
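One possible solution sketch is below. It uses `scipy.stats.binomtest`, the newer name for the same test (the lesson's `binom_test()` takes the same positional arguments); the key change is that the null probability passed to the test now matches the 80% used to simulate the data, so the null hypothesis is true:

```python
import numpy as np
from scipy.stats import binomtest  # newer SciPy API; older versions used binom_test

np.random.seed(0)  # seeded here for reproducibility; script.py does not seed
false_positives = 0
sig_threshold = 0.05

for _ in range(1000):
    # Simulated learners now answer correctly 80% of the time...
    sim_sample = np.random.choice(['correct', 'incorrect'], size=100, p=[0.8, 0.2])
    num_correct = int(np.sum(sim_sample == 'correct'))
    # ...so the null hypothesis must also use 0.8 for it to be true.
    p_val = binomtest(num_correct, 100, 0.8).pvalue
    if p_val < sig_threshold:
        false_positives += 1

rate = false_positives / 1000
print(rate)
```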

2.

Now, change the significance threshold to `0.01` and press “Run”.

Note that the proportion of simulations that result in a type I error should now be close to 0.01.