It turns out that, when we run a hypothesis test with a significance threshold, the significance threshold is equal to the type I error (false positive) rate for the test. To see this, we can use a simulation.

Recall our quiz question example: the null hypothesis is that the probability of getting a quiz question correct is equal to 70%. We'll make a type I error if the null hypothesis is true (the real probability of a correct answer is 70%), but we get a significant p-value anyway.

Now, consider the following simulation code:

```python
import numpy as np
from scipy.stats import binom_test

false_positives = 0
sig_threshold = 0.05

for i in range(1000):
    # simulate 100 learners, each with a 70% chance of a correct answer
    sim_sample = np.random.choice(['correct', 'incorrect'], size=100, p=[0.7, 0.3])
    num_correct = np.sum(sim_sample == 'correct')
    # binomial test with a null probability of 0.7 (the true value)
    p_val = binom_test(num_correct, 100, 0.7)
    if p_val < sig_threshold:
        false_positives += 1

print(false_positives/1000)
# Output: 0.0512
```

This code does the following:

- Set the significance threshold equal to 0.05 and a counter for false positives equal to zero.
- Repeat these steps 1000 times:
  - Simulate 100 learners, where each learner has a 70% chance of answering a quiz question correctly.
  - Calculate the number of simulated learners who answered the question correctly. Note that, because each learner has a 70% chance of answering correctly, this number will likely be around 70, but will vary every time by random chance.
  - Run a binomial test for the simulated sample where the null hypothesis is that the probability of a correct answer is 70% (0.7). Note that, every time we run this test, the null hypothesis is true, because we simulated our data so that the probability of a correct answer is 70%.
  - Add `1` to our false positives counter every time we make a type I error (the p-value is significant).
- Print the proportion of our 1000 tests (on simulated samples) that resulted in a false positive.

Note that the proportion of false positive tests is very similar to the value of the significance threshold (0.05).
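The same simulation can be adapted to check that this relationship holds for other thresholds as well. Here is a minimal sketch; it assumes `numpy` and `scipy` are installed, uses a hypothetical helper name `false_positive_rate`, and calls the modern `scipy.stats.binomtest` (the current replacement for the older `binom_test` shown above):

```python
import numpy as np
from scipy.stats import binomtest  # current replacement for the deprecated binom_test

def false_positive_rate(sig_threshold, n_sims=1000, n_learners=100, p_correct=0.7, seed=0):
    """Run repeated binomial tests under a true null hypothesis and
    return the proportion of significant (false positive) results."""
    rng = np.random.default_rng(seed)
    false_positives = 0
    for _ in range(n_sims):
        # each simulated learner answers correctly with probability p_correct
        num_correct = rng.binomial(n_learners, p_correct)
        # null hypothesis probability equals the true probability, so the null is true
        p_val = binomtest(num_correct, n_learners, p_correct).pvalue
        if p_val < sig_threshold:
            false_positives += 1
    return false_positives / n_sims

for threshold in (0.05, 0.01):
    print(threshold, false_positive_rate(threshold))
```

For each threshold, the printed false positive rate should land close to the threshold itself, which is exactly the pattern the exercise below asks you to verify.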

### Instructions

**1.**

The code from the narrative has been provided for you in **script.py** with one small change: the code to create `sim_sample` has been altered so that the simulated learners each have an 80% chance of answering the question correctly. Change the call to `binom_test()` so that, for each simulated dataset, you're running a binomial test where the **null hypothesis is true**. Press "Run".

If you’ve done this correctly, you should see that the proportion of tests resulting in a false positive is close to the significance threshold (0.05).

**2.**

Now, change the significance threshold to `0.01` and press "Run".

Note that the proportion of simulations that result in a type I error should now be close to 0.01.