In the previous exercise, we simulated 1,000 datasets and ran a Chi-Square test for each one, recording whether the results were ‘significant’ or ‘not significant’. This allowed us to estimate the proportion of simulated datasets that led to a ‘significant’ result.

In general, we hope that the test reflects reality. We therefore want the result to be ‘significant’ if there really **is** a difference in the probability of an open for the two email subjects (lift > 0). In that case, the proportion of significant results is the true positive rate, also called the *power* of the test. Most sample size calculators aim for a power of 80%.

On the other hand, if there’s no difference in the probability of an email being opened for the two email subjects (lift = 0), a ‘significant’ result would be a false positive (also called a type I error). This would lead us to invest time and resources into adding first names to email subjects when there’s no real pay-off in the long run.
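To make these two quantities concrete, here is a hedged sketch of a simulation like the one from the previous exercise. The function and parameter names here are illustrative, not taken from **script.py**, and the sketch assumes `numpy` and `scipy` are available:

```python
import numpy as np
from scipy.stats import chi2_contingency

def estimate_significant_proportion(sample_size=100, baseline_rate=0.5,
                                    lift=0.3, alpha=0.05,
                                    n_simulations=1000, seed=0):
    """Simulate many A/B datasets and return the fraction of
    chi-square tests that are significant at threshold alpha."""
    rng = np.random.default_rng(seed)
    variant_rate = baseline_rate * (1 + lift)
    significant = 0
    for _ in range(n_simulations):
        # Simulate the number of opens in each group
        opens_a = rng.binomial(sample_size, baseline_rate)
        opens_b = rng.binomial(sample_size, variant_rate)
        # 2x2 contingency table: opens vs. non-opens per group
        table = [[opens_a, sample_size - opens_a],
                 [opens_b, sample_size - opens_b]]
        # correction=False turns off the Yates continuity correction,
        # so the false positive rate lands close to alpha
        _, pval, _, _ = chi2_contingency(table, correction=False)
        if pval < alpha:
            significant += 1
    return significant / n_simulations
```

Called with a positive `lift`, the returned proportion estimates the power of the test; called with `lift=0`, it estimates the false positive rate.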

### Instructions

**1.**

The simulation code from the previous exercises is loaded for you in **script.py**. We’ve included the code to print out the proportion of tests where a significant result was recorded. Currently, the simulation is set up so that there **is** a difference in the probability of an open for the two email subjects.

Press “Run” a few times and inspect the proportion of significant tests (printed to the output terminal) each time. If we ran a test with the provided sample size (100), baseline conversion rate (50%) and lift (30%), approximately what percent of the time would we correctly observe a significant result? Note that this is the “power” of the test.

**2.**

Now, change the value of `lift` so that the proportion of significant tests is equal to the **false positive rate**, then press “Run” once more.

Note that the proportion of significant tests should be approximately equal to the significance threshold if you’ve done this correctly.
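If you want to verify this outside the exercise environment, the following self-contained sketch runs the same kind of check with `lift = 0` (variable names are my own, not from **script.py**; it assumes `numpy` and `scipy` are installed):

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(42)
sample_size, baseline_rate, lift, alpha = 100, 0.5, 0.0, 0.05

significant = 0
for _ in range(1000):
    # With lift = 0, both groups share the same open rate,
    # so every 'significant' result is a false positive.
    opens_a = rng.binomial(sample_size, baseline_rate)
    opens_b = rng.binomial(sample_size, baseline_rate * (1 + lift))
    table = [[opens_a, sample_size - opens_a],
             [opens_b, sample_size - opens_b]]
    _, pval, _, _ = chi2_contingency(table, correction=False)
    significant += pval < alpha

# The printed proportion should hover around the
# significance threshold (alpha = 0.05).
print(significant / 1000)
```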