Let’s return to the perspective of a product manager who is actually planning this A/B test. Suppose that the product manager wants to reliably detect a lift of 30% or higher, but also wants to avoid false positives (they don’t want to change the email subject unless there is a real difference between the two). To plan the test, the product manager needs to consider the following:
- Increasing the sample size increases the power of the test (the probability of detecting a difference if there is one); however, larger sample sizes require more time and resources.
- Increasing the significance threshold also increases the power of the test; however, it simultaneously increases the false positive rate (the probability of detecting a difference when there isn’t one).
- Finally, if the product manager chooses a larger minimum detectable effect (lift), they can decrease the sample size without decreasing power. However, if they set up the test to detect a minimum lift of 30% (for example), they may not be able to detect smaller differences that are still meaningful.
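The trade-offs above can be made concrete with a rough sample size calculation. The sketch below is not part of script.py: it uses a standard normal-approximation formula for comparing two proportions, and it assumes a two-sided test, equal group sizes, and a target power of 80%. The function name `required_sample_size` and its parameters are hypothetical, chosen only for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def required_sample_size(control_rate, lift, alpha=0.05, power=0.80):
    """Approximate recipients needed PER GROUP (normal approximation).

    Assumptions (not from the lesson): two-sided test, equal group sizes.
    """
    treatment_rate = control_rate * (1 + lift)
    pooled_rate = (control_rate + treatment_rate) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for target power
    numerator = (
        z_alpha * sqrt(2 * pooled_rate * (1 - pooled_rate))
        + z_power * sqrt(
            control_rate * (1 - control_rate)
            + treatment_rate * (1 - treatment_rate)
        )
    ) ** 2
    return ceil(numerator / (treatment_rate - control_rate) ** 2)


# Compare the baseline plan with a looser threshold and a larger lift.
for lift, alpha in [(0.30, 0.05), (0.30, 0.10), (0.40, 0.05)]:
    n = required_sample_size(control_rate=0.5, lift=lift, alpha=alpha)
    print(f"lift={lift:.0%}, alpha={alpha:.2f}: about {n} recipients per group")
```

Under these assumptions, both a higher significance threshold and a larger minimum detectable lift reduce the number of recipients needed for the same power, which is the trade-off described in the list above.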
The simulation code from the previous exercises is provided for you in script.py. Currently, the simulation is set up to use an open rate of 50% for the control email and a lift of 30% for the name email subject. Set the sample size to 100, press “Run”, and make note of the proportion of significant results, which is an estimate of the power of the test.
Now increase the sample size to 500 and press “Run” again. Note that the power of the test also increases.
Next, increase the significance threshold to 0.10. Note that the power of the test increases even more.
Finally, increase the lift to 40%. Note that again, the power of the test increases.
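If you want to compare all four settings side by side, the following is a minimal standalone sketch of this kind of simulation. It is not the code in script.py: it assumes a pooled two-proportion z-test as the significance test (the provided script may use a different test), and it assumes the sample size is the total number of recipients, split evenly between the two emails. The names `two_proportion_pvalue` and `estimate_power` are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def two_proportion_pvalue(opens_a, opens_b, n_per_group):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (opens_a + opens_b) / (2 * n_per_group)
    se = sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    if se == 0:
        return 1.0  # every email opened (or none): no evidence of a difference
    z = (opens_b / n_per_group - opens_a / n_per_group) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def estimate_power(sample_size, lift, threshold,
                   control_rate=0.5, n_sims=1000, seed=0):
    """Proportion of simulated A/B tests with a significant result."""
    rng = random.Random(seed)
    n = sample_size // 2  # assumption: total sample split evenly
    treatment_rate = control_rate * (1 + lift)
    significant = 0
    for _ in range(n_sims):
        # Simulate how many recipients in each group open the email.
        opens_control = sum(rng.random() < control_rate for _ in range(n))
        opens_treatment = sum(rng.random() < treatment_rate for _ in range(n))
        if two_proportion_pvalue(opens_control, opens_treatment, n) < threshold:
            significant += 1
    return significant / n_sims


# The four settings from the steps above.
settings = [(100, 0.30, 0.05), (500, 0.30, 0.05),
            (500, 0.30, 0.10), (500, 0.40, 0.10)]
for sample_size, lift, threshold in settings:
    power = estimate_power(sample_size, lift, threshold)
    print(f"n={sample_size}, lift={lift:.0%}, threshold={threshold:.2f}: "
          f"estimated power = {power:.2f}")
```

Because the result is a simulation estimate, the exact proportions vary with the seed and the number of simulated tests; the pattern to look for is that power rises at each step.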