When we run an A/B test, we usually want to use the results to make a decision: should we use version A or version B? To make that decision, many data scientists pick a significance threshold for their hypothesis test before the test is run. For example, if we set a significance threshold of 0.05 (a commonly chosen value), we’ll “reject the null hypothesis” and conclude that the conversion rate for version B is significantly different from that of version A whenever the p-value is less than 0.05.
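As a concrete illustration of this decision rule, here is a minimal sketch that computes a p-value for two conversion rates and compares it to the threshold. It assumes a pooled two-proportion z-test (one common choice; the lesson does not specify which test is used), and the conversion counts are hypothetical numbers chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for H0: versions A and B have the same conversion rate.

    Assumption: a pooled two-proportion z-test (normal approximation).
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # No variation at all (e.g. zero conversions everywhere): no evidence of a difference
        return 1.0
    z = (rate_b - rate_a) / se
    # Probability of a |z| at least this large if the null hypothesis were true
    return 2 * (1 - NormalDist().cdf(abs(z)))


significance_threshold = 0.05  # chosen before running the test

# Hypothetical results: 200 of 2000 visitors converted on A, 250 of 2000 on B
p_value = two_proportion_p_value(conv_a=200, n_a=2000, conv_b=250, n_b=2000)

if p_value < significance_threshold:
    print(f"p = {p_value:.4f}: reject the null hypothesis; the conversion rates differ")
else:
    print(f"p = {p_value:.4f}: fail to reject the null hypothesis")
```

The threshold is fixed up front and the p-value is only compared against it; choosing the threshold after seeing the p-value would defeat its purpose.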
It turns out that this significance threshold is the false positive rate for the test: the probability of finding a significant difference when there really is none. As business owners, we don’t want to make this kind of mistake, because we might then invest money in a change that doesn’t actually make a difference!
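One way to see that the threshold equals the false positive rate is to simulate many “A/A tests”, where both versions share the same true conversion rate, and count how often the test still comes out significant. This is a minimal sketch under stated assumptions: a pooled two-proportion z-test, a true rate of 10%, 1000 visitors per version, and 1000 simulated tests are all arbitrary illustrative choices, and the random seed only makes the run reproducible.

```python
import random
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test (assumed test)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_false_positive_rate(true_rate, n_per_group, threshold, n_tests, seed=0):
    """Run many A/A tests (no real difference) and return the share flagged as significant."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_tests):
        # Each visitor converts with probability true_rate, identically for A and B
        conv_a = sum(rng.random() < true_rate for _ in range(n_per_group))
        conv_b = sum(rng.random() < true_rate for _ in range(n_per_group))
        if two_proportion_p_value(conv_a, n_per_group, conv_b, n_per_group) < threshold:
            false_positives += 1  # "significant" even though A and B are identical
    return false_positives / n_tests


rate = simulate_false_positive_rate(true_rate=0.10, n_per_group=1000, threshold=0.05, n_tests=1000)
print(f"Observed false positive rate: {rate:.3f}")  # expected to land near 0.05
```

Because the simulated rate is a random estimate, it will hover around the threshold rather than match it exactly; rerunning with `threshold=0.01` should push it down toward 1%.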
Unfortunately, for a fixed sample size, there’s a trade-off between false positives and false negatives: lowering one raises the other. A false negative occurs when there is a real difference between versions A and B, but the test doesn’t detect it. This is a potential missed opportunity for a business owner!
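The trade-off can be made concrete by holding the sample size fixed and computing the approximate false negative rate at several thresholds. This sketch assumes a two-sided two-proportion z-test with a normal approximation, and the scenario (B truly converts at 12% versus 10% for A, with 2000 visitors per version) is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def false_negative_rate(rate_a, rate_b, n_per_group, threshold):
    """Approximate chance that a two-sided z-test misses a real difference.

    Normal approximation; ignores the negligible chance of a significant
    result in the wrong direction.
    """
    se = sqrt(rate_a * (1 - rate_a) / n_per_group + rate_b * (1 - rate_b) / n_per_group)
    z_crit = NormalDist().inv_cdf(1 - threshold / 2)  # critical value for this threshold
    power = NormalDist().cdf(abs(rate_b - rate_a) / se - z_crit)
    return 1 - power


# Hypothetical scenario: same sample size, stricter and stricter thresholds
for threshold in (0.10, 0.05, 0.01):
    fnr = false_negative_rate(0.10, 0.12, n_per_group=2000, threshold=threshold)
    print(f"threshold={threshold:.2f}  approx. false negative rate={fnr:.2f}")
```

With the sample size held constant, each stricter threshold lowers the false positive rate but makes the test more likely to miss the real 2-point lift.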
Most A/B test sample size calculators, by default, estimate the sample size needed for a 20% false negative rate (equivalently, 80% statistical power), while the data scientist chooses the false positive rate they are comfortable with. The lower the false positive rate, the larger the sample size will need to be!
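Here is a minimal sketch of how such a calculator might compute the required sample size. It assumes one standard normal-approximation formula for a two-sided two-proportion z-test with 80% power; the calculator in the workspace may use a different formula and report somewhat different numbers. The baseline and target conversion rates (10% and 12%) are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(rate_a, rate_b, threshold, power=0.80):
    """Visitors needed per version for a two-sided two-proportion z-test.

    power = 1 - false negative rate, so power=0.80 means a 20% false negative rate.
    """
    z_alpha = NormalDist().inv_cdf(1 - threshold / 2)  # grows as the threshold shrinks
    z_beta = NormalDist().inv_cdf(power)
    mean_rate = (rate_a + rate_b) / 2
    numerator = (
        z_alpha * sqrt(2 * mean_rate * (1 - mean_rate))
        + z_beta * sqrt(rate_a * (1 - rate_a) + rate_b * (1 - rate_b))
    ) ** 2
    return ceil(numerator / (rate_b - rate_a) ** 2)


# Hypothetical scenario: detect a lift from a 10% to a 12% conversion rate
for threshold in (0.10, 0.05, 0.01):
    n = sample_size_per_group(0.10, 0.12, threshold)
    print(f"threshold={threshold:.2f}  sample size per version: {n}")
```

Only `z_alpha` depends on the threshold, and it increases as the threshold decreases, which is why a lower false positive rate always demands more visitors when the false negative rate is held at 20%.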
Try changing the significance threshold for the calculator in the workspace. Note how the sample size changes. Do you see how a lower threshold (lower false positive rate) requires a larger sample size?