Now that we’ve practiced simulating data for an A/B test, let’s actually run a Chi-Square test for each simulated dataset and consider the decision we would make based on the outcome.
If we were really running this test, we would want to use the data to make a decision about whether to use the control (old) or name (new) email subject. To make that decision, we can use a significance threshold. For example, if we’re using a significance threshold of 0.05, we’ll “reject the null hypothesis” for any p-value less than 0.05. In this context, rejecting the null would mean that we conclude that there is a significant difference between the open rates for the two email subjects and therefore we should switch to the email subject that uses the recipient’s first name.
We can use the following Python statement to record whether a particular p-value is significant or not, based on a threshold of 0.05:
result = ('significant' if pval < 0.05 else 'not significant')
print(result)
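The same rule can be written against a named threshold instead of a hard-coded 0.05. The sketch below is only an illustration: `label_pvalue` is a hypothetical helper name, and the p-values in the loop are made-up inputs chosen to show the behavior around the cutoff.

```python
def label_pvalue(pval, significance_threshold=0.05):
    """Label a p-value using a strict 'less than' comparison (hypothetical helper)."""
    return 'significant' if pval < significance_threshold else 'not significant'


# Made-up p-values, chosen only to show what happens near the threshold.
for pval in (0.003, 0.049, 0.05, 0.2):
    print(pval, label_pvalue(pval))
```

Because the comparison is strict, a p-value of exactly 0.05 is labeled 'not significant', which matches the "less than 0.05" wording above.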
The code from the previous exercises is provided for you in script.py. This code generates a simulated dataset named sim_data and then runs a Chi-Square test on that data, saving the resulting p-value.
An additional variable named significance_threshold has been defined for you in script.py; it holds the significance threshold for the test. After the p-value calculation, add a line of code that uses significance_threshold to determine whether the p-value is 'significant' or 'not significant'. Save the result as result and print it out.
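To make the whole flow concrete, here is a minimal standard-library sketch of one simulated test. It is not the lesson's script.py. The open rates (0.50 and 0.65), the even 50/50 split, and the helper names are all assumptions for illustration. The p-value comes from a hand-written Pearson Chi-Square calculation without a continuity correction, so it can differ from a library routine such as scipy.stats.chi2_contingency, which applies Yates' correction to 2x2 tables by default.

```python
import math
import random

# Assumed settings for illustration; the values in script.py may differ.
significance_threshold = 0.05
CONTROL_RATE = 0.50  # assumed open rate for the control subject
NAME_RATE = 0.65     # assumed open rate for the first-name subject
GROUP_SIZE = 50      # assumed: 100 recipients split evenly


def simulate_opens(rate, size):
    """Simulate `size` recipients; each opens ('yes') with probability `rate`."""
    return ['yes' if random.random() < rate else 'no' for _ in range(size)]


def contingency_table(data):
    """Count (group, outcome) pairs into a 2x2 table: rows are groups, columns yes/no."""
    groups, outcomes = ['control', 'name'], ['yes', 'no']
    return [[sum(1 for g, o in data if g == group and o == outcome)
             for outcome in outcomes] for group in groups]


def chi_square_pvalue(table):
    """Pearson Chi-Square p-value for a 2x2 table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    if 0 in row_totals or 0 in col_totals:
        return 1.0  # degenerate table; this sketch treats it as no evidence
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# One simulated dataset: a list of (group, outcome) pairs.
sim_data = ([('control', o) for o in simulate_opens(CONTROL_RATE, GROUP_SIZE)]
            + [('name', o) for o in simulate_opens(NAME_RATE, GROUP_SIZE)])
pval = chi_square_pvalue(contingency_table(sim_data))

result = 'significant' if pval < significance_threshold else 'not significant'
print(result)
```

The last two lines are the part the exercise asks for; everything above them stands in for the provided setup code.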
Press “Run” a few times until you see both a 'significant' and a 'not significant' result. Note that it is possible to get different results every time you sample a new group of 100 recipients.
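Instead of pressing “Run” by hand, the repetition can be automated. The sketch below repeats the simulated test many times and tallies the two outcomes. The open rates, group size, number of repetitions, and function names are assumptions for illustration, and the p-value again uses an uncorrected Pearson Chi-Square written with the standard library, so the counts would not exactly match a library test that applies a continuity correction.

```python
import math
import random


def pvalue_from_opens(control_opens, name_opens, group_size):
    """Pearson Chi-Square p-value (1 degree of freedom, no continuity
    correction) for two equal-sized groups, given the opens in each."""
    opened = control_opens + name_opens
    not_opened = 2 * group_size - opened
    if opened == 0 or not_opened == 0:
        return 1.0  # everyone behaved the same; treated here as no evidence
    # Shortcut for a 2x2 table with equal row totals:
    # chi2 = n * (a - b)^2 / (opened * not_opened), with n = 2 * group_size.
    chi2 = 2 * group_size * (control_opens - name_opens) ** 2 / (opened * not_opened)
    return math.erfc(math.sqrt(chi2 / 2))


def run_once(rng, control_rate=0.50, name_rate=0.65, group_size=50,
             significance_threshold=0.05):
    """Sample one new group of recipients and label the test outcome."""
    control_opens = sum(rng.random() < control_rate for _ in range(group_size))
    name_opens = sum(rng.random() < name_rate for _ in range(group_size))
    pval = pvalue_from_opens(control_opens, name_opens, group_size)
    return 'significant' if pval < significance_threshold else 'not significant'


rng = random.Random()  # unseeded, so every execution draws new samples
tally = {'significant': 0, 'not significant': 0}
for _ in range(1000):
    tally[run_once(rng)] += 1
print(tally)
```

Under these assumed rates both labels show up across the repetitions, even though the underlying open rates never change; only the sampled recipients do.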