Learn

In the last exercise, we learned how to simulate a dataset for a Chi-Square test, run the test, and then output a result: ‘significant’ or ‘not significant’. In this exercise, we’ll repeat that process many times so that we can inspect the relative frequency of each outcome.

To do this, we’ll start by creating an empty list to store the results of our repeated experiments. Next, we’ll move all of our simulation code (to create a sample dataset, run a Chi-Square test, and determine a result) inside of a for-loop. In each iteration of the loop, we’ll append the outcome to our results list so that we can inspect it later.

The outline of the code looks something like this:

Set the sample size and subscription probabilities
Create an empty list named `results`

Repeat 100 times in a for-loop:
   Simulate a dataset
   Run a Chi-Square test
   Use the p-value to determine significance
   Append the result ('significant' or 'not significant') to `results`
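As a rough illustration, the outline above could be sketched in Python as follows. This is only a sketch under stated assumptions: the group size (100 per group), the subscription probabilities (0.50 and 0.65), the 0.05 significance threshold, and the helper name `simulate_results` are all made up for this example and are not the values from script.py. It also assumes SciPy's `chi2_contingency` is used to run the test.

```python
import numpy as np
from scipy.stats import chi2_contingency


def simulate_results(group_size, control_rate, name_rate,
                     n_sims=100, alpha=0.05, seed=None):
    """Repeat a simulated Chi-Square experiment and collect the outcomes.

    All parameter values passed below are hypothetical placeholders.
    """
    rng = np.random.default_rng(seed)
    results = []  # empty list to store one outcome per simulated experiment

    for _ in range(n_sims):
        # Simulate a dataset: number of subscribers in each email group
        control_yes = rng.binomial(group_size, control_rate)
        name_yes = rng.binomial(group_size, name_rate)

        # 2x2 contingency table: rows are groups, columns are yes / no
        table = [[control_yes, group_size - control_yes],
                 [name_yes, group_size - name_yes]]

        # Run a Chi-Square test (SciPy applies Yates' correction to 2x2 tables by default)
        chi2, pval, dof, expected = chi2_contingency(table)

        # Use the p-value to determine significance, then store the outcome
        result = 'significant' if pval < alpha else 'not significant'
        results.append(result)

    return results


results = simulate_results(group_size=100, control_rate=0.50,
                           name_rate=0.65, seed=0)
print(len(results))  # one entry per simulated experiment
```

Passing a `seed` makes a run reproducible; leaving it as `None` gives different outcomes on each run, which matches the random behavior described in this exercise.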

Finally, we can inspect `results` by calculating the proportion of simulated tests where the result was 'significant':

results = np.array(results)
print(np.sum(results == 'significant')/100)
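The conversion to a NumPy array matters here: comparing a plain Python list to a string yields a single `False`, while comparing an array yields one boolean per element. The short sketch below uses a made-up four-element `results` list (purely illustrative) and divides by `len(results)` rather than a hard-coded 100, so the calculation stays correct if the number of repetitions changes.

```python
import numpy as np

# Hypothetical outcomes, made up for illustration only
results = ['significant', 'not significant', 'significant', 'significant']

# A plain list compared to a string is a single boolean, not elementwise
print(results == 'significant')  # False

# Converting to an array makes the comparison elementwise
results_arr = np.array(results)
is_significant = results_arr == 'significant'

# Proportion of significant outcomes, two equivalent ways
proportion = np.sum(is_significant) / len(results_arr)
proportion_mean = np.mean(is_significant)

print(proportion)       # 0.75
print(proportion_mean)  # 0.75
```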

Instructions

1.

In script.py, we’ve copied over the code from the previous exercise and moved the simulation inside a for-loop, as described in the narrative. We’ve also initialized an empty list named `results`.

Below the line that determines `result`, but still inside the for-loop, add a line of code that appends `result` to `results`.

2.

Outside of the for-loop, add a line of code to print the proportion of results that are 'significant'. Press “Run” a few times (note: you’ll see slightly different numbers each time because this is a random process). Approximately what proportion of the results were significant, meaning they would have led us to switch to the new “name” email subject?
