In the last exercise, we learned how to simulate a dataset for a Chi-Square test, run the test, and then output a result: ‘significant’ or ‘not significant’. In this exercise, we’ll repeat that process many times so that we can inspect the relative frequency of each outcome.

To do this, we’ll start by creating an empty list to store the results of our repeated experiments. Next, we’ll move all of our simulation code (to create a sample dataset, run a Chi-Square test, and determine a result) inside of a for-loop. In each iteration of the loop, we’ll append the outcome to our results list so that we can inspect it later.

The outline of the code looks something like this:

```
Set the sample size and subscription probabilities
Create an empty list named `results`

Repeat 100 times in a for-loop:
    Simulate a dataset
    Run a Chi-Square test
    Use the p-value to determine significance
    Append the result ('significant' or 'not significant') to `results`
```
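As a rough illustration, the outline could be turned into Python along the following lines. This is only a sketch, not the contents of script.py: the function name `simulate_results`, the default sample size, the two probabilities, and the 0.05 threshold are assumptions chosen for the example, and the contingency table is built directly with NumPy to keep the block short.

```python
import numpy as np
from scipy.stats import chi2_contingency


def simulate_results(sample_size=100, control_rate=0.5, name_rate=0.65,
                     n_sims=100, alpha=0.05, seed=None):
    """Repeat a simulated two-group experiment and record each outcome.

    All default values are illustrative assumptions, not values from the lesson.
    """
    rng = np.random.default_rng(seed)
    n_per_group = sample_size // 2
    results = []  # one entry per simulated experiment

    for _ in range(n_sims):
        # Simulate a dataset: a 'yes'/'no' outcome for each person in each group
        control = rng.choice(['yes', 'no'], size=n_per_group,
                             p=[control_rate, 1 - control_rate])
        name = rng.choice(['yes', 'no'], size=n_per_group,
                          p=[name_rate, 1 - name_rate])

        # 2x2 contingency table: rows are groups, columns are yes/no counts
        table = np.array([
            [np.sum(control == 'yes'), np.sum(control == 'no')],
            [np.sum(name == 'yes'), np.sum(name == 'no')],
        ])

        # Run a Chi-Square test and keep the p-value
        chi2, pval, dof, expected = chi2_contingency(table)

        # Use the p-value to determine significance, then store the result
        result = 'significant' if pval < alpha else 'not significant'
        results.append(result)

    return results


results = simulate_results(seed=0)
print(len(results))
```

Passing a `seed` makes a run reproducible. Leaving it as `None` gives a different set of outcomes on each run, which is the behavior the exercise asks you to observe.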

Finally, we can inspect `results` by calculating the proportion of simulated tests where the result was `'significant'`:

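```
# Convert to a NumPy array so the comparison below is element-wise
results = np.array(results)
# Proportion of simulated tests that came out 'significant'
print(np.sum(results == 'significant') / len(results))
```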
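To see why this calculation gives a proportion, it may help to try it on a tiny hand-made list. The five outcomes below are made up for illustration. Comparing the array to `'significant'` produces an array of `True`/`False` values, so summing it counts the significant results, and taking its mean gives the same proportion in one step.

```python
import numpy as np

# Hypothetical outcomes from five simulated tests (made up for illustration)
results = np.array(['significant', 'not significant', 'significant',
                    'not significant', 'not significant'])

# Element-wise comparison gives a boolean array: True where 'significant'
is_significant = results == 'significant'

# Count the True values and divide by the number of tests...
proportion_by_sum = np.sum(is_significant) / len(results)
# ...or take the mean of the boolean array directly
proportion_by_mean = np.mean(is_significant)

print(proportion_by_sum, proportion_by_mean)
```

Here two of the five outcomes are significant, so both expressions evaluate to the same proportion.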

### Instructions

1. In script.py, we’ve copied over the code from the previous exercise and moved the simulation inside a for-loop as described in the narrative. We’ve also initialized an empty list named `results`.

   Below the determination of `result`, but still inside the for-loop, add a line of code to append `result` onto `results`.

2. Outside of the for-loop, add a line of code to print the proportion of `results` that are `'significant'`. Press “Run” a few times (you’ll see slightly different numbers each time because the simulation is random). Approximately what proportion of the results were significant, that is, would have led us to switch to the new name-based email subject?