
In the previous exercise we calculated the following marginal proportions for the `leader` and `influence` questions:

```
leader            influence
no     0.484      no     0.388
yes    0.516      yes    0.612
```

In order to understand whether these questions are associated, we can use the marginal proportions to create a contingency table of expected proportions if there were no association between these variables. To calculate these expected proportions, we need to multiply the marginal proportions for each combination of categories:

| | leader = no | leader = yes |
|---|---|---|
| influence = no | 0.484*0.388 = 0.188 | 0.516*0.388 = 0.200 |
| influence = yes | 0.484*0.612 = 0.296 | 0.516*0.612 = 0.315 |
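
The multiplication above can be sketched as an outer product of the two sets of marginal proportions. This is only an illustration: the array names are made up for this example, and the inputs are the already-rounded proportions, so the last cell comes out as 0.316 rather than the 0.315 obtained from unrounded values.

```python
import numpy as np

# Marginal proportions from the previous exercise (rounded to three decimals).
# The variable names are illustrative, not part of the lesson's workspace.
leader_props = np.array([0.484, 0.516])     # leader: no, yes
influence_props = np.array([0.388, 0.612])  # influence: no, yes

# Rows are influence (no, yes); columns are leader (no, yes).
# Each cell is the product of its row and column marginal proportions.
expected_props = np.outer(influence_props, leader_props)
print(np.round(expected_props, 3))
```

Because each set of marginal proportions sums to 1, the four expected proportions also sum to 1.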

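These proportions can then be converted to frequencies by multiplying each one by the sample size (11097 for this data). The frequencies below are based on the unrounded proportions, so they differ slightly from what you would get by multiplying the rounded values shown:

| | leader = no | leader = yes |
|---|---|---|
| influence = no | 0.188*11097 = 2087 | 0.200*11097 = 2221 |
| influence = yes | 0.296*11097 = 3288 | 0.315*11097 = 3501 |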

This table tells us that if there were no association between the `leader` and `influence` questions, we would expect 2087 people to answer `no` to both.
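
As a rough sketch of the same calculation without intermediate rounding, the snippet below starts from the observed counts for these two questions (the table shown later in this lesson), typed in here as a plain NumPy array. The names `observed` and `expected_freq` are assumptions made for this example.

```python
import numpy as np

# Observed counts: rows are influence (no, yes), columns are leader (no, yes).
observed = np.array([[3015, 1293],
                     [2360, 4429]])

n = observed.sum()                          # sample size: 11097
influence_props = observed.sum(axis=1) / n  # row (influence) marginal proportions
leader_props = observed.sum(axis=0) / n     # column (leader) marginal proportions

# Expected frequency = row proportion * column proportion * sample size
expected_freq = np.outer(influence_props, leader_props) * n
print(np.round(expected_freq))
```

Working from unrounded proportions is what produces 2087 in the top-left cell; multiplying the rounded 0.188 by 11097 would give about 2086 instead.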

In Python, we can calculate this table using the `chi2_contingency()` function from SciPy, by passing in the observed frequency table. There are actually four outputs from this function, but for now, we’ll only look at the fourth one:

```
import numpy as np
from scipy.stats import chi2_contingency

# The fourth return value holds the expected frequencies under no association
chi2, pval, dof, expected = chi2_contingency(influence_leader_freq)
print(np.round(expected))
```

Output:

```
[[2087. 2221.]
 [3288. 3501.]]
```

Note that the SciPy function returned the same expected frequencies as we calculated “by hand” above! Now that we have the expected contingency table if there’s no association, we can compare it to our observed contingency table:

```
leader       no   yes
influence
no         3015  1293
yes        2360  4429
```

The more the expected and observed tables differ, the more confident we can be that the variables are associated. In this example, we see some pretty big differences (e.g., 3015 in the observed table compared to 2087 in the expected table). This provides additional evidence that these variables are associated.
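
One way to make this comparison concrete is to subtract the expected table from the observed one. The sketch below assumes the observed counts are entered by hand as a NumPy array (the lesson's own `influence_leader_freq` variable is not rebuilt here), and `diff` is a name chosen only for this example.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts: rows are influence (no, yes), columns are leader (no, yes).
observed = np.array([[3015, 1293],
                     [2360, 4429]])

# The fourth return value is the expected table under no association.
chi2, pval, dof, expected = chi2_contingency(observed)

# Positive values: more people than expected; negative values: fewer.
diff = observed - expected
print(np.round(diff))
```

In a 2x2 table the four differences always have the same magnitude, because the row and column totals are the same in both tables. Here each cell is off by roughly 928 people.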

### Instructions

1.

The contingency table of frequencies for the `special` and `authority` questions is saved for you in script.py as `special_authority_freq`.

Use the `chi2_contingency()` function to calculate the expected frequency table for these two questions if there were no association. Save the result as `expected`.

2.

Use `np.round()` to print out the `expected` contingency table, with values rounded to the nearest whole number. Compare this to the observed frequency table. How much do the numbers in these tables differ?