In the previous exercise we calculated the following marginal proportions for the leader and influence questions:

leader            influence
no     0.484      no     0.388
yes    0.516      yes    0.612

In order to understand whether these questions are associated, we can use the marginal proportions to create a contingency table of expected proportions if there were no association between these variables. To calculate these expected proportions, we need to multiply the marginal proportions for each combination of categories:

leader = no leader = yes
influence = no 0.484*0.388 = 0.188 0.516*0.388 = .200
influence = yes 0.484*0.612 = 0.296 0.516*0.612 = 0.315

These proportions can then be converted to frequencies by multiplying each one by the sample size (11097 for this data):

leader = no leader = yes
influence = no 0.188*11097 = 2087 0.200*11097 = 2221
influence = yes 0.296*11097 = 3288 0.315*11097 = 3501

This table tells us that if there were no association between the leader and influence questions, we would expect 2087 people to answer no to both.

In python, we can calculate this table using the chi2_contingency() function from SciPy, by passing in the observed frequency table. There are actually four outputs from this function, but for now, we’ll only look at the fourth one:

from scipy.stats import chi2_contingency chi2, pval, dof, expected = chi2_contingency(influence_leader_freq) print(np.round(expected))


[[2087. 2221.]
 [3288. 3501.]]

Note that the ScyPy function returned the same expected frequencies as we calculated “by hand” above! Now that we have the expected contingency table if there’s no association, we can compare it to our observed contingency table:

leader       no   yes
no         3015  1293
yes        2360  4429

The more that the expected and observed tables differ, the more sure we can be that the variables are associated. In this example, we see some pretty big differences (eg., 3015 in the observed table compared to 2087 in the expected table). This provides additional evidence that these variables are associated.



The contingency table of frequencies for the special and authority questions is saved for you in script.py as special_authority_freq.

Use the chi2_contingency() function to calculate the expected frequency table for these two questions if there were no association. Save the result as expected.


Use np.round() to print out the expected contingency table, with values rounded to the nearest whole number. Compare this to the observed frequency table. How much do the numbers in these tables differ?

Sign up to start coding

Mini Info Outline Icon
By signing up for Codecademy, you agree to Codecademy's Terms of Service & Privacy Policy.

Or sign up using:

Already have an account?