In this lesson, we used several methods to assess whether there is an association between two categorical variables. Although the variables here were binary (only two possible values each), the same techniques apply to categorical variables with more than two categories. The methods included:

  • Contingency tables of frequencies
  • Contingency tables of proportions
  • Marginal proportions
  • Expected contingency tables
  • The Chi-Square statistic
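The five methods above can be sketched in pandas as follows. This is a minimal illustration on a small made-up DataFrame (`npi_toy`, with hypothetical counts, not the real NPI responses). In the exercise you would use columns from `npi` instead. The chi-square statistic is computed directly as the sum of (observed - expected)² / expected over all cells.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (made-up counts, NOT the real NPI responses): 100 respondents
leader = ["yes"] * 40 + ["no"] * 60
special = ["yes"] * 30 + ["no"] * 10 + ["yes"] * 20 + ["no"] * 40
npi_toy = pd.DataFrame({"leader": leader, "special": special})

# 1. Contingency table of frequencies (counts of each combination)
freq = pd.crosstab(npi_toy["leader"], npi_toy["special"])

# 2. Contingency table of proportions (each count divided by the number of respondents)
prop = freq / len(npi_toy)

# 3. Marginal proportions: row sums and column sums of the proportion table
leader_marginals = prop.sum(axis=1)
special_marginals = prop.sum(axis=0)

# 4. Expected table if the variables were independent:
#    (row total * column total) / grand total for every cell
expected = pd.DataFrame(
    np.outer(freq.sum(axis=1), freq.sum(axis=0)) / freq.to_numpy().sum(),
    index=freq.index,
    columns=freq.columns,
)

# 5. Chi-square statistic: sum over all cells of (observed - expected)^2 / expected
chi2 = float(((freq - expected) ** 2 / expected).to_numpy().sum())

print(freq, prop, leader_marginals, special_marginals, expected, sep="\n\n")
print("chi-square:", round(chi2, 2))
```

With these made-up counts, the expected table is 30, 30, 20, 20 and the statistic is 50/3 (about 16.67). If you use `scipy.stats.chi2_contingency` instead, pass `correction=False` to get this same uncorrected statistic, because by default it applies Yates' continuity correction to 2×2 tables.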

The data in this lesson was downloaded from Kaggle, then cleaned and subset. It was originally collected and made public by the Open-Source Psychometrics Project.

As a final exercise, the NPI dataset has been loaded for you once more in script.py as npi. Remember that the columns are defined as follows:

  • influence: yes = I have a natural talent for influencing people; no = I am not good at influencing people.
  • blend_in: yes = I prefer to blend in with the crowd; no = I like to be the center of attention.
  • special: yes = I think I am a special person; no = I am no better or worse than most people.
  • leader: yes = I see myself as a good leader; no = I am not sure if I would make a good leader.
  • authority: yes = I like to have authority over other people; no = I don’t mind following orders.

Which other pairs of questions might be associated (or not)? Use the workspace and your newfound skills to investigate for yourself!
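One way to investigate every pair at once is to compute the chi-square statistic for all ten pairs of the five columns and sort the results. The helpers below (`chi_square_stat` and `rank_pairs`) are hypothetical names, not part of the lesson, and the sketch is demonstrated on a tiny made-up DataFrame. The commented lines show how it might be called on `npi`. As a rough guide, for a 2×2 table (one degree of freedom) a statistic above about 3.84 corresponds to p < 0.05.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def chi_square_stat(df, col_a, col_b):
    """Uncorrected chi-square statistic for two categorical columns of df."""
    observed = pd.crosstab(df[col_a], df[col_b])
    # Expected counts under independence: outer product of the marginal totals / grand total
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.to_numpy().sum()
    return float(((observed.to_numpy() - expected) ** 2 / expected).sum())


def rank_pairs(df, columns):
    """Return (col_a, col_b, statistic) for every pair of columns, largest statistic first."""
    results = [(a, b, chi_square_stat(df, a, b)) for a, b in combinations(columns, 2)]
    return sorted(results, key=lambda r: r[2], reverse=True)


# Tiny made-up example: "a" and "b" are identical, "c" is unrelated to both
toy = pd.DataFrame({
    "a": ["yes", "yes", "no", "no"],
    "b": ["yes", "yes", "no", "no"],
    "c": ["yes", "no", "yes", "no"],
})
for a, b, stat in rank_pairs(toy, ["a", "b", "c"]):
    print(f"{a} vs {b}: {stat:.2f}")

# Hypothetical usage with the exercise's DataFrame:
# cols = ["influence", "blend_in", "special", "leader", "authority"]
# for a, b, stat in rank_pairs(npi, cols):
#     print(f"{a} vs {b}: {stat:.2f}")
```

Sorting by the statistic puts the most strongly associated pairs first, but with real survey data the sample size also affects the statistic, so compare pairs computed on the same rows.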
