In this lesson we used a few different methods to assess whether there was an association between two categorical variables. Although we used binary variables (each with only two possible categories), the same techniques can be used for categorical variables with more than two categories. The methods we used in this lesson included:
- Contingency tables of frequencies
- Contingency tables of proportions
- Marginal proportions
- Expected contingency tables
- The Chi-Square statistic
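The five methods above can be chained together in a few lines. The sketch below uses only the Python standard library so that it runs anywhere. In the lesson's workspace you would more likely reach for pandas.crosstab (with normalize=True for a table of proportions) and scipy.stats.chi2_contingency. The function names and the two answer lists here are hypothetical and are not taken from the NPI data.

```python
from collections import Counter


def contingency_table(x, y):
    """Return row labels, column labels, and observed counts for two categorical sequences."""
    rows, cols = sorted(set(x)), sorted(set(y))
    counts = Counter(zip(x, y))
    return rows, cols, [[counts[(r, c)] for c in cols] for r in rows]


def association_summary(x, y):
    """Hypothetical helper: compute each of the lesson's summaries for two variables."""
    rows, cols, observed = contingency_table(x, y)
    n = sum(sum(row) for row in observed)
    # Contingency table of proportions: each count divided by the total
    proportions = [[count / n for count in row] for row in observed]
    # Marginal proportions: row and column sums of the proportions table
    row_marginals = [sum(row) for row in proportions]
    col_marginals = [sum(col) for col in zip(*proportions)]
    # Expected table if the variables were independent: n * P(row) * P(column)
    expected = [[n * pr * pc for pc in col_marginals] for pr in row_marginals]
    # Chi-square statistic: sum over cells of (observed - expected)^2 / expected
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return {
        "rows": rows,
        "cols": cols,
        "observed": observed,
        "proportions": proportions,
        "row_marginals": row_marginals,
        "col_marginals": col_marginals,
        "expected": expected,
        "chi2": chi2,
    }


# Hypothetical yes/no answers to two questions (illustrative only, not NPI data)
q1 = ["yes", "yes", "yes", "no", "no", "no", "yes", "no"]
q2 = ["yes", "yes", "no", "no", "no", "no", "yes", "yes"]
summary = association_summary(q1, q2)
print(summary["observed"])  # [[3, 1], [1, 3]]
print(summary["expected"])  # [[2.0, 2.0], [2.0, 2.0]]
print(summary["chi2"])      # 2.0
```

This sketch computes the uncorrected chi-square statistic. For 2x2 tables, scipy.stats.chi2_contingency applies Yates' continuity correction by default, so its result can differ unless you pass correction=False.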
Note that the data in this lesson was downloaded from Kaggle, then cleaned and subsetted. The data was originally collected and made public by the Open-Source Psychometrics Project.
Instructions
As a final exercise, the NPI dataset has been loaded for you once more in script.py as npi. Remember that the columns are defined as follows:
- influence: yes = I have a natural talent for influencing people; no = I am not good at influencing people.
- blend_in: yes = I prefer to blend in with the crowd; no = I like to be the center of attention.
- special: yes = I think I am a special person; no = I am no better or worse than most people.
- leader: yes = I see myself as a good leader; no = I am not sure if I would make a good leader.
- authority: yes = I like to have authority over other people; no = I don’t mind following orders.
Which other pairs of questions might be associated (or not)? Use the workspace and your newfound skills to investigate for yourself!
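One way to approach this is to compute the chi-square statistic for every pair of columns and see which pairs stand out. The sketch below assumes the answers are available as a dict mapping each column name to a list of "yes"/"no" strings. With the pandas DataFrame npi, you could build that dict with {col: npi[col].tolist() for col in npi.columns}. The helper names and the toy responses are hypothetical.

```python
from collections import Counter
from itertools import combinations


def chi_square(x, y):
    """Uncorrected chi-square statistic for two categorical sequences."""
    n = len(x)
    row_counts = Counter(x)
    col_counts = Counter(y)
    observed = Counter(zip(x, y))
    total = 0.0
    for r in row_counts:
        for c in col_counts:
            # Expected count under independence: row total * column total / n
            expected = row_counts[r] * col_counts[c] / n
            total += (observed[(r, c)] - expected) ** 2 / expected
    return total


def rank_pairs(columns):
    """Chi-square for every pair of columns, sorted from largest to smallest."""
    scores = {
        (a, b): chi_square(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical responses (not the real NPI data), one list per question
columns = {
    "influence": ["yes", "yes", "no", "no", "yes", "no"],
    "leader": ["yes", "yes", "no", "no", "yes", "no"],
    "special": ["yes", "no", "yes", "no", "no", "yes"],
}
ranking = rank_pairs(columns)
for pair, stat in ranking:
    print(pair, round(stat, 3))
```

Because every question here is binary, each table has one degree of freedom, so a statistic above roughly 3.84 corresponds to p < 0.05. A larger statistic points to a stronger departure from independence, not to a causal link.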