In this lesson, we used several methods to assess whether two categorical variables are associated. Although we only used binary variables (two categories per variable), the same techniques apply to categorical variables with more than two categories. The methods we used in this lesson included:
- Contingency tables of frequencies
- Contingency tables of proportions
- Marginal proportions
- Expected contingency tables
- The Chi-Square statistic
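As a quick recap, the five methods above can be sketched end to end. This is a minimal sketch that assumes pandas and NumPy are available. The data frame `toy` and its column names `leader` and `authority` are invented for illustration and are not the NPI data.

```python
import numpy as np
import pandas as pd

# Toy data invented for illustration only (NOT the NPI dataset).
# The column names `leader` and `authority` are hypothetical.
toy = pd.DataFrame({
    "leader":    ["yes"] * 60 + ["no"] * 40,
    "authority": ["yes"] * 45 + ["no"] * 15 + ["yes"] * 10 + ["no"] * 30,
})
n = len(toy)

# 1. Contingency table of frequencies
freq = pd.crosstab(toy["leader"], toy["authority"])

# 2. Contingency table of proportions
prop = freq / n

# 3. Marginal proportions (row sums and column sums of the proportions)
leader_marginals = prop.sum(axis=1)
authority_marginals = prop.sum(axis=0)

# 4. Expected contingency table if the two variables were not associated:
#    the product of the marginal proportions, scaled back up to counts
expected = np.outer(leader_marginals, authority_marginals) * n

# 5. Chi-Square statistic: sum of (observed - expected)^2 / expected
chi2 = ((freq.to_numpy() - expected) ** 2 / expected).sum()

print(freq)
print(np.round(expected, 1))
print(round(chi2, 2))
```

The larger the Chi-Square statistic, the further the observed table is from the table we would expect under no association.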
Note that the data in this lesson was downloaded from Kaggle, then cleaned and subset. The data was originally collected and made public by the Open-Source Psychometrics Project.
As a final exercise, the NPI dataset has been loaded for you once more in script.py as npi. Remember that the columns are defined as follows:
- yes = I have a natural talent for influencing people; no = I am not good at influencing people.
- yes = I prefer to blend in with the crowd; no = I like to be the center of attention.
- yes = I think I am a special person; no = I am no better or worse than most people.
- yes = I see myself as a good leader; no = I am not sure if I would make a good leader.
- yes = I like to have authority over other people; no = I don’t mind following orders.
Which other pairs of questions might be associated (or not)? Use the workspace and your newfound skills to investigate for yourself!
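One possible way to start is to loop over every pair of columns and compute a Chi-Square statistic for each. The sketch below assumes pandas and SciPy are available. It uses a small made-up data frame with hypothetical column names (`q1`, `q2`, `q3`) in place of npi, and `pairwise_chi2` is an illustrative helper, not something provided by the lesson. Note that `chi2_contingency` applies a continuity correction to 2x2 tables by default, so its statistic can differ slightly from a hand calculation unless you pass `correction=False`.

```python
from itertools import combinations

import pandas as pd
from scipy.stats import chi2_contingency

# Toy stand-in for npi, invented for illustration (NOT the real data).
# The column names are hypothetical.
toy = pd.DataFrame({
    "q1": ["yes", "yes", "no", "no"] * 25,
    "q2": ["yes", "yes", "no", "no"] * 25,  # identical to q1: strongly associated
    "q3": ["yes", "no", "yes", "no"] * 25,  # evenly split within each q1 group
})

def pairwise_chi2(df):
    """Return {(col_a, col_b): Chi-Square statistic} for every pair of columns."""
    results = {}
    for a, b in combinations(df.columns, 2):
        table = pd.crosstab(df[a], df[b])
        chi2, pval, dof, expected = chi2_contingency(table)
        results[(a, b)] = chi2
    return results

results = pairwise_chi2(toy)

# Print pairs from most to least associated
for pair, chi2 in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(pair, round(chi2, 2))
```

To try this in the workspace, you could call `pairwise_chi2(npi)` instead, assuming every column in npi is categorical.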