Learn

If we want to understand whether the outcomes of two categorical variables are associated, we can use a Chi-Square test. It is useful in situations like:

  • An A/B test where half of users were shown a green submit button and the other half were shown a purple submit button. Was one group more likely to click the submit button?
  • People under and over age 40 were given a survey asking “Which of the following three products is your favorite?” Did these age groups have significantly different preferences?

In SciPy, we can use the function chi2_contingency() to perform a Chi-Square test. The input to chi2_contingency() is a contingency table (a table of counts for each combination of the two variables' categories), which can be created using the pandas crosstab() function as follows:

    #create table:
    import pandas as pd
    table = pd.crosstab(variable_1, variable_2)

    #run the test:
    from scipy.stats import chi2_contingency
    chi2, pval, dof, expected = chi2_contingency(table)
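To make the pattern above concrete, here is a minimal sketch based on the green/purple button scenario. The DataFrame name `data`, its column names, and all of the counts are made up for illustration; only crosstab() and chi2_contingency() come from the lesson.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical A/B test data, one row per user (counts are invented)
data = pd.DataFrame({
    "button": ["green"] * 50 + ["purple"] * 50,
    "clicked": ["yes"] * 30 + ["no"] * 20 + ["yes"] * 15 + ["no"] * 35,
})

# Contingency table: rows are button colors, columns are click outcomes
table = pd.crosstab(data["button"], data["clicked"])
print(table)

# chi2: test statistic, pval: p-value, dof: degrees of freedom,
# expected: counts we would expect if the two variables were unrelated
chi2, pval, dof, expected = chi2_contingency(table)
print(chi2, pval, dof)
print(expected)
```

For a 2x2 table like this one, chi2_contingency() applies Yates' continuity correction by default, so the statistic may differ slightly from a hand calculation that skips the correction.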

For example, suppose we want to know whether gender is associated with the probability of a website visitor making a purchase. The null hypothesis is that there is no association between the variables (e.g., men, women, and non-binary people are all equally likely to make a purchase on the website, so gender and purchase status are not associated). If the p-value is below our chosen significance threshold (often 0.05), we reject the null hypothesis and conclude that there is a statistically significant association between the two variables (e.g., men, women, and non-binary people appear to have different probabilities of making a purchase, so gender is associated with purchase status).
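The decision rule described here can be sketched as follows. The counts in `observed` are invented for illustration, and the names `observed`, `alpha`, and `reject_null` are assumptions rather than part of the lesson. chi2_contingency() also accepts an array of counts directly, so no crosstab() call is needed when the table is already tallied.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts (invented): rows are gender groups,
# columns are [made a purchase, did not make a purchase]
observed = np.array([
    [40, 160],  # men
    [70, 130],  # women
    [10, 40],   # non-binary people
])

chi2, pval, dof, expected = chi2_contingency(observed)

# Compare the p-value against the chosen significance threshold
alpha = 0.05
reject_null = bool(pval < alpha)

print(pval)
print(reject_null)
```

When `reject_null` is True, we would conclude that gender and purchase status are associated in this made-up sample; when it is False, we would not have enough evidence to reject the null hypothesis.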

Instructions

1.

The management at the VeryAnts ant store wants to know if their two most popular species of ants, the Leaf Cutter and the Harvester, vary in popularity between 1st, 2nd, and 3rd graders.

We have provided a dataset named ants with a sample of 108 sales to 1st, 2nd, and 3rd grade teachers. The dataset has two columns: Grade (equal to '1st', '2nd', or '3rd') and Ant (equal to 'Leaf Cutter' or 'Harvester').

Use this data to create a contingency table of the Grade and Ant columns, and save the table as table.

2.

Use the chi2_contingency() function from SciPy to run a Chi-Square test using the contingency table you just created (saved as table). Save the p-value as pval and print it out.

3.

Are certain types of ants more popular among specific grades (is there an association between grade and ant type)? Using a significance threshold of 0.05, indicate your answer by changing the value of significant to True if there is a significant association between these variables and False otherwise.
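As a rough guide to how the three steps fit together, here is a sketch that uses a stand-in dataset. The real `ants` dataset is provided by the exercise; the rows built below are invented purely so the code runs on its own, and the result says nothing about the actual data.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Stand-in for the provided dataset: 108 invented rows with the same
# column names (Grade, Ant) and category labels described above
ants = pd.DataFrame({
    "Grade": ["1st"] * 36 + ["2nd"] * 36 + ["3rd"] * 36,
    "Ant": (["Leaf Cutter"] * 20 + ["Harvester"] * 16
            + ["Leaf Cutter"] * 18 + ["Harvester"] * 18
            + ["Leaf Cutter"] * 16 + ["Harvester"] * 20),
})

# Step 1: contingency table of Grade by Ant
table = pd.crosstab(ants["Grade"], ants["Ant"])
print(table)

# Step 2: run the Chi-Square test and keep the p-value
chi2, pval, dof, expected = chi2_contingency(table)
print(pval)

# Step 3: compare the p-value to the 0.05 threshold
significant = bool(pval < 0.05)
print(significant)
```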
