
If we want to understand whether the outcomes of two categorical variables are associated, we can use a Chi-Square test. It is useful in situations like:

• An A/B test where half of the users were shown a green submit button and the other half were shown a purple submit button. Was one group more likely to click the submit button?
• People under and over age 40 were given a survey asking “Which of the following three products is your favorite?” Did these age groups have significantly different preferences?
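For the A/B test scenario, the counts are often already tallied. In that case `chi2_contingency()` can take a plain 2-D array of observed frequencies instead of a DataFrame. The click counts below are hypothetical, invented only to show the shape of the input. One detail worth knowing: for a 2x2 table (one degree of freedom), SciPy applies Yates' continuity correction by default (`correction=True`).

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts (not from the lesson).
# Rows: button color. Columns: [clicked, did not click].
observed = np.array([
    [120, 380],  # green button: 500 users
    [150, 350],  # purple button: 500 users
])

# A 2x2 table has (2 - 1) * (2 - 1) = 1 degree of freedom,
# so SciPy applies Yates' continuity correction by default here.
chi2, pval, dof, expected = chi2_contingency(observed)

print(dof)       # degrees of freedom
print(expected)  # counts we would expect if button color and clicking were independent
print(pval)
```

Each expected count is (row total × column total) / grand total, so both rows expect 135 clicks and 365 non-clicks here.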

In `SciPy`, we can use the function `chi2_contingency()` to perform a Chi-Square test. The input to `chi2_contingency` is a contingency table, which can be created using the `pandas` `crosstab()` function as follows:

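```python
import pandas as pd
from scipy.stats import chi2_contingency

# Create the contingency table.
# variable_1 and variable_2 are placeholders for two categorical columns,
# e.g. df["col_a"] and df["col_b"] from a DataFrame named df.
table = pd.crosstab(variable_1, variable_2)

# Run the test. Returns the test statistic, the p-value, the degrees of
# freedom, and the expected counts under the null hypothesis of no association.
chi2, pval, dof, expected = chi2_contingency(table)
```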
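Here is a self-contained sketch of the same two steps for the survey scenario, starting from raw rows (one per respondent). The column names `age_group` and `favorite` and all of the responses are hypothetical, made up for illustration.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical survey responses: 60 people under 40 and 60 people 40 and over
survey = pd.DataFrame({
    "age_group": ["under 40"] * 60 + ["40 and over"] * 60,
    "favorite": (["A"] * 30 + ["B"] * 20 + ["C"] * 10      # under 40
                 + ["A"] * 15 + ["B"] * 25 + ["C"] * 20),  # 40 and over
})

# crosstab counts every (age_group, favorite) combination;
# rows and columns are sorted by label
table = pd.crosstab(survey["age_group"], survey["favorite"])
print(table)

chi2, pval, dof, expected = chi2_contingency(table)
print(dof)  # (2 rows - 1) * (3 columns - 1) = 2
print(pval)
```

Pass only the counts to `chi2_contingency()`. If you build the table with `pd.crosstab(..., margins=True)`, the added "All" row and column would be treated as extra categories.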

For example, suppose we want to know whether gender is associated with the probability that a website visitor makes a purchase. The null hypothesis is that there is no association between the variables (e.g., men, women, and non-binary people are all equally likely to make a purchase on the website, so gender and purchase status are not associated). If the p-value is below our chosen significance threshold (often 0.05), we reject the null hypothesis and conclude that there is a statistically significant association between the two variables (e.g., men, women, and non-binary people appear to have different probabilities of making a purchase, so gender is associated with purchase status). If the p-value is at or above the threshold, we fail to reject the null hypothesis: the data do not provide significant evidence of an association.
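To make the decision rule concrete, the sketch below computes the Chi-Square statistic by hand for the gender/purchase example and checks it against SciPy. The counts are hypothetical. Each expected count is (row total × column total) / grand total, the statistic is the sum over all cells of (observed − expected)² / expected, and the p-value is the upper-tail probability of a chi-square distribution with (rows − 1) × (columns − 1) degrees of freedom.

```python
import numpy as np
from scipy import stats

# Hypothetical counts (made up for illustration).
# Columns: [made a purchase, did not make a purchase]
observed = np.array([
    [30, 70],  # men
    [45, 55],  # women
    [20, 30],  # non-binary people
])

# Expected counts under the null hypothesis of no association
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Chi-Square statistic and its p-value
stat = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
pval = stats.chi2.sf(stat, dof)  # upper-tail probability

# SciPy gives the same result (no continuity correction, since dof > 1)
stat_scipy, pval_scipy, dof_scipy, expected_scipy = stats.chi2_contingency(observed)

alpha = 0.05
print(f"statistic = {stat:.3f}, p-value = {pval:.3f}")
if pval < alpha:
    print("Reject the null hypothesis: gender and purchase status appear associated.")
else:
    print("Fail to reject the null hypothesis: no significant association detected.")
```

With these made-up counts the p-value is roughly 0.087, so at the 0.05 threshold we would fail to reject the null hypothesis.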

### Instructions

1.

The management at the VeryAnts ant store wants to know if their two most popular species of ants, the Leaf Cutter and the Harvester, vary in popularity between 1st, 2nd, and 3rd graders.

We have provided a dataset named `ants` with a sample of 108 sales to 1st-, 2nd-, and 3rd-grade teachers. The dataset has two columns: `Grade` (equal to `'1st'`, `'2nd'`, or `'3rd'`) and `Ant` (equal to `'Leaf Cutter'` or `'Harvester'`).

Use this data to create a contingency table of the `Grade` and `Ant` columns, and save the table as `table`.

2.

Use the `chi2_contingency()` function from SciPy to run a Chi-Square test using the contingency table you just created (saved as `table`). Save the p-value as `pval` and print it out.

3.

Are certain types of ants more popular among specific grades (is there an association between grade and ant type)? Using a significance threshold of 0.05, indicate your answer by changing the value of `significant` to `True` if there is a significant association between these variables and `False` otherwise.
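A minimal sketch of all three steps follows. The exercise supplies the real `ants` DataFrame. Because it isn't reproduced here, the code builds a hypothetical stand-in with the same two columns and 108 rows, so the p-value it prints says nothing about the answer for the real data.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical stand-in for the exercise's `ants` dataset (made-up counts)
ants = pd.DataFrame({
    "Grade": ["1st"] * 36 + ["2nd"] * 36 + ["3rd"] * 36,
    "Ant": (["Leaf Cutter"] * 20 + ["Harvester"] * 16     # 1st grade
            + ["Leaf Cutter"] * 18 + ["Harvester"] * 18   # 2nd grade
            + ["Leaf Cutter"] * 17 + ["Harvester"] * 19), # 3rd grade
})

# Step 1: contingency table of Grade (rows) by Ant (columns)
table = pd.crosstab(ants["Grade"], ants["Ant"])
print(table)

# Step 2: run the Chi-Square test and keep the p-value
chi2, pval, dof, expected = chi2_contingency(table)
print(pval)

# Step 3: significant only if the p-value is below the 0.05 threshold
significant = bool(pval < 0.05)
print(significant)
```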