In the last exercise, we ran three separate 2-sample t-tests to investigate an association between a quantitative variable (amount spent per sale) and a non-binary categorical variable (location of VeryAnts visited, with options A, B, and C). The problem with this approach is that it inflates our probability of a type I error; the more tests we run, the worse the problem becomes!
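To get a feel for how quickly this inflation grows, here is a rough sketch. It assumes the tests are independent, which is not exactly true for pairwise tests that share groups, so treat the numbers as illustrative rather than exact. The function name familywise_error_rate is our own, not part of any library.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one type I error across num_tests tests.

    Assumes the tests are independent (a simplification).
    """
    return 1 - (1 - alpha) ** num_tests


# Compare a single test with three tests and ten tests
for num_tests in (1, 3, 10):
    print(num_tests, round(familywise_error_rate(num_tests), 3))
```

Under this simplification, three tests at a 0.05 threshold give roughly a 14% chance of at least one false positive, compared with 5% for a single test.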
In this situation, one approach is to instead use ANOVA (Analysis of Variance). ANOVA tests the null hypothesis that all groups have the same population mean (e.g., the true average price of a sale is the same at every location of VeryAnts).
In Python, we can use the SciPy function f_oneway() to perform an ANOVA.
f_oneway() returns two values: the F-statistic (not covered in this course) and the p-value. If we were comparing scores on a video game for math majors, writing majors, and psychology majors, we could run an ANOVA test with this code:
from scipy.stats import f_oneway
fstat, pval = f_oneway(scores_mathematicians, scores_writers, scores_psychologists)
If the p-value is below our significance threshold, we can conclude that at least one pair of our groups earned significantly different scores on average; however, we won’t know which pair until we investigate further!
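As a small end-to-end sketch of that call, the following uses made-up score lists (the numbers are invented for illustration and are not real data) and compares the resulting p-value to a 0.05 threshold. The variable names significance_threshold and significant are our own.

```python
from scipy.stats import f_oneway

# Hypothetical scores, invented purely for illustration
scores_mathematicians = [85, 90, 88, 92, 87]
scores_writers = [70, 75, 72, 68, 74]
scores_psychologists = [80, 78, 82, 79, 81]

# One ANOVA across all three groups instead of three pairwise t-tests
fstat, pval = f_oneway(scores_mathematicians, scores_writers, scores_psychologists)
print(pval)

# Reject the null hypothesis only if the p-value is below the threshold
significance_threshold = 0.05
significant = pval < significance_threshold
print(significant)
```

With groups this far apart, the p-value falls well below 0.05, so we would reject the null hypothesis. The test alone still does not tell us which pair of groups differs.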
The same data from the previous exercise is available to you in the workspace: costs of sales made at three locations of VeryAnts (saved as
Perform an ANOVA test on
c and store the p-value in a variable called pval, then print it out.
At a 0.05 significance level, does this p-value lead you to reject the null hypothesis (and conclude that at least one pair of stores has significantly different average sales)?
Change the value of
True if the p-value indicates at least one pair of stores have significantly different sales and