Learn

Let’s say that we have performed an ANOVA to compare sales at the three VeryAnts stores. We calculated a p-value less than 0.05 and concluded that there is a significant difference between at least one pair of stores.

Now, we want to find out which pair of stores are different. This is where Tukey’s range test comes in handy!

In Python, we can perform Tukey’s range test using the `statsmodels` function `pairwise_tukeyhsd()`. For example, suppose we are again comparing video-game scores for math majors, writing majors, and psychology majors. We have a dataset named `data` with two columns: `score` and `major`. We could run Tukey’s range test with a type I error rate of 0.05 as follows:

``````from statsmodels.stats.multicomp import pairwise_tukeyhsd
tukey_results = pairwise_tukeyhsd(data.score, data.major, 0.05)
print(tukey_results)``````

Output:

``````Multiple Comparison of Means - Tukey HSD,FWER=0.05
==========================================
group1 group2 meandiff lower upper reject
------------------------------------------
math  psych    3.32 -0.11  6.74  False
math  write    5.23  2.03  8.43  True
psych  write   -2.12 -5.25  1.01  False
------------------------------------------``````

Tukey’s range test is similar to running three separate 2-sample t-tests, except that it runs all of these tests simultaneously in order to preserve the type I error rate.

The function output is a table, with one row per pair-wise comparison. For every comparison where `reject` is `True`, we “reject the null hypothesis” and conclude there is a significant difference between those two groups. For example, in the output above, we would conclude that there is a significant difference between scores for math and writing majors, but no significant difference in scores for the other comparisons.

### Instructions

1.

The `veryants` dataset is provided for you once again in `script.py`. The `Store` column represents the store that a sale was made at (`'A'`, `'B'`, or `'C'`) and the `Sale` column represents the cost of a sale in U.S.D.

Run Tukey’s range test with a type I error rate of `0.05` to determine whether average sales are different at any pair of two stores and save the result as `tukey_results`, then print it out.

2.

Inspect the output from the test you just ran. For which pairs of stores did you find a significant difference in average sales?

Assign the values of `a_b_significant`, `a_c_significant` `b_c_significant` to `True` if the test indicates a significant difference in sales at the indicated pair of stores and `False` if not.

Recall that when we ran three t-tests, we found significant differences for the A vs. B and A vs. C comparisons. Do we get the same result with this test?