Suppose that you own a chain of stores that sell ants, called VeryAnts. There are three different locations: A, B, and C. You want to know if the average ant sales over the past year are significantly different between the three locations.

At first, it seems that you could simply perform a t-test between each pair of stores.

Recall that the significance level, the threshold you compare a p-value against, is the probability that you incorrectly reject a true null hypothesis on each t-test. The more t-tests you perform, the more likely you are to get a false positive, a Type I error.

For a significance level of `0.05`, if the null hypothesis is true, then the probability of getting a correct (non-significant) result is `1 - 0.05 = 0.95`. When you run another t-test, the probability that both tests still give a correct result is `0.95 * 0.95`, or `0.9025` (treating the tests as independent). That means your probability of making at least one error is now close to `10%`! This error probability only gets bigger as you run more t-tests.
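The arithmetic above can be sketched in a few lines of Python. The function name `familywise_error_rate` is a hypothetical helper introduced only for illustration, and the calculation assumes the tests are independent, which is an approximation when the same samples are reused across pairs.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one Type I error across n_tests tests.

    Assumes every null hypothesis is true and the tests are independent.
    """
    prob_all_correct = (1 - alpha) ** n_tests
    return 1 - prob_all_correct


# Show how the error probability grows with the number of t-tests.
for n in (1, 2, 3):
    rate = familywise_error_rate(0.05, n)
    print(f"{n} test(s): {rate:.4f}")
```

With a significance level of `0.05`, two tests give roughly `0.0975` and three tests give roughly `0.1426`, which is why running every pairwise t-test becomes risky as the number of groups grows.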

### Instructions

**1.**

We have created samples `store_a`, `store_b`, and `store_c`, representing the sales at VeryAnts at locations A, B, and C, respectively. We want to see if there’s a significant difference in sales between the three locations.

Explore datasets `store_a`, `store_b`, and `store_c` by finding and viewing the mean and standard deviation of each one. Store the means in variables called `store_a_mean`, `store_b_mean`, and `store_c_mean`. Store the standard deviations in variables called `store_a_sd`, `store_b_sd`, and `store_c_sd`.

**2.**

Perform a two-sample t-test on each pair of locations.

Store the results of the tests in variables called `a_b_results`, `a_c_results`, and `b_c_results`. View the results for each test.

**3.**

Store the probability of making at least one Type I error when running three t-tests in a variable called `error_prob`. View `error_prob`.
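One possible way to work through the three steps is sketched below. It is not the lesson's official solution: the sample data is made up with a random number generator because the real `store_a`, `store_b`, and `store_c` values are not shown here, and the use of NumPy and `scipy.stats.ttest_ind` is an assumption about the available tooling. The significance level of `0.05` is taken from the explanation above.

```python
import numpy as np
from scipy.stats import ttest_ind

# Hypothetical stand-in data: the lesson's real samples are not shown,
# so these values are invented purely for illustration.
rng = np.random.default_rng(seed=0)
store_a = rng.normal(loc=58, scale=15, size=100)
store_b = rng.normal(loc=65, scale=15, size=100)
store_c = rng.normal(loc=62, scale=15, size=100)

# Step 1: means and standard deviations.
store_a_mean = np.mean(store_a)
store_b_mean = np.mean(store_b)
store_c_mean = np.mean(store_c)

# np.std defaults to the population SD (ddof=0); pass ddof=1 for the sample SD.
store_a_sd = np.std(store_a)
store_b_sd = np.std(store_b)
store_c_sd = np.std(store_c)
print(store_a_mean, store_b_mean, store_c_mean)
print(store_a_sd, store_b_sd, store_c_sd)

# Step 2: a two-sample t-test for each pair of locations.
a_b_results = ttest_ind(store_a, store_b)
a_c_results = ttest_ind(store_a, store_c)
b_c_results = ttest_ind(store_b, store_c)
print(a_b_results.pvalue, a_c_results.pvalue, b_c_results.pvalue)

# Step 3: probability of at least one Type I error across three tests,
# assuming a 0.05 significance level and independent tests.
alpha = 0.05
error_prob = 1 - (1 - alpha) ** 3
print(error_prob)
```

Whatever the data, `error_prob` comes out to about `0.143`, since it depends only on the significance level and the number of tests.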