Learn

Suppose that you own a chain of stores that sell ants, called VeryAnts. There are three different locations: A, B, and C. You want to know if the average ant sales over the past year are significantly different between the three locations.

At first, it seems that you could perform a t-test on each pair of stores.

Recall that the significance threshold is the probability that you incorrectly reject a true null hypothesis on each t-test. The more t-tests you perform, the more likely you are to get a false positive, a Type I error.

For a significance threshold of `0.05`, if the null hypothesis is true, then the probability of correctly failing to reject it is `1 - 0.05` = `0.95`. When you run a second t-test, the probability of getting a correct result on both tests is `0.95` * `0.95`, or `0.9025` (treating the tests as independent). That means your probability of making at least one Type I error is now close to `10%`! This error probability keeps growing with every additional t-test.
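The same arithmetic can be written in general form. This is a sketch that assumes the tests are independent, which pairwise comparisons on shared samples only approximate; the symbols $\alpha$ (the significance threshold) and $k$ (the number of t-tests) are introduced here for illustration.

$$
P(\text{at least one Type I error}) = 1 - (1 - \alpha)^{k}
$$

With $\alpha = 0.05$:

$$
k = 1:\ 1 - 0.95 = 0.05 \qquad
k = 2:\ 1 - 0.95^{2} = 0.0975 \qquad
k = 3:\ 1 - 0.95^{3} \approx 0.1426
$$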

### Instructions

1.

We have created samples `store_a`, `store_b`, and `store_c`, representing the sales at VeryAnts at locations A, B, and C, respectively. We want to see if there’s a significant difference in sales between the three locations.

Explore datasets `store_a`, `store_b`, and `store_c` by finding and viewing the means and standard deviations of each one. Store the means in variables called `store_a_mean`, `store_b_mean`, and `store_c_mean`. Store the standard deviations in variables called `store_a_sd`, `store_b_sd`, and `store_c_sd`.

2.

Perform a two-sample t-test on each pair of locations.

Store the results of the tests in variables called `a_b_results`, `a_c_results`, and `b_c_results`. View the results for each test.

3.

Store the probability of making at least one Type I error when running three t-tests, each at a significance threshold of `0.05`, in a variable called `error_prob`. View `error_prob`.
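As a rough illustration of steps 1 and 3, here is a minimal Python sketch that uses only the standard library. The sales figures are hypothetical placeholders, since the lesson supplies the real `store_a`, `store_b`, and `store_c` samples. The sketch also assumes the sample standard deviation is wanted and that the three tests are independent, each at a threshold of `0.05`. The names `alpha` and `num_tests` are introduced here for illustration. The pairwise tests of step 2 depend on whichever statistics library your environment provides, so they are left out.

```python
import statistics

# Hypothetical placeholder data; the lesson provides the real samples.
store_a = [58, 62, 55, 60, 65]
store_b = [61, 59, 66, 63, 64]
store_c = [70, 68, 72, 66, 74]

# Step 1: mean of each sample.
store_a_mean = statistics.mean(store_a)
store_b_mean = statistics.mean(store_b)
store_c_mean = statistics.mean(store_c)

# Step 1: sample standard deviation (n - 1 in the denominator).
store_a_sd = statistics.stdev(store_a)
store_b_sd = statistics.stdev(store_b)
store_c_sd = statistics.stdev(store_c)

print(store_a_mean, store_b_mean, store_c_mean)
print(store_a_sd, store_b_sd, store_c_sd)

# Step 3: probability of at least one Type I error across three tests,
# assuming independent tests at a 0.05 significance threshold.
alpha = 0.05
num_tests = 3
error_prob = 1 - (1 - alpha) ** num_tests
print(error_prob)  # roughly 0.14
```

If your course expects the population standard deviation instead, `statistics.pstdev` would be the standard-library counterpart.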