Suppose that you own a chain of stores that sell ants, called VeryAnts. There are three different locations: A, B, and C. You want to know if the average ant sales over the past year are significantly different between the three locations.
At first, it seems that you could perform T-tests between each pair of stores.
Recall that the significance threshold you compare the p-value against (commonly 0.05) is the probability of incorrectly rejecting a true null hypothesis on each t-test. The more t-tests you perform, the more likely you are to get a false positive, a Type I error.
For a significance threshold of 0.05, if the null hypothesis is true, then the probability of correctly obtaining a non-significant result is 1 – 0.05 = 0.95. When you run another t-test, the probability of still getting a correct result on both tests (assuming they are independent) is 0.95 × 0.95 = 0.9025. That means your probability of making at least one error is now close to 10%! This error probability only gets bigger the more t-tests you do.
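As a worked version of this arithmetic, assuming the tests are independent and all use the same threshold (the symbols α for the threshold and n for the number of tests are introduced here only for illustration), the chance of at least one false positive can be written as:

```latex
P(\text{at least one Type I error}) = 1 - (1 - \alpha)^{n}
\qquad\text{e.g.}\qquad
1 - (1 - 0.05)^{2} = 1 - 0.9025 = 0.0975
```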
We have created one sample per store, named in the pattern of store_c, representing the sales at VeryAnts locations A, B, and C. We want to see if there’s a significant difference in sales between the three locations.
Explore each sample, through store_c, by finding and viewing its mean and standard deviation. Store the means in variables named in the pattern of store_c_mean. Store the standard deviations in variables called
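A minimal sketch of this summary step, using only Python's standard library. The sales figures are made up for illustration, store_a and store_b are assumed names following the store_c pattern, and the *_std names are hypothetical because the text does not give them. Note that statistics.stdev computes the sample standard deviation.

```python
import statistics

# Made-up sales samples for illustration only (not the lesson's data).
# store_a and store_b are assumed names that follow the store_c pattern.
store_a = [58, 62, 55, 60, 65]
store_b = [64, 70, 66, 68, 72]
store_c = [59, 61, 57, 63, 60]

# Mean of each sample.
store_a_mean = statistics.mean(store_a)
store_b_mean = statistics.mean(store_b)
store_c_mean = statistics.mean(store_c)

# Sample standard deviation of each sample (hypothetical variable names).
store_a_std = statistics.stdev(store_a)
store_b_std = statistics.stdev(store_b)
store_c_std = statistics.stdev(store_c)

for name, mean, std in [
    ("A", store_a_mean, store_a_std),
    ("B", store_b_mean, store_b_std),
    ("C", store_c_mean, store_c_std),
]:
    print(f"Store {name}: mean={mean:.2f}, std={std:.2f}")
```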
Perform a Two Sample T-test between each pair of location data.
Store the results of the tests in variables named in the pattern of b_c_results. View the results for each test.
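The pairwise tests could be sketched as follows, assuming SciPy is available and using its scipy.stats.ttest_ind function. The data is made up for illustration, and store_a, store_b, a_b_results, and a_c_results are assumed names following the store_c and b_c_results patterns.

```python
from scipy.stats import ttest_ind

# Made-up sales samples for illustration only (not the lesson's data).
store_a = [58, 62, 55, 60, 65]
store_b = [64, 70, 66, 68, 72]
store_c = [59, 61, 57, 63, 60]

# One two-sample t-test per pair of locations.
# ttest_ind assumes equal variances by default (equal_var=True).
a_b_results = ttest_ind(store_a, store_b)
a_c_results = ttest_ind(store_a, store_c)
b_c_results = ttest_ind(store_b, store_c)

for label, result in [
    ("A vs B", a_b_results),
    ("A vs C", a_c_results),
    ("B vs C", b_c_results),
]:
    print(f"{label}: t={result.statistic:.3f}, p={result.pvalue:.4f}")
```

With this made-up data, stores A and C have the same mean, so only the comparisons involving store B fall below a 0.05 threshold. Each of the three tests still carries its own chance of a false positive.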
Store the probability of error for running three T-Tests in a variable called
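One way to compute this probability, assuming each test uses a 0.05 threshold and the three tests are independent. The name error_prob is hypothetical, since the variable name is cut off in the text.

```python
# Assumptions: 0.05 threshold per test, and the tests are independent.
alpha = 0.05
n_tests = 3  # A vs B, A vs C, B vs C

# Probability of at least one false positive across all three tests.
# error_prob is a hypothetical name; the text does not give one.
error_prob = 1 - (1 - alpha) ** n_tests
print(round(error_prob, 4))
```

Under these assumptions the result is roughly 0.14, noticeably higher than the 0.05 risk of a single test.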