In the previous exercise, we used a two-sample t-test to investigate an association between a quantitative variable (time spent on a website) and a binary categorical variable (an old color scheme or a new color scheme).
In some circumstances, we might instead care about an association between a quantitative variable and a non-binary categorical variable (non-binary means more than two categories).
For example, suppose that we own a chain of stores that sell ants, called VeryAnts. There are three different locations: A, B, and C. We want to know whether customers are spending a significantly different amount per order at any of the locations.
There are three possible pairwise comparisons: A vs. B, A vs. C, and B vs. C. One way to answer our question is to run three separate two-sample t-tests, one for each pair.
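The pairs to compare are all unordered pairs of locations, so their number is "number of locations choose 2". The sketch below uses only the Python standard library to list the pairs. The extra location count of 4 is a hypothetical value, included only to show how quickly the number of tests grows.

```python
from itertools import combinations
from math import comb

locations = ["A", "B", "C"]

# Every unordered pair of locations, in the order itertools generates them.
pairs = list(combinations(locations, 2))
print(pairs)                   # [('A', 'B'), ('A', 'C'), ('B', 'C')]

# Number of pairwise tests needed: n choose 2.
print(comb(len(locations), 2))  # 3
print(comb(4, 2))               # 6 (hypothetical fourth location)
```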
We have created samples
c, representing the amount (in USD) spent per order at VeryAnts locations A, B, and C, respectively. We want to know whether the average spending per order differs significantly among the three locations.
Code has been provided for you to generate side-by-side box plots of the order amounts at each store. Based on this visualization, do customers appear to spend more or less money at any of the stores?
Perform a two-sample t-test for each pair of locations.
Store the p-values in variables called
b_c_pval. Print them to the console.
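A minimal sketch of the three pairwise tests, assuming SciPy is available and using `scipy.stats.ttest_ind` (which by default assumes equal variances, i.e. Student's t-test). The sample values and the names `samples` and `pvals` are hypothetical placeholders, not the exercise's data; the p-values are kept in a dictionary keyed by pair, whereas the exercise asks for separate variables.

```python
from itertools import combinations

from scipy.stats import ttest_ind

# Hypothetical order amounts (USD) per location; replace with the real samples.
samples = {
    "A": [58, 60, 62, 59, 61, 60, 57, 63],
    "B": [60, 61, 59, 62, 58, 60, 63, 57],
    "C": [70, 72, 69, 71, 73, 70, 68, 72],
}

# One two-sample t-test per pair of locations; keep only the p-value.
pvals = {}
for first, second in combinations(samples, 2):
    result = ttest_ind(samples[first], samples[second])
    pvals[(first, second)] = result.pvalue

for (first, second), p in pvals.items():
    print(f"{first} vs. {second}: p = {p}")
```

With these made-up numbers, A and B contain the same values in a different order, so their test shows no difference, while C's noticeably higher amounts produce very small p-values against both A and B.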
Note that one or more of the printed p-values may appear in scientific notation. If you see something like e-05 at the end of a number, the preceding number is multiplied by 10^(-5). In other words, 2.5134230524e-05 is equal to 0.000025134230524.
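To check the conversion yourself, you can format the number with a fixed number of decimal places. This short sketch uses only the standard library; the value is the example from the text above.

```python
import math

p = 2.5134230524e-05

# Fixed-point formatting with 15 decimal places shows the plain decimal form.
print(f"{p:.15f}")                              # 0.000025134230524

# The "e-05" suffix means "times 10 to the power -5".
print(math.isclose(p, 2.5134230524 * 10**-5))   # True
```

The notation only affects how the number is displayed, so comparisons such as `p < 0.05` behave the same either way.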
Inspect the p-values that you printed out. Using a significance level of 0.05, for which pairs of stores did you find a significant difference in the average amount spent per order? Assign the values of
True if the p-value indicates a significant difference and
False if the p-value does not indicate a significant difference.
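The decision rule is a single comparison per pair: a p-value below the significance level counts as a significant difference. In this sketch the p-values are made-up numbers for illustration only, and the names `ALPHA`, `pvals`, and `significant` are hypothetical.

```python
ALPHA = 0.05  # significance level

# Hypothetical p-values, not results from the exercise data.
pvals = {("A", "B"): 0.31, ("A", "C"): 0.00004, ("B", "C"): 0.0021}

# True when the p-value is below the significance level, False otherwise.
significant = {pair: p < ALPHA for pair, p in pvals.items()}
print(significant)  # {('A', 'B'): False, ('A', 'C'): True, ('B', 'C'): True}
```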