Hypothesis Testing with R
One Sample T-Test

Consider the fictional business BuyPie, which sends ingredients for pies to your household so that you can make them from scratch. Suppose that a product manager hypothesizes the average age of visitors to BuyPie.com is `30`. In the past hour, the website had `100` visitors and the average age was `31`. Are the visitors older than expected? Or is this just the result of chance (sampling error) and a small sample size?

You can test this using a One Sample T-Test. A One Sample T-Test compares a sample mean to a hypothetical population mean. It answers the question “If the population mean really were the hypothesized value, how likely would it be to see a sample mean at least this far from it?”

The first step is formulating a null hypothesis, which again is the hypothesis that there is no difference between the populations you are comparing. In a One Sample T-Test, the first population is the one your sample was drawn from, and the second is the hypothetical population you choose. The null hypothesis that this test examines can be phrased as follows: `"The set of samples belongs to a population with the target mean"`.

One result of a One Sample T-Test will be a p-value, which you use to decide whether or not to reject this null hypothesis. If the p-value you receive is less than your significance level, commonly `0.05`, you can reject the null hypothesis and state that there is a statistically significant difference between the sample mean and the hypothesized mean.

R has a function called `t.test()` in the `stats` package which can perform a One Sample T-Test for you.

For a One Sample T-Test, pass `t.test()` two arguments, a distribution of values and an expected mean (if you omit `mu`, it defaults to `0`):

``results <- t.test(sample_distribution, mu = expected_mean)``
• `sample_distribution` is the sample of values that were collected
• `mu` is an argument indicating the desired mean of the hypothetical population
• `expected_mean` is the value of the desired mean
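
As a rough illustration of the call described above, the sketch below runs `t.test()` on a small made-up sample. The vector `sample_ages` and its values are invented for this example and are not the BuyPie data; `significance_level` is likewise just an illustrative name.

```r
# Hypothetical ages, invented for illustration only (not the BuyPie data)
sample_ages <- c(32, 34, 29, 29, 22, 39, 38, 37, 38, 36, 30, 26, 22, 22)
expected_mean <- 30

# Run the One Sample T-Test against the hypothesized mean
results <- t.test(sample_ages, mu = expected_mean)

# Print the full summary (t statistic, degrees of freedom, p-value, ...)
print(results)

# The result is a list-like object, so the p-value can be pulled out by name
p_value <- results$p.value

# Compare the p-value against a chosen significance level
significance_level <- 0.05
reject_null <- p_value < significance_level
print(reject_null)
```

Storing the result in a variable, rather than only printing it, makes it easy to reuse individual pieces such as `results$p.value` later on.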

`t.test()` will return, among other information we will not cover here, a p-value. The p-value is the probability of seeing a sample mean at least as far from the specified mean as the one you observed, assuming the sample really did come from a distribution with that mean.
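
To make the p-value less of a black box, here is a minimal sketch that recomputes it by hand and checks it against `t.test()`. It assumes the default two-sided test, and the sample `x` and the name `mu0` are invented for illustration.

```r
# Hypothetical sample, invented for illustration only
x <- c(32, 34, 29, 29, 22, 39, 38, 37, 38, 36, 30, 26, 22, 22)
mu0 <- 30  # hypothesized population mean

n <- length(x)

# t statistic: distance between sample mean and hypothesized mean,
# measured in units of the standard error of the mean
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual <- 2 * pt(-abs(t_stat), df = n - 1)

# The same quantities as reported by t.test()
results <- t.test(x, mu = mu0)

# Both comparisons should agree up to floating-point tolerance
print(all.equal(t_stat, unname(results$statistic)))
print(all.equal(p_manual, results$p.value))
```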

P-values give you an idea of how strong the evidence against the null hypothesis is. A large p-value does not prove the null hypothesis: just because you don’t have enough data to detect a difference doesn’t mean that there isn’t one. Generally, the larger your sample, the smaller a difference you can detect.
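
The effect of sample size can be sketched with two made-up samples that share the same mean of `31`. The values below are invented, and repeating them with `rep()` is an artificial way to hold the mean fixed while growing the sample; a real larger sample would vary.

```r
# Hypothetical ages with a mean of 31 (invented for illustration)
small_sample <- c(28, 35, 31, 26, 38, 30, 33, 27)

# Artificially larger sample: the same values repeated 20 times,
# giving 160 observations with the same mean
large_sample <- rep(small_sample, times = 20)

# Test both samples against the same hypothesized mean of 30
p_small <- t.test(small_sample, mu = 30)$p.value
p_large <- t.test(large_sample, mu = 30)$p.value

# Same one-year difference from 30, very different p-values
print(c(small = p_small, large = p_large))
```

With only 8 observations the one-year difference is not significant at the `0.05` level, while with 160 observations the same difference is.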

### Instructions

1.

We have provided a small dataset called `ages`, representing the ages of visitors to BuyPie.com in the past hour, in `notebook.Rmd`.

Even with a small dataset like this, it is hard to make judgments from just looking at the numbers.

To understand the data better, let’s look at the mean. Calculate the mean of `ages`, and store the result in a variable called `ages_mean`. View `ages_mean`.

2.

Use the `t.test()` function with `ages` to see what p-value the experiment returns for this distribution, where we expect the mean to be `30`.

Store the results of the test in a variable called `results`.

Does the p-value you got with the One Sample T-Test make sense, knowing the mean of `ages`?