Consider the fictional business BuyPie, which sends ingredients for pies to your household so that you can make them from scratch. Suppose that a product manager hypothesizes the average age of visitors to BuyPie.com is 30. In the past hour, the website had 100 visitors and the average age was 31. Are the visitors older than expected? Or is this just the result of chance (sampling error) and a small sample size?
You can test this using a One Sample T-Test. A One Sample T-Test compares a sample mean to a hypothetical population mean. It answers the question “If the population really had the desired mean, how likely is a sample mean at least this far from it?”
The first step is formulating a null hypothesis, which again is the hypothesis that there is no difference between the populations you are comparing. The second population in a One Sample T-Test is the hypothetical population you choose. The null hypothesis that this test examines can be phrased as follows:
"The set of samples belongs to a population with the target mean".
One result of a One Sample T-Test will be a p-value, which tells you whether or not you can reject this null hypothesis. If the p-value you receive is less than your significance level, normally 0.05, you can reject the null hypothesis and state that there is a significant difference.
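To make the p-value less abstract, here is a minimal sketch of the arithmetic behind a two-sided One Sample T-Test, using only base R. The vector `sample_ages` and its values are made up for illustration and are not BuyPie data. The sketch assumes the standard t statistic: the difference between the sample mean and the hypothesized mean, divided by the standard error.

```r
# Hypothetical sample of visitor ages (illustrative values only).
sample_ages <- c(28, 35, 31, 29, 40, 33, 27, 36, 30, 34)
hypothesized_mean <- 30
significance_level <- 0.05

n <- length(sample_ages)
standard_error <- sd(sample_ages) / sqrt(n)

# t statistic: how many standard errors the sample mean lies from the hypothesized mean.
t_statistic <- (mean(sample_ages) - hypothesized_mean) / standard_error

# Two-sided p-value from the t distribution with n - 1 degrees of freedom.
p_value <- 2 * pt(-abs(t_statistic), df = n - 1)

# Decision rule: reject the null hypothesis only if p is below the significance level.
reject_null <- p_value < significance_level
print(p_value)
print(reject_null)
```

For this made-up sample the p-value is above 0.05, so the null hypothesis is not rejected even though the sample mean is above 30.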
R has a function called t.test() in the stats package which can perform a One Sample T-Test for you.
For a One Sample T-Test, t.test() takes two arguments, a vector of sample values and an expected mean (mu defaults to 0 if you omit it):
results <- t.test(sample_distribution, mu = expected_mean)
sample_distribution is the sample of values that were collected
mu is an argument indicating the desired mean of the hypothetical population
expected_mean is the value of the desired mean
t.test() will return, among other information we will not cover here, a p-value. The smaller the p-value, the less compatible the sample is with a population that has the specified mean.
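The call described above can be sketched as follows. The values in `sample_distribution` are hypothetical and chosen only for illustration; `expected_mean` is set to the 30 from the BuyPie scenario. The object returned by `t.test()` behaves like a list, and the p-value can be read from its `p.value` element.

```r
# Hypothetical sample of visitor ages (illustrative values only).
sample_distribution <- c(28, 35, 31, 29, 40, 33, 27, 36, 30, 34)
expected_mean <- 30

results <- t.test(sample_distribution, mu = expected_mean)

# The p-value is stored in the p.value element of the result.
p_value <- results$p.value
print(p_value)

# Other elements include the t statistic and the sample mean.
print(results$statistic)
print(results$estimate)
```

Printing `results` directly shows a formatted summary of the same information.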
P-values give you an idea of how confident you can be in a result. Just because you don’t have enough data to detect a difference doesn’t mean that there isn’t one. Generally, the more samples you have, the smaller a difference you can detect.
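As a rough illustration of how sample size affects what you can detect, the sketch below runs the same test on two hypothetical samples that share a mean of 31 but differ in size. The values are invented, and repeating the same three ages is only a device to hold the mean fixed; it is not how real visitor data would look.

```r
# Hypothetical ages whose mean is exactly 31 (illustrative values only).
base_ages <- c(29, 31, 33)

small_sample <- rep(base_ages, times = 2)   # 6 observations
large_sample <- rep(base_ages, times = 30)  # 90 observations

# Both samples have the same mean, one year above the hypothesized 30.
p_small <- t.test(small_sample, mu = 30)$p.value
p_large <- t.test(large_sample, mu = 30)$p.value

print(p_small)
print(p_large)
```

With 6 observations the one-year difference is not significant at the 0.05 level, while with 90 observations the same difference is.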
We have provided a small dataset called ages, representing the ages of customers to BuyPie.com in the past hour, in
Even with a small dataset like this, it is hard to make judgments from just looking at the numbers.
To understand the data better, let’s look at the mean. Calculate the mean of ages, and store the result in a variable called
Use the t.test() function with ages to see what p-value the experiment returns for this distribution, where we expect the mean to be
Store the results of the test in a variable called
Does the p-value you got with the One Sample T-Test make sense, knowing the mean of