Differences Between Z-Test and T-Test
In statistical analysis, we use hypothesis testing to evaluate assumptions or claims about a population using sample data. For example, suppose we claim that the average reading time for an article at Codecademy is not more than 10 minutes. We can test this claim using a z-test or a t-test, two of the most common hypothesis-testing methods. However, deciding between the z-test and the t-test can be difficult, especially if you are a beginner. In this article, you will learn the basics of the z-test and the t-test, their differences, and their similarities. We will also discuss the use cases for each.
What is a z-test?
The z-test is a hypothesis-testing method for determining whether there is a significant difference between the mean of a given attribute of a sample and that of the population. It is used when the population standard deviation is known and the sample size is at least 30 (n ≥ 30).
There are two main types of z-tests that you can use in different scenarios.
One sample z-test
We use the one-sample z-test to compare the sample mean with the population mean when we know the population standard deviation. To use the one-sample z-test method for hypothesis testing, we calculate the z-test statistic using the following formula:
Z = (μ - μ₀) / (σ / √N)
Here,
- Z denotes the z-test statistic.
- μ is the mean of an attribute in a given sample data.
- μ₀ is the hypothesized mean of the population we are testing for.
- σ is the standard deviation for the population.
- N is the number of data points in the sample.
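The formula above translates directly into code. Below is a minimal Python sketch of the one-sample z statistic; the function name `one_sample_z` is illustrative, not a library API, and the numbers match the worked example later in this article.

```python
from math import sqrt

def one_sample_z(sample_mean, pop_mean, pop_std, n):
    """One-sample z statistic: Z = (mean - mu0) / (sigma / sqrt(N))."""
    return (sample_mean - pop_mean) / (pop_std / sqrt(n))

# Illustrative values: sample mean 10.5, hypothesized mean 10,
# population standard deviation 3.1, sample size 50.
z = one_sample_z(10.5, 10, 3.1, 50)
print(round(z, 2))  # ≈ 1.14
```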
Two sample z-test
We use the two-sample z-test to compare the means of two independent samples when we know the population standard deviations. To use the two-sample z-test method for hypothesis testing, we calculate the z-test statistic using the following formula:
Z = (μ₁ - μ₂) / √((σ₁² / N₁) + (σ₂² / N₂))
Here,
- Z denotes the z-test statistic.
- μ₁ and μ₂ are the means of an attribute for the given samples.
- σ₁ and σ₂ are the standard deviations of the populations.
- N₁ and N₂ are the sample sizes.
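The two-sample statistic can be sketched the same way. The sample values below are hypothetical, chosen only to illustrate the calculation; `two_sample_z` is not a library function.

```python
from math import sqrt

def two_sample_z(mean1, mean2, std1, std2, n1, n2):
    """Two-sample z statistic: Z = (mu1 - mu2) / sqrt(sigma1^2/N1 + sigma2^2/N2)."""
    return (mean1 - mean2) / sqrt(std1**2 / n1 + std2**2 / n2)

# Hypothetical samples: means 10.5 and 9.8, population standard deviations
# 3.1 and 2.9, sample sizes 50 and 60 (illustrative numbers, not from the article).
z = two_sample_z(10.5, 9.8, 3.1, 2.9, 50, 60)
print(round(z, 2))
```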
After calculating the z-test statistic, we compare it against a critical z-value from the z-table to check whether the observed difference in means is statistically significant. To understand how the z-test works, let’s discuss an example of the one-sample z-test.
One sample z-test example
Suppose we are told that the average reading time for an article at Codecademy is not more than 10 minutes. To check this assumption, we gather the mean reading time of a random sample of 50 articles, which is 10.5 minutes. Based on past data, we have determined that the population standard deviation is 3.1 minutes. For our sample, the mean reading time is greater than 10 minutes, which is different from the assumption. Now, to check if the observed difference is statistically significant and if this didn’t occur just by chance, we will use one sample z-test.
For this, we will first establish the null and alternate hypotheses. The null hypothesis states that the mean is less than or equal to 10, and the alternate hypothesis states that the mean is greater than 10.
H₀: μ ≤ 10
Hₐ: μ > 10
- Now, we will determine if the test is one-tailed or two-tailed. The alternate hypothesis states that the mean reading time is greater than 10, so the rejection region lies only on the right side of the distribution curve. Hence, we will use a one-tailed test.
- After deciding on the test, we will specify the acceptable significance level. For this, we will choose significance level α=0.05.
- Using the significance level α, we will specify the decision rule. For a one-tailed test, the right tail of the distribution curve will have a 5 percent rejection region. Hence, we will calculate the critical z-value for the cumulative probability of 0.95, which is 1.65.
After obtaining the critical z-value, we will state the rules for rejecting hypotheses.
If Z > 1.65, reject H₀ (the calculated mean lies in the right-hand rejection region).
If Z ≤ 1.65, fail to reject H₀ (the calculated mean lies in the non-rejection region).
Next, we will calculate the z-test statistic for our sample using the given data and the formula for one sample z-test.
μ₀ = 10, μ = 10.5, σ = 3.1, N = 50
Z = (μ - μ₀) / (σ / √N) = (10.5 - 10) / (3.1 / √50) ≈ 1.14
As the calculated z-test statistic is less than 1.65, the sample mean lies in the non-rejection region. Hence, we fail to reject the null hypothesis. Even though the mean reading time of the 50 articles in our sample is 10.5 minutes, which is greater than 10, the difference is not statistically significant and might be due to sampling error. The reading time for other samples, and for the population as a whole, may still be less than or equal to 10 minutes.
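The whole procedure above, from decision rule to conclusion, can be sketched in a few lines of Python using only the standard library; `statistics.NormalDist` supplies the critical z-value that we otherwise look up in the z-table.

```python
from math import sqrt
from statistics import NormalDist

# Numbers from the worked example: mu0 = 10, sample mean 10.5, sigma = 3.1, N = 50.
mu0, sample_mean, sigma, n = 10, 10.5, 3.1, 50
alpha = 0.05  # significance level

z = (sample_mean - mu0) / (sigma / sqrt(n))
z_critical = NormalDist().inv_cdf(1 - alpha)  # one-tailed critical value, ~1.645

reject = z > z_critical
print(f"Z = {z:.2f}, critical = {z_critical:.2f}, reject H0: {reject}")
```

Running this prints `Z = 1.14, critical = 1.64, reject H0: False`, matching the conclusion above.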
While we use z-tests for large samples when the population standard deviation is known, there are situations where the population standard deviation is unknown or the sample size is small. In such cases, we use a t-test for hypothesis testing. Let’s now discuss the t-test and its types with an example.
What is a t-test?
The t-test is also a hypothesis-testing method for checking if there is a significant difference between the mean of a given attribute of a sample and that of the population. The t-test is used when the population standard deviation is unknown or the sample size is less than 30 (n < 30).
Like z-tests, we have one-sample and two-sample t-tests that we use in different conditions. The only difference between calculating the test statistic for the z-test and t-test is that we use sample standard deviations for calculating the test statistic in the t-test because the population standard deviation is unknown.
One sample t-test
The formula for calculating the test statistic for the one-sample t-test is as follows:
T = (μ - μ₀) / (s / √N)
Here,
- T denotes the t-test statistic.
- μ is the mean of an attribute in a given sample data.
- μ₀ is the hypothesized mean of the population that we are testing for.
- s is the standard deviation of the sample.
- N is the number of data points in the sample.
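As a quick sketch, the one-sample t statistic differs from the z statistic only in using the sample standard deviation; `one_sample_t` below is an illustrative helper, not a library API.

```python
from math import sqrt

def one_sample_t(sample_mean, pop_mean, sample_std, n):
    """One-sample t statistic: T = (mean - mu0) / (s / sqrt(N)); df = N - 1."""
    return (sample_mean - pop_mean) / (sample_std / sqrt(n))

# Illustrative values matching the example later in the article:
# sample mean 10.5, hypothesized mean 10, sample std 1.15, sample size 15.
t_stat = one_sample_t(10.5, 10, 1.15, 15)
print(round(t_stat, 2))  # ≈ 1.68
```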
Two sample t-test
We use the following formula for the two-sample t-test:
T = (μ₁ - μ₂) / √((s₁² / N₁) + (s₂² / N₂))
Here,
- T denotes the t-test statistic.
- μ₁ and μ₂ are the means of an attribute for the given samples.
- s₁ and s₂ are the standard deviations of the samples.
- N₁ and N₂ are the sample sizes.
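In code, the two-sample t statistic mirrors the two-sample z statistic with sample standard deviations substituted in. The numbers below are hypothetical, chosen only to illustrate the calculation.

```python
from math import sqrt

def two_sample_t(mean1, mean2, s1, s2, n1, n2):
    """Two-sample t statistic: T = (mu1 - mu2) / sqrt(s1^2/N1 + s2^2/N2)."""
    return (mean1 - mean2) / sqrt(s1**2 / n1 + s2**2 / n2)

# Hypothetical samples: means 10.5 and 9.8, sample standard deviations
# 1.15 and 1.30, sizes 15 and 18 (illustrative numbers, not from the article).
t_stat = two_sample_t(10.5, 9.8, 1.15, 1.30, 15, 18)
print(round(t_stat, 2))
```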
One sample t-test example
Suppose we are again told that the average reading time for an article at Codecademy is not more than 10 minutes. To check this assumption, we only have the reading time of a random sample of 15 articles, which is 10.5 minutes, with a sample standard deviation of 1.15 minutes. Also, we don’t have the population standard deviation. Now, to check if the observed mean is statistically different from the given population mean, we will use one sample t-test.
We will first establish the hypothesis as follows:
H₀: μ ≤ 10
Hₐ: μ > 10
- As the alternate hypothesis states that the mean reading time is greater than 10, the rejection region will lie only on the right side of the distribution curve. Hence, we will use a one-tailed t-test.
- After deciding on the test, we will specify the acceptable significance level. For this example, we will choose significance level α=0.05.
- Using the significance level α, we will specify the decision rule. For a one-tailed t-test with a significance level of α = 0.05, the right tail of the distribution curve will have a 5 percent rejection region. Hence, we will look up the critical t-value for a cumulative probability of 0.95 and 14 degrees of freedom (N − 1 = 14) in the t-table, which is 1.761. After obtaining the critical t-value, we will state the rules for rejecting the hypothesis.
If T > 1.761, reject H₀.
If T ≤ 1.761, fail to reject H₀.
Next, we will calculate the t-test statistic using the given data.
μ₀ = 10, μ = 10.5, s = 1.15, N = 15
T = (μ - μ₀) / (s / √N) = (10.5 - 10) / (1.15 / √15) ≈ 1.68
As the calculated t-test statistic is less than 1.761, the sample mean lies in the non-rejection region. Hence, we fail to reject the null hypothesis.
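The t-test example above can be checked with a short Python sketch. The critical value 1.761 is taken from the t-table for 14 degrees of freedom, as in the article.

```python
from math import sqrt

# Numbers from the worked example: mu0 = 10, sample mean 10.5, s = 1.15, N = 15.
mu0, sample_mean, s, n = 10, 10.5, 1.15, 15
t_stat = (sample_mean - mu0) / (s / sqrt(n))

# Critical t-value for alpha = 0.05, one-tailed, df = N - 1 = 14 (from the t-table).
t_critical = 1.761
reject = t_stat > t_critical
print(f"T = {t_stat:.2f}, reject H0: {reject}")
```

Running this prints `T = 1.68, reject H0: False`, matching the conclusion above.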
The z-test and t-test may seem similar, but they have important differences that affect when to use each one. Let’s explore their similarities and differences and how to choose the right test for hypothesis testing.
Z-test vs T-test: when to use each for hypothesis testing?
Although we use z-test and t-test in different scenarios, there are multiple similarities between them.
- Purpose of the test: We use both the z-test and the t-test to determine if the difference between the sample mean and the population mean is statistically significant.
- Testing methodology: Both the z-test and t-test use the null hypothesis and alternate hypothesis to check if there is a significant difference between the sample mean and population mean.
- Data distribution: Both z-test and t-test assume that the data is normally distributed. In the z-test, the data is assumed to have a standard normal distribution. For t-test, data is assumed to have student’s t-distribution.
- Data type: We use z-test and t-test for continuous numerical data.
- Test statistic formula: Both z-test and t-test use a similar formula for calculating the test statistic, which uses the difference between sample and population means and the standard error.
Despite these similarities, the z-test and t-test have key differences that determine when each is used. The following table summarizes them.
| Feature | Z-test | T-test |
|---|---|---|
| Sample size | Sample size is at least 30 (n ≥ 30). | Sample size is less than 30 (n < 30). |
| Population variance | Population variance is known. | Population variance is unknown. |
| Data distribution | Data has a standard normal distribution. | Data has a student’s t-distribution. |
| Critical value | Depends on the significance level α. | Depends on the significance level α and the sample size. |
Z-Test vs. T-Test: a quick comparison
- Z-Test: Used for large samples (n ≥ 30) with known population variance.
- T-Test: Used for small samples (n < 30) or when the population variance is unknown.
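The rule of thumb above can be encoded in a small helper; `choose_test` is an illustrative function for this article's decision rule, not a library API.

```python
def choose_test(n, pop_std_known):
    """Pick a test using the rule of thumb: z-test needs a large sample
    (n >= 30) AND a known population standard deviation; otherwise t-test."""
    if pop_std_known and n >= 30:
        return "z-test"
    return "t-test"

print(choose_test(50, True))   # z-test: large sample, known population std
print(choose_test(15, False))  # t-test: small sample, unknown population std
print(choose_test(50, False))  # t-test: population std unknown despite large sample
```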
So, we have determined when to use z-test or t-test. However, we haven’t discussed the reason for using the t-test instead of the z-test when the sample size is small. Let’s discuss this using an example.
Why do we use t-test instead of z-test for small samples?
The choice between using the z-test and t-test is influenced by the data distribution. The z-test assumes that the data has a standard normal distribution. On the other hand, the t-test assumes that the data has the student’s t-distribution. In the student’s t-distribution, the probability of values occurring at the extremes is higher.
If we plot data with a standard normal distribution alongside a student’s t-distribution, we can see that the standard normal data is concentrated around the mean value of zero, with very thin probability density at the extremes. By contrast, the student’s t-distribution has heavier tails, with more probability mass farther from the mean. Due to this, there is a higher probability of finding extreme values under the student’s t-distribution.
To account for this higher probability of extreme values, the t-test uses larger critical values than the z-test at the same significance level. This makes it harder to reject the null hypothesis for small sample sizes, which reduces false positives. Thus, the t-test accounts for the increased variability in smaller samples.
For larger samples, we can assume that the data has a standard normal distribution. In such cases, the adjustments of t-distribution aren’t necessary, and we can use z-test for hypothesis testing.
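You can see the heavier tails numerically by comparing critical values. The z critical value below comes from the standard library’s `statistics.NormalDist`; the t critical values are standard one-tailed, α = 0.05 entries from a t-table.

```python
from statistics import NormalDist

# One-tailed critical values at alpha = 0.05.
z_crit = NormalDist().inv_cdf(0.95)          # ~1.645, independent of sample size
t_crits = {5: 2.015, 10: 1.812, 30: 1.697}   # df -> critical t (standard t-table entries)

# Every t critical value exceeds the z value and shrinks toward it as df grows,
# because heavier tails matter less for larger samples.
for df, t_crit in sorted(t_crits.items()):
    print(f"df={df}: t critical {t_crit} > z critical {z_crit:.3f}")
```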
Conclusion
Both z-tests and t-tests are fundamental tools in statistical hypothesis testing. Z-tests are ideal when dealing with large sample sizes and known population variances, ensuring accurate approximations through the normal distribution. On the other hand, t-tests are more versatile, making them the preferred choice for smaller sample sizes or when population variance is unknown.
To learn more about hypothesis testing, you can go through this course on hypothesis testing with Python. You might also like this course on significance thresholds in hypothesis testing. Happy Learning!
Author
The Codecademy Team