Hypothesis Testing
Published Feb 14, 2025
Hypothesis testing is a fundamental statistical method used in data science to make inferences about a population based on sample data. It helps determine whether an observed effect is statistically significant or could plausibly be explained by random chance alone.
Key Concepts
Null and Alternative Hypotheses
- Null Hypothesis (H₀): A statement that assumes no effect or no difference exists. It represents the default assumption.
- Alternative Hypothesis (H₁ or Ha): A statement that contradicts the null hypothesis, suggesting an effect or a difference exists.
Significance Level (α)
- The probability threshold for rejecting the null hypothesis, commonly set at 0.05 (5%).
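As a rough illustration, the significance level can be turned into a critical value. For a two-sided z-test, H₀ is rejected when the absolute value of the test statistic exceeds the point that leaves α/2 in each tail of the standard normal distribution. The sketch below uses Python's standard library; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05  # significance level

# Two-sided test: split alpha evenly across both tails of the standard normal.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

print(round(z_critical, 2))  # 1.96
```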
P-Value
- The probability of observing data at least as extreme as the data actually observed, assuming the null hypothesis is true. A small p-value (≤ α) is taken as evidence against H₀ and leads to its rejection. It is not the probability that H₀ itself is true.
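For a z statistic, the p-value is the tail area of the standard normal distribution beyond the observed value. The following is a minimal sketch for the two-sided case; the function name `two_sided_p_value` and the z-score of 2.5 are made up for illustration.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Probability of a standard normal value at least as extreme as z (both tails)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
p_value = two_sided_p_value(2.5)  # 2.5 is an illustrative z-score

print(round(p_value, 4))
print("reject H0" if p_value <= alpha else "fail to reject H0")
```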
Test Statistics
- A numerical value calculated from sample data used to determine whether to reject H₀.
- Common test statistics include:
  - Z-score: Used when the population variance is known.
  - T-score: Used when the population variance is unknown and is estimated from the sample, which matters most for small samples.
  - Chi-square statistic: Used for categorical data.
  - F-statistic: Used in analysis of variance (ANOVA).
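To make the difference between a z-score and a t-score concrete, the sketch below computes both for a one-sample comparison against a hypothesized mean. The sample values, `mu_0`, and `sigma` are invented for illustration; the only difference between the two statistics is which standard deviation goes into the standard error.

```python
from math import sqrt
from statistics import mean, stdev

sample = [52, 48, 55, 51, 49, 53, 50, 54]  # made-up measurements
mu_0 = 50     # hypothesized population mean under H0
sigma = 2.5   # population standard deviation, assumed known for the z-score

n = len(sample)
x_bar = mean(sample)

# Z-score: the standard error uses the known population standard deviation.
z_score = (x_bar - mu_0) / (sigma / sqrt(n))

# T-score: the standard error uses the sample standard deviation (n - 1 denominator).
t_score = (x_bar - mu_0) / (stdev(sample) / sqrt(n))

print(round(z_score, 3), round(t_score, 3))
```

The t-score would then be compared against a t-distribution with n - 1 degrees of freedom rather than the standard normal distribution.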
Type I and Type II Errors
- Type I Error (False Positive): Rejecting H₀ when it is actually true.
- Type II Error (False Negative): Failing to reject H₀ when it is actually false.
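Both error types can be estimated by simulation. The sketch below is one possible setup, not a standard recipe: it repeatedly draws samples from a normal distribution, runs a two-sided z-test with known standard deviation, and counts how often H₀ is rejected. The helper names, the effect size of 0.5, and the sample size of 25 are all assumptions chosen for the example.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def rejects_h0(sample, mu_0, sigma, alpha):
    """Two-sided one-sample z-test with known sigma; True if H0 is rejected."""
    z = (mean(sample) - mu_0) / (sigma / sqrt(len(sample)))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value <= alpha

def rejection_rate(true_mean, mu_0=0.0, sigma=1.0, n=25, alpha=0.05,
                   trials=2000, seed=0):
    """Fraction of simulated samples for which H0: mean == mu_0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        rejections += rejects_h0(sample, mu_0, sigma, alpha)
    return rejections / trials

# H0 is true (true mean equals mu_0): every rejection is a Type I error.
type_1_rate = rejection_rate(true_mean=0.0)

# H0 is false (true mean is 0.5): every non-rejection is a Type II error.
type_2_rate = 1 - rejection_rate(true_mean=0.5)

print(type_1_rate, type_2_rate)
```

With H₀ true, the simulated Type I error rate should land close to α. The Type II error rate depends on the true effect size and the sample size, so it changes when those assumptions change.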
Steps in Hypothesis Testing
- State the Hypotheses: The first step is to define the null hypothesis (H₀) and the alternative hypothesis (H₁) clearly.
- Choose the Significance Level (α): Determine the threshold probability (commonly 0.05) for rejecting the null hypothesis.
- Select the Appropriate Test: Choose the statistical test that best suits the data and research question, such as a T-test, Chi-square test, or ANOVA.
- Compute the Test Statistic and P-value: Perform the statistical test to calculate the test statistic and corresponding p-value.
- Compare P-value with α: If the p-value is less than or equal to α, reject H₀, indicating significant evidence for H₁. Otherwise, fail to reject H₀, suggesting insufficient evidence against it.
- Draw a Conclusion: Based on the results, interpret whether there is enough evidence to support the alternative hypothesis.
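The steps above can be sketched end to end for the simplest case, a two-sided one-sample z-test with a known population standard deviation. The function name `one_sample_z_test` and the delivery-time data are hypothetical; a real analysis would pick the test that matches its data.

```python
from math import sqrt
from statistics import NormalDist, mean

def one_sample_z_test(sample, mu_0, sigma, alpha=0.05):
    """Test H0: mean == mu_0 against H1: mean != mu_0 with known sigma."""
    # The hypotheses, alpha, and the choice of a z-test are fixed before
    # looking at the data; they arrive here as the function's arguments.

    # Compute the test statistic and its two-sided p-value.
    z = (mean(sample) - mu_0) / (sigma / sqrt(len(sample)))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Compare the p-value with alpha and draw a conclusion.
    conclusion = "reject H0" if p_value <= alpha else "fail to reject H0"
    return z, p_value, conclusion

# Hypothetical data: 36 delivery times in minutes, claimed mean of 30,
# population standard deviation assumed to be 4.
delivery_times = [31, 33, 29, 34, 32, 30, 35, 33, 31] * 4

z, p_value, conclusion = one_sample_z_test(delivery_times, mu_0=30, sigma=4)
print(round(z, 2), round(p_value, 4), conclusion)
```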
Common Hypothesis Tests
- Z-Test: Used when the sample size is large (n > 30) and population variance is known.
- T-Test: Used when the population variance is unknown and must be estimated from the sample, which matters most when the sample size is small (n ≤ 30). Common variants of the T-test include:
  - One-sample T-test: Compares a sample mean to a known or hypothesized population mean.
  - Two-sample T-test: Compares the means of two independent groups.
  - Paired T-test: Compares the means of two related groups.
- Chi-Square Test: Used for categorical data to test independence between variables or goodness of fit to an expected distribution.
- ANOVA (Analysis of Variance): Typically used when comparing three or more group means.
- Mann-Whitney U Test: A non-parametric alternative to the T-test for comparing two independent groups.
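As one worked example from this list, the sketch below runs a chi-square goodness-of-fit test by hand. The counts are invented, and the p-value line relies on a special case: with exactly 2 degrees of freedom, the chi-square upper-tail probability has the closed form exp(-x / 2). For other degrees of freedom, a statistics library such as SciPy would normally be used instead.

```python
from math import exp

observed = [18, 22, 20]             # made-up counts for three categories
expected = [sum(observed) / 3] * 3  # H0: all three categories are equally likely

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Three categories give 2 degrees of freedom, so the closed form applies.
p_value = exp(-chi_square / 2)

alpha = 0.05
print(round(chi_square, 2), round(p_value, 4))
print("reject H0" if p_value <= alpha else "fail to reject H0")
```

Here the observed counts sit close to the expected counts, so the p-value is large and H₀ is not rejected.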
Hypothesis Testing
- T-table
- A statistical tool used in hypothesis testing to determine critical values for t-distributions and assess the significance of test results.
- Z-table
- A Z-table shows the cumulative probabilities of a standard normal distribution, helping find the probability of a value occurring below a given z-score.