Say you work for a major social media website. Your boss comes to you with two questions:
- Do the demographics of users on your site match the company’s expectations?
- Did the new interface update affect user engagement?
With terabytes of user data at your fingertips, you decide the best way to answer these questions is with statistical hypothesis tests!
Statistical hypothesis testing is a process for evaluating whether a change or difference seen in a dataset is “real”, or merely the result of random fluctuation in the data.
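To make “random fluctuation” concrete, here is a minimal sketch of one way to check for it. It uses a permutation test, which is a different technique from the t-tests and ANOVA covered in this lesson, and it is written in Python purely for illustration. The function name `permutation_p_value` and the engagement numbers are made up for this example and do not come from any real dataset.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate how often randomly relabeled data shows a difference in
    means at least as large as the observed one (two-sided)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # Shuffling the pooled values simulates "no real difference":
        # group labels are assigned purely at random.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles


# Hypothetical minutes of engagement per user (invented for illustration).
before_update = [12.1, 9.8, 11.4, 10.9, 12.6, 10.2, 11.8, 9.5]
after_update = [13.0, 12.4, 11.9, 13.7, 12.8, 11.5, 13.3, 12.2]

p_value = permutation_p_value(before_update, after_update)
print(f"Estimated p-value: {p_value:.4f}")
```

A small value suggests that random relabeling rarely produces a gap as large as the observed one, so the difference is unlikely to be fluctuation alone. A large value means the observed gap is the kind of difference that chance produces routinely.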
Hypothesis testing can be an integral component of any decision-making process. It provides a framework for evaluating how confident one can be in drawing conclusions from data. Some instances where this might come up include:
- A professor expects an exam average to be roughly 75%, and wants to know if the actual scores line up with this expectation. Was the test actually too easy or too hard?
- A product manager for a website wants to compare the time spent on different versions of a homepage. Does one version make users stay on the page significantly longer?
In this lesson, you will cover the fundamental concepts that will help you run and evaluate hypothesis tests:
- Sample and Population Mean
- Significance Level
- Type I and Type II Errors
You will then learn about three different hypothesis tests you can perform to answer the kinds of questions discussed above:
- One Sample T-Test
- Two Sample T-Test
- ANOVA (Analysis of Variance)
Let’s get started!
The code in notebook.Rmd performs a hypothesis test on data from a company, BuyPie.com. The test evaluates whether the time spent per visitor on the website changes significantly between two weeks.
Read the output at the bottom of the rendered notebook. Do you think there is a difference in time spent per visitor between Week 1 and Week 2?
By the end of the lesson, you will be able to perform and interpret such hypothesis tests yourself!