You begin the statistical hypothesis testing process by defining a hypothesis, or an assumption about your population that you want to test. A hypothesis can be written in words, but can also be explained in terms of the sample and population means you just learned about.
Say you are developing a website and want to compare the time spent on different versions of a homepage. You could run a hypothesis test to see if version A or B makes users stay on the page significantly longer. Your hypothesis might be:
"The average time spent on homepage A is greater than the average time spent on homepage B."
While this is a fine hypothesis to make, data analysts are often very hesitant people. They don’t like to make bold claims without having data to back them up! Thus, when constructing hypotheses for a hypothesis test, you want to formulate a null hypothesis. A null hypothesis states that there is no difference between the populations you are comparing, and it implies that any difference seen in the sample data is due to sampling error. A null hypothesis for the same scenario is as follows:
"The average time spent on homepage A is the same as the average time spent on homepage B."
You could also restate this in terms of population means:
"The population mean of time spent on homepage A is the same as the population mean of time spent on homepage B."
After collecting some sample data on how users interact with each homepage, you can run a hypothesis test on that data to determine whether there is enough evidence to reject your null hypothesis (i.e., to conclude that there is a difference in time spent on homepage A and homepage B). Note that a hypothesis test cannot prove the null hypothesis true; it can only reject it or fail to reject it.
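The two hypotheses for the homepage example can also be sketched in symbols. The notation here is an assumption rather than something the lesson defines: \mu_A and \mu_B stand for the population means of time spent on homepage A and homepage B, H_0 labels the null hypothesis, and H_1 labels the original "greater than" claim (often called the alternative hypothesis).

```latex
H_0 : \mu_A = \mu_B
\qquad \text{versus} \qquad
H_1 : \mu_A > \mu_B
```

Written this way, the test asks whether the sample data give enough evidence to reject H_0 in favor of H_1.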
Instructions
A researcher at a pharmaceutical company is working on the development of a new medication to lower blood pressure, DeePressurize. They run an experiment with a control group of 100 patients that receive a placebo (a sugar pill), and an experimental group of 100 patients that receive DeePressurize. Blood pressure measurements are taken after a 3-month period on both groups of patients.
The researcher wants to run a hypothesis test to compare the resulting datasets. Two hypotheses, hypo_a and hypo_b, are given in notebook.Rmd. Which could be a null hypothesis for comparing the two sets of data? Update the value of null_hypo_1 to the string "hypo_a" or "hypo_b" based on your answer.
A product manager at a dating app company is developing a new user profile page with a different picture layout. They want to see if the new layout results in more matches between users than the current layout. 50% of profiles are updated to the new layout, and over a 1-month period the number of matches for users with the new layout and the original layout are recorded.
The product manager wants to run a hypothesis test to compare the resulting datasets. Two hypotheses, hypo_c and hypo_d, are given in notebook.Rmd. Which could be a null hypothesis for comparing the two sets of data? Update the value of null_hypo_2 to the string "hypo_c" or "hypo_d" based on your answer.