Whenever we run a hypothesis test using a significance threshold, we risk making two different kinds of mistakes: type I errors (false positives) and type II errors (false negatives). The table below shows the four possible outcomes:
| Null hypothesis: | is true | is false |
|---|---|---|
| P-value significant | Type I error | Correct! |
| P-value not significant | Correct! | Type II error |
Consider the quiz question hypothesis test described in the previous exercises:
- Null: The probability that a learner answers a question correctly is 70%.
- Alternative: The probability that a learner answers a question correctly is not 70%.
Suppose, for a moment, that the true probability of a learner answering the question correctly is 70% (if we showed the question to ALL learners, exactly 70% would answer it correctly). This puts us in the first column of the table above (the null hypothesis “is true”). If we run a test and calculate a significant p-value, we will make a type I error (also called a false positive because the p-value is falsely significant), leading us to remove the question when we don’t need to.
On the other hand, if the true probability of getting the question correct is not 70%, the null hypothesis “is false” (the right-most column of our table). If we run a test and calculate a non-significant p-value, we make a type II error, leading us to leave the question on our site when we should have taken it down.
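The two error types can be made concrete with a small simulation. The sketch below is illustrative only and rests on assumptions that are not part of the lesson: 100 learners per test, a significance threshold of 0.05, an alternative true probability of 80%, and the hypothetical helper names `binom_two_sided_p`, `classify`, and `error_rate`. It uses an exact two-sided binomial test written with the standard library only.

```python
import math
import random

def binom_two_sided_p(k, n, p):
    """Exact two-sided binomial p-value: the total probability, under the
    null, of every outcome that is no more likely than the observed count k."""
    pmf = [math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))

def classify(null_is_true, p_value, threshold=0.05):
    """Map one test result onto the cells of the table above."""
    significant = p_value < threshold
    if null_is_true and significant:
        return 'type one'   # rejected a true null (false positive)
    if not null_is_true and not significant:
        return 'type two'   # failed to reject a false null (false negative)
    return 'correct'

def error_rate(true_p, null_p=0.7, n=100, trials=2000, seed=0):
    """Share of simulated tests that end in an error (assumed settings)."""
    rng = random.Random(seed)
    # The p-value depends only on the number of correct answers, so compute
    # it once for every possible count.
    p_values = [binom_two_sided_p(k, n, null_p) for k in range(n + 1)]
    null_is_true = (true_p == null_p)
    errors = 0
    for _ in range(trials):
        # Simulate n learners, each answering correctly with probability true_p.
        k = sum(rng.random() < true_p for _ in range(n))
        if classify(null_is_true, p_values[k]) != 'correct':
            errors += 1
    return errors / trials
```

Under these assumptions, `error_rate(0.7)` estimates how often a true null is rejected (the type I error rate, which should stay near or below the threshold), while `error_rate(0.8)` estimates how often a false null goes undetected (the type II error rate).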
Suppose that the average score on a standardized test is 50 points. A researcher wants to know whether students who take this test in an ergonomically designed chair score significantly differently from the general population of test-takers. The researcher randomly assigns 100 students to take the test in an ergonomic chair. Then, the researcher runs a hypothesis test with a significance threshold of 0.05 and the following null and alternative hypotheses:
- Null: The mean score for students who take the test in an ergonomic chair is 50 points.
- Alternative: The mean score for students who take the test in an ergonomic chair is not 50 points.
Suppose that the truth (which the researcher doesn’t know) is: if every student took the test in an ergonomic chair, the average score for all test-takers would be 52 points.
Based on their sample of only 100 students, the researcher calculates a p-value of 0.07. In script.py, change the value of the variable to one of the following:
- 'correct' if the researcher will come to the correct conclusion based on this test
- 'type one' if the researcher will make a type I error based on this test
- 'type two' if the researcher will make a type II error based on this test
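To get a feel for how often a study of this size would fail to detect a true mean of 52 points, the scenario can be simulated. This is a rough sketch, not the researcher's actual analysis: the lesson does not give the spread of the scores, so a standard deviation of 10 points and normally distributed scores are assumed here, and the p-value uses a large-sample normal approximation rather than a t-test. The function names `two_sided_p` and `type_two_rate` are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev

def two_sided_p(sample, null_mean):
    """Large-sample (normal approximation) two-sided p-value for a
    one-sample test of the mean."""
    n = len(sample)
    standard_error = stdev(sample) / n ** 0.5
    z = (mean(sample) - null_mean) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

def type_two_rate(true_mean=52, null_mean=50, sd=10, n=100,
                  threshold=0.05, trials=2000, seed=0):
    """Share of simulated studies that fail to reject a false null.
    sd=10 is an assumption; the lesson does not state it."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        # Draw one study: n students whose scores average true_mean.
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        if two_sided_p(sample, null_mean) >= threshold:
            misses += 1  # non-significant p-value although the null is false
    return misses / trials
```

The result depends heavily on the assumed standard deviation and sample size: a larger spread or a smaller sample makes it more likely that a real 2-point difference goes undetected.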