While significance thresholds allow a data scientist to control the false positive rate for a single hypothesis test, this starts to break when performing multiple tests as part of a single study.
For example, suppose that we are writing a quiz at codecademy that is going to include 10 questions. For each question, we want to know whether the probability of a learner answering the question correctly is different from 70%. We now have to run 10 hypothesis tests, one for each question.
If the null hypothesis is true for every hypothesis test (the probability of a correct answer is 70% for every question) and we use a .05 significance level for each test, then:
When we run a hypothesis test for a single question, we have a 95% chance of getting the right answer (a p-value > 0.05) — and a 5% chance of making a type I error.
When we run hypothesis tests for two questions, we have only a 90% chance of getting the right answer for both hypothesis tests (.95*.95 = 0.90) — and a 10% chance of making at least one type I error.
When we run hypothesis tests for all 10 questions, we have a 60% chance of getting the right answer for all ten hypothesis tests (0.95^10 = 0.60) — and a 40% chance of making at least one type I error.
To address this problem, it is important to plan research out ahead of time: decide what questions you want to address and figure out how many hypothesis tests you need to run. When running multiple tests, use a lower significance threshold (eg., 0.01) for each test to reduce the probability of making a type I error.
In the workspace, code has been provided to create a graph that shows the probability of making at least one type I error among some number of tests with a significance threshold of 0.05.
Approximately how many tests would we have to run at a 0.05 significance level so that the probability of at least one type I error would be 50%? Save your answer as the variable
num_tests_50percent (note: this can be approximate; it does not need to be exact!)
Change the code to create the plot so that it shows the probability of at least one type I error for multiple tests with a significance threshold of 0.10 (instead of 0.05).
Inspect your new plot. Now how many tests would lead to a probability of a type I error of 50%?