In this lesson, we will walk through a simulation of a binomial hypothesis test in Python. Binomial tests are useful for comparing the frequency of some outcome in a sample to the expected probability of that outcome. For example, if we expect 90% of ticketed passengers to show up for their flight but only 80 of 100 ticketed passengers actually show up, we could use a binomial test to understand whether the observed show-up rate of 80% is significantly different from the expected 90%.
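To make the flight example concrete, here is a minimal simulation sketch that uses only Python's standard library. The variable names, the random seed, and the choice of 10,000 simulated flights are illustrative assumptions, not part of the lesson's workspace. It also measures "as extreme as" by distance from the expected count, which is only one of several two-sided conventions.

```python
import random

random.seed(42)  # arbitrary seed so the sketch is repeatable

n_passengers = 100      # ticketed passengers per flight
expected_rate = 0.9     # expected probability that a passenger shows up
observed_shows = 80     # passengers who actually showed up
n_simulations = 10_000  # assumed number of simulated flights

# Expected number of show-ups if the 90% rate is correct
expected_shows = round(n_passengers * expected_rate)

def simulate_flight():
    """Simulate one flight: count passengers who show up at the expected rate."""
    return sum(random.random() < expected_rate for _ in range(n_passengers))

# Build the null distribution: show-up counts we would see if 90% were true
simulated = [simulate_flight() for _ in range(n_simulations)]

# Two-sided estimate: share of simulated flights at least as far from the
# expected count as the observed flight was
observed_gap = abs(observed_shows - expected_shows)
as_extreme = sum(abs(s - expected_shows) >= observed_gap for s in simulated)
p_value_estimate = as_extreme / n_simulations

print(p_value_estimate)
```

A small estimate suggests that a turnout of 80 would be unusual if the true show-up rate were really 90%.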
Binomial tests are similar to one-sample t-tests in that they test a sample statistic against some population-level expectation. The difference is that:
- binomial tests are used for binary categorical data to compare a sample frequency to an expected population-level probability
- one-sample t-tests are used for quantitative data to compare a sample mean to an expected population mean.
In Python, as in many other programming languages used for statistical computing, there are a number of libraries and functions that allow a data scientist to run a hypothesis test in a single line of code. However, a data scientist will be much more likely to spot and fix potential errors and interpret results correctly if they have a conceptual understanding of how these functions work. To that end, this lesson will help you build your own conceptual understanding!
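As a rough picture of what such a one-line function might compute, the sketch below works out an exact two-sided p-value for the flight example with the standard library alone. The function names are hypothetical. The two-sided rule used here (add up the probabilities of every outcome that is no more likely than the observed one) is one common convention, and library implementations may differ in the details.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_sided_binomial_p_value(observed, n, p):
    """Sum the probabilities of all outcomes no more likely than the observed one."""
    observed_prob = binom_pmf(observed, n, p)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    return sum(
        prob
        for k in range(n + 1)
        if (prob := binom_pmf(k, n, p)) <= observed_prob * tolerance
    )

# Flight example: 80 of 100 passengers showed up, 90% were expected
p_value = two_sided_binomial_p_value(80, 100, 0.9)
print(p_value)
```

For the flight example this p-value comes out well below the conventional 0.05 threshold, so 80 show-ups would be hard to reconcile with an expected rate of 90%.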
The next few exercises will walk through the process of using a binomial test to analyze data from a hypothetical online company, Live-it-LIVE.com — a website that sells all the necessary props and costumes to recreate iconic movie scenes at home!
The data we’ll be working with has been loaded for you in the workspace and saved as an object named monthly_report so that you can inspect it in the web browser.
Note that the purchase column tells us whether a purchase was made; if so, the item that was purchased is listed in the item column. Feel free to scroll through the data so you can inspect more of the items!