When an email lands in your inbox, how does your email service know whether it’s legitimate or spam? This decision is made billions of times per day, and one way to make it is with logistic regression.
Logistic regression is a supervised machine learning algorithm that predicts the probability, ranging from 0 to 1, of a datapoint belonging to a specific category, or class. These probabilities can then be used to assign, or classify, observations to the more probable group.
For example, we could use a logistic regression model to predict the probability that an incoming email is spam. If that probability is greater than 0.5, we could automatically send it to a spam folder. This is called binary classification because there are only two groups (e.g., spam or not spam).
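The thresholding rule above can be sketched in a few lines of Python. The function name `classify`, the example probabilities, and the default threshold of 0.5 are illustrative assumptions, not the behavior of any particular email service:

```python
def classify(probability, threshold=0.5):
    """Return 1 (spam) if the probability exceeds the threshold, else 0 (not spam)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return 1 if probability > threshold else 0


# Hypothetical spam probabilities for three incoming emails
spam_probabilities = [0.92, 0.50, 0.07]
labels = [classify(p) for p in spam_probabilities]
print(labels)  # [1, 0, 0]
```

Note that a probability of exactly 0.5 is labeled not spam here, because the rule as stated requires the probability to be greater than the threshold.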
Some other examples of problems that we could solve using logistic regression:
- Disease identification — Is a tumor malignant?
- Customer conversion — Will a customer arriving on a sign-up page enroll in a service?
In this lesson, you will learn how to perform logistic regression and use it to make predictions!
If you are unfamiliar with linear regression, we recommend you review it before proceeding. Otherwise, let’s dive in!
Codecademy University’s Data Science department is interested in creating a model to predict whether or not a student will pass the final exam of its Introductory Machine Learning course. They plan to accomplish this by building a logistic regression model that predicts the probability of passing based on the number of hours a student reports studying.
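To get a feel for what such a model produces, here is a minimal sketch of how a logistic regression model turns hours studied into a probability of passing: it computes a linear score and passes it through the logistic (sigmoid) function. The function name `pass_probability` and the `intercept` and `coefficient` values are placeholders chosen for illustration. They are not fitted to the course's data, and a real model would estimate them from the observed outcomes:

```python
import math


def pass_probability(hours, intercept=-4.0, coefficient=0.5):
    """Logistic (sigmoid) function applied to a linear score of hours studied.

    intercept and coefficient are hypothetical values, not fitted parameters.
    """
    log_odds = intercept + coefficient * hours
    return 1.0 / (1.0 + math.exp(-log_odds))


# Print the predicted probability for a few hypothetical study times
for hours in (2, 8, 14):
    print(hours, round(pass_probability(hours), 3))
```

With these placeholder values, the predicted probability rises with the number of hours studied and always stays between 0 and 1, which is what lets us interpret the output as a probability.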
Run the code in script.py to plot the data.
0 indicates that a student failed the exam, and 1 indicates that a student passed the exam.
How many hours does a student need to study to pass the exam?