We saw that predicted outcomes from a linear regression model can range from negative to positive infinity. Such predictions don’t make sense for a classification problem, where we want to predict the probability of group membership, a value between 0 and 1. Enter logistic regression!
To build a logistic regression model, we apply a logit link function to the left-hand side of our linear regression function. Remember that the equation for a linear model looks like this:

$$y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n$$
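When we apply the logit function to the left-hand side, we get the following:

$$\ln\left(\frac{y}{1-y}\right) = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n$$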
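To make the logit concrete, here is a minimal sketch in plain Python. The helper names `logit` and `inverse_logit` are illustrative choices, not part of the lesson's code. The logit turns a probability into log-odds, which can be any real number, and its inverse (the sigmoid) turns any real number back into a probability.

```python
import math

def logit(p):
    """Log-odds of a probability p, defined for 0 < p < 1."""
    return math.log(p / (1 - p))

def inverse_logit(z):
    """Sigmoid: maps any real number z back to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Round trip: probability -> log-odds -> probability
for p in (0.1, 0.5, 0.9):
    z = logit(p)
    print(f"p = {p:.1f} -> log-odds = {z:+.3f} -> back to p = {inverse_logit(z):.1f}")
```

Because the right-hand side of the model is unbounded, it pairs naturally with log-odds, which are also unbounded. Inverting the logit is what brings predictions back into the 0 to 1 range.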
For the Codecademy University example, this means that we are fitting the curve shown below to our data, instead of a line as in linear regression:
Notice that the red line stays between 0 and 1 on the y-axis. It now makes sense to interpret this value as a probability of group membership, whereas that would have been nonsensical for regular linear regression.
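The contrast between the line and the curve can be sketched numerically. The intercept and slope below are hypothetical values chosen only for illustration; they are not fitted to the Codecademy University data, and the function names are assumptions as well. The raw linear predictor wanders outside the 0 to 1 range, while the same value passed through the inverse logit always stays inside it.

```python
import math

# Hypothetical coefficients for illustration only (not fitted to the lesson's data)
INTERCEPT = -4.0
SLOPE = 0.8  # change in the linear predictor per extra hour studied

def linear_predictor(hours):
    """Right-hand side of the model: unbounded, like a linear regression prediction."""
    return INTERCEPT + SLOPE * hours

def pass_probability(hours):
    """Inverse logit of the linear predictor: always strictly between 0 and 1."""
    return 1 / (1 + math.exp(-linear_predictor(hours)))

for hours in (0, 4, 8, 16):
    print(f"hours = {hours:2d}  linear = {linear_predictor(hours):+.2f}  "
          f"probability = {pass_probability(hours):.3f}")
```

With these made-up coefficients the linear value runs from below 0 to well above 1, which cannot be read as a probability, while the transformed value can.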
Note that this is a pretty nifty trick for adapting a linear regression model to solve classification problems! There are actually many other kinds of link functions that we can use for different adaptations.
Instructions
We’ve provided the code to build a logistic regression model on the Codecademy University data and plot the fitted curve. Take a look at the plot. Expand the plot to fullscreen for a larger view.
Using this curve, estimate the probability that a student who studied for five hours will pass the exam. Save the result as five_hour_studier and press “Run”.