We saw that a linear regression model can predict any value from negative to positive infinity. Such predictions don’t make sense for a classification problem, where we want a probability between 0 and 1. Enter logistic regression!

To build a logistic regression model, we apply a logit link function to the left-hand side of our linear regression function. Remember the equation for a linear model looks like this:

$$y = b_{0} + b_{1}x_{1} + b_{2}x_{2} + \cdots + b_{n}x_{n}$$

When we apply the logit function, we get the following:

$$\ln\left(\frac{y}{1-y}\right) = b_{0} + b_{1}x_{1} + b_{2}x_{2} + \cdots + b_{n}x_{n}$$
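
Solving the logit equation for y shows where the S-shaped curve comes from. Here z is just a shorthand we introduce for the right-hand side; it is not notation used elsewhere in the lesson:

$$
\begin{aligned}
z &= b_{0} + b_{1}x_{1} + b_{2}x_{2} + \cdots + b_{n}x_{n} \\
\ln\left(\frac{y}{1-y}\right) &= z \\
\frac{y}{1-y} &= e^{z} \\
y &= \frac{e^{z}}{1 + e^{z}} = \frac{1}{1 + e^{-z}}
\end{aligned}
$$

The last expression is the sigmoid function. Whatever value z takes, y stays strictly between 0 and 1.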

For the Codecademy University example, this means that we are fitting the curve shown below to our data, rather than a straight line as in linear regression:

[Figure: sigmoid curve superimposed on the plot of passing vs. hours studied]

Notice that the red line stays between 0 and 1 on the y-axis. It now makes sense to interpret this value as a probability of group membership, whereas that would have been nonsensical for regular linear regression.
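
To see why the output can be read as a probability, here is a minimal sketch in Python. The coefficients `b0` and `b1` are made-up values for illustration, not the ones fitted to the Codecademy University data, and the function names `linear_predictor` and `sigmoid` are our own.

```python
import math


def linear_predictor(hours, b0, b1):
    """Unbounded linear combination b0 + b1 * hours (the log-odds)."""
    return b0 + b1 * hours


def sigmoid(z):
    """Inverse of the logit: maps a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, chosen only for illustration.
b0, b1 = -3.0, 0.5

for hours in [0, 5, 10, 20]:
    z = linear_predictor(hours, b0, b1)  # can be any real number
    p = sigmoid(z)                       # always between 0 and 1
    print(f"hours={hours:>2}  log-odds={z:6.2f}  probability={p:.3f}")
```

The log-odds column keeps growing as hours increase, while the probability column levels off below 1. That flattening is what produces the S-shape of the fitted curve.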

Note that this is a pretty nifty trick for adapting a linear regression model to solve classification problems! There are actually many other kinds of link functions that we can use for different adaptations.



We’ve provided the code to build a logistic regression model on the Codecademy University data and plot the fitted curve. Take a look at the plot. Expand the plot to fullscreen for a larger view.

Using this curve, estimate the probability that a student who studied for five hours will pass the exam. Save the result as five_hour_studier and press “Run”.
