Learn

We saw that predicted outcomes from a linear regression model range from negative to positive infinity. These predictions don’t really make sense for a classification problem, where we want a probability between 0 and 1. Enter logistic regression!

To build a logistic regression model, we apply a logit link function to the left-hand side of our linear regression function. Remember the equation for a linear model looks like this:

$$y = b_{0} + b_{1}x_{1} + b_{2}x_{2} + \cdots + b_{n}x_{n}$$

When we apply the logit function, we get the following:

$$\ln\left(\frac{y}{1-y}\right) = b_{0} + b_{1}x_{1} + b_{2}x_{2} + \cdots + b_{n}x_{n}$$
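
To see why this keeps predictions between 0 and 1, it can help to solve the equation for y. As a short sketch, write z as shorthand (our notation, not the lesson's) for the right-hand side, z = b_{0} + b_{1}x_{1} + \cdots + b_{n}x_{n}:

$$
\begin{aligned}
\ln\left(\frac{y}{1-y}\right) &= z \\
\frac{y}{1-y} &= e^{z} \\
y &= \frac{e^{z}}{1 + e^{z}} = \frac{1}{1 + e^{-z}}
\end{aligned}
$$

The last expression is the sigmoid function. Whatever value z takes, from negative to positive infinity, y stays strictly between 0 and 1.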

For the Codecademy University example, this means that instead of fitting a line, as in linear regression, we are fitting the curve shown below to our data:

[Figure: sigmoid function superimposed on the plot of passing vs. hours studied]

Notice that the red curve stays between 0 and 1 on the y-axis. It now makes sense to interpret this value as a probability of group membership, which would have been nonsensical for regular linear regression.
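
The bounded behavior can be checked numerically. The sketch below is only an illustration: the coefficients `b0` and `b1` are hypothetical values picked by hand, not the ones fitted to the Codecademy University data, and the helper names `sigmoid` and `logit` are our own. It computes the unbounded linear predictor for a few values of hours studied and then converts each one to a probability.

```python
import math

def sigmoid(z):
    """Map a log-odds value z to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Map a probability p in (0, 1) back to log-odds (the inverse of sigmoid)."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients, chosen for illustration only (not fitted to the course data).
b0, b1 = -4.0, 0.5

for hours in [0, 4, 8, 12, 16]:
    log_odds = b0 + b1 * hours   # linear predictor: can be any real number
    prob = sigmoid(log_odds)     # squeezed into the interval (0, 1)
    print(f"hours={hours:2d}  log-odds={log_odds:+.1f}  probability={prob:.3f}")
```

The log-odds column grows without bound as hours increase, while the probability column only approaches 1. Applying `logit` to any of the probabilities recovers the corresponding log-odds, which is the sense in which the logit "links" the linear model to a probability.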

Note that this is a pretty nifty trick for adapting a linear regression model to solve classification problems! There are actually many other kinds of link functions that we can use for different adaptations.

Instructions

1.

We’ve provided the code to build a logistic regression model on the Codecademy University data and plot the fitted curve. Take a look at the plot. Expand the plot to fullscreen for a larger view.

Using this curve, estimate the probability that a student who studied for five hours will pass the exam. Save the result as five_hour_studier and press “Run”.
