With the data from Codecademy University, we want to predict whether each student will pass their final exam. Recall that in linear regression, we fit a line of the following form to the data:

y = b_{0} + b_{1}x_{1} + b_{2}x_{2} + \cdots + b_{n}x_{n}

where:
  • y is the value we are trying to predict
  • b_0 is the intercept of the regression line
  • b_1, b_2, … b_n are the coefficients
  • x_1, x_2, … x_n are the predictors (also sometimes called features)
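The formula above is a weighted sum of the predictors plus the intercept. As a minimal sketch (the function name `linear_predict` and the coefficient values are hypothetical, chosen only for illustration), computing the prediction for one observation looks like this:

```python
def linear_predict(b0, coefs, features):
    """Return y = b0 + b1*x1 + ... + bn*xn for a single observation.

    b0       -- the intercept
    coefs    -- the coefficients [b1, ..., bn]
    features -- the predictor values [x1, ..., xn]
    """
    if len(coefs) != len(features):
        raise ValueError("expected one coefficient per predictor")
    return b0 + sum(b * x for b, x in zip(coefs, features))


# Hypothetical model with three predictors: 1 + 2*3 - 0.5*4 + 0.25*8
y_hat = linear_predict(1.0, [2.0, -0.5, 0.25], [3.0, 4.0, 8.0])
print(y_hat)  # → 7.0
```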

For our data, y is a binary variable: 1 if the student passes and 0 if the student fails. We have only one predictor (x_1): num_hours_studied. Below, we’ve fitted a linear regression model to our data and plotted the results. The best-fit line is in red.
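With a single predictor, the least-squares line has a closed form: b_1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2) and b_0 = mean(y) - b_1 * mean(x). The sketch below applies it to a small made-up set of (hours, pass/fail) pairs. These numbers are hypothetical and are not the Codecademy University data, and the lesson’s own code may fit the model with a library instead.

```python
def fit_simple_ols(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1*x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Unscaled covariance of x and y, and unscaled variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical toy data, NOT the Codecademy University dataset:
# hours studied and whether the student passed (1) or failed (0).
num_hours_studied = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
passed_exam = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_simple_ols(num_hours_studied, passed_exam)
print(f"intercept = {b0:.3f}, slope = {b1:.4f}")  # → intercept = -0.127, slope = 0.0697
```

Even on this toy data the fitted intercept is negative, so the line predicts a value below 0 for a student who studies 0 hours.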

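We see that the linear model does not fit the data well. Our goal is to predict whether a student passes or fails, so a useful prediction should be 0 or 1 (or at least fall between them). A best-fit line, however, is unbounded: depending on num_hours_studied, it can produce any value from negative to positive infinity.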
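One way to see the problem: for any line with a nonzero slope b_1, you can solve for the number of hours at which its prediction crosses 0 and 1. Outside that interval, the predicted value is not a sensible pass/fail outcome. The intercept and slope below are hypothetical values, roughly the size a line fitted to 0/1 data might have.

```python
# Hypothetical intercept and slope for a line fitted to 0/1 outcomes.
B0, B1 = -0.13, 0.07


def hours_at(target, b0=B0, b1=B1):
    """Solve b0 + b1 * hours == target for hours (requires b1 != 0)."""
    return (target - b0) / b1


def predict(hours, b0=B0, b1=B1):
    """Value of the regression line at the given number of hours."""
    return b0 + b1 * hours


# With a positive slope, predictions drop below 0 before `low`
# and rise above 1 after `high`, without any bound.
low, high = hours_at(0.0), hours_at(1.0)
print(f"prediction < 0 below {low:.2f} hours, > 1 above {high:.2f} hours")
print([round(predict(h), 2) for h in (0, 50, 100)])
```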



We’ve provided you with the code to train a linear regression model on the Codecademy University data and plot the regression line. Run the code and observe the plot.

Using the regression line, estimate the predicted outcome (the value given by the line) for students who study 0 hours, 10 hours, and 30 hours, respectively. Save the results to the variables slacker, average, and studious.
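Reading values off the line amounts to plugging each number of hours into b_0 + b_1 * x_1. A minimal sketch, assuming a hypothetical intercept of -0.13 and slope of 0.07; your values come from the model in the provided code (if that code uses scikit-learn’s `LinearRegression`, a fitted estimator stores them in its `intercept_` and `coef_` attributes).

```python
# Hypothetical intercept and slope; substitute the values from your fitted model.
intercept, slope = -0.13, 0.07


def predicted_outcome(hours):
    """Value of the regression line at the given number of study hours."""
    return intercept + slope * hours


slacker = predicted_outcome(0)
average = predicted_outcome(10)
studious = predicted_outcome(30)
print(round(slacker, 2), round(average, 2), round(studious, 2))  # → -0.13 0.57 1.97
```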

How would you use these numerical outcomes to determine whether a student is predicted to pass or fail? Can you think of a threshold you might use?
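One common choice (an assumption here, not something this lesson has specified yet) is a threshold of 0.5: predict a pass when the line’s value is at least 0.5 and a fail otherwise. A sketch using hypothetical predicted values:

```python
def classify(predicted_value, threshold=0.5):
    """Map a numeric prediction to 1 (pass) or 0 (fail) using a cutoff."""
    return 1 if predicted_value >= threshold else 0


# Hypothetical line outputs for 0, 10, and 30 hours of study.
predictions = {"slacker": -0.13, "average": 0.57, "studious": 1.97}
labels = {name: classify(value) for name, value in predictions.items()}
print(labels)  # → {'slacker': 0, 'average': 1, 'studious': 1}
```

The cutoff yields a 0/1 label for every input, even though the line’s raw values still fall outside the 0 to 1 range.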
