With the data from Codecademy University, we want to predict whether each student will pass their final exam. Recall that in linear regression, we fit a line of the following form to the data:
y = b_0 + b_1 x_1 + b_2 x_2 + ... + b_n x_n
y is the value we are trying to predict
b_0 is the intercept of the regression line
b_1, b_2, ..., b_n are the coefficients
x_1, x_2, ..., x_n are the predictors (also sometimes called features)
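In code, a prediction of this linear form is the intercept plus the sum of each coefficient times its predictor. The sketch below is only an illustration: the function name `linear_predict` and the numbers passed to it are hypothetical and do not come from the lesson's data.

```python
def linear_predict(intercept, coefficients, predictors):
    """Return intercept + sum(b_i * x_i) for matching coefficients and predictors."""
    if len(coefficients) != len(predictors):
        raise ValueError("need exactly one coefficient per predictor")
    return intercept + sum(b * x for b, x in zip(coefficients, predictors))


# Hypothetical values: intercept 1.0, two coefficients, two predictor values
example = linear_predict(1.0, [2.0, 3.0], [1.0, 1.0])
print(example)
```

With a single predictor, as in this lesson, the lists each hold one value and the expression reduces to the familiar intercept-plus-slope form.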
For our data, y is a binary variable, equal to either 1 (passing) or 0 (failing). We have only one predictor, num_hours_studied. Below we’ve fitted a linear regression model to our data and plotted the results. The best fit line is in red.
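We see that the linear model does not fit the data well. Our goal is to predict whether a student passes or fails, so the outcome can only be 0 or 1. A best fit line, however, is unbounded: it can output any value between negative and positive infinity, including values below 0 or above 1 that have no meaning as a pass/fail outcome.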
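To make the problem concrete, here is a minimal sketch that fits a least-squares line to a small binary outcome and evaluates it at the extremes. The data is invented for illustration (it is not the Codecademy University data), and the fit is computed by hand with the standard single-predictor least-squares formulas rather than with the lesson's provided code.

```python
# Hypothetical data, invented for illustration only
hours = [0, 5, 10, 15, 20, 25, 30]
passed = [0, 0, 0, 1, 1, 1, 1]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(passed) / n

# Ordinary least squares for one predictor:
# slope = S_xy / S_xx, intercept = mean_y - slope * mean_x
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, passed))
s_xx = sum((x - mean_x) ** 2 for x in hours)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x


def predict(num_hours):
    """Value of the fitted line at num_hours."""
    return intercept + slope * num_hours


# The line is not confined to the range [0, 1]
for h in (0, 15, 40):
    print(h, round(predict(h), 3))
```

On this made-up data, the fitted line dips below 0 for a student who studies 0 hours and rises above 1 for a student who studies 40 hours, even though the only valid outcomes are 0 and 1.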
We’ve provided you with the code to train a linear regression model on the Codecademy University data and plot the regression line. Run the code and observe the plot. Expand the plot to fullscreen for a larger view.
Using the regression line, estimate the predicted outcomes (given by the line) for students who study 10 hours and 30 hours, respectively. Save the results to
How would you use these numerical outcomes to determine whether a student is predicted to pass or fail? Can you think of a threshold you might use?
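One possible approach, sketched here under assumptions, is to compare the line's output to a cutoff and predict a pass when the output reaches it. The cutoff of 0.5 is a common default rather than a value given in the lesson, and the function name `classify` and the sample values are hypothetical.

```python
def classify(predicted_value, threshold=0.5):
    """Map a numeric regression output to 1 (pass) or 0 (fail).

    threshold=0.5 is an assumed default, not a value from the lesson.
    """
    return 1 if predicted_value >= threshold else 0


# Hypothetical regression outputs, including values outside [0, 1]
sample_outputs = [-0.1, 0.2, 0.8, 1.3]
labels = [classify(value) for value in sample_outputs]
print(labels)
```

A threshold turns any real-valued output into a class label, but it does not make the raw outputs themselves interpretable: values such as -0.1 or 1.3 are still not valid probabilities.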