With the data from Codecademy University, we want to predict whether each student will pass their final exam. Recall that in linear regression, we fit a line of the following form to the data:

`$y = b_{0} + b_{1}x_{1} + b_{2}x_{2} +\cdots + b_{n}x_{n}$`

where

- `y` is the value we are trying to predict
- `b_0` is the intercept of the regression line
- `b_1`, `b_2`, … `b_n` are the coefficients
- `x_1`, `x_2`, … `x_n` are the predictors (also sometimes called *features*)
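
To make the equation concrete, here is a minimal sketch of how a prediction is computed from an intercept, a list of coefficients, and a list of predictor values. The function name `linear_prediction` and the numbers passed to it are hypothetical and chosen only to illustrate the arithmetic; they are not taken from the Codecademy University data.

```python
def linear_prediction(intercept, coefficients, predictors):
    """Return b_0 + b_1*x_1 + ... + b_n*x_n for one observation."""
    if len(coefficients) != len(predictors):
        raise ValueError("need exactly one coefficient per predictor")
    # Multiply each coefficient by its predictor, then add the intercept.
    return intercept + sum(b * x for b, x in zip(coefficients, predictors))


# Hypothetical values: b_0 = 1.0, (b_1, b_2) = (2.0, -0.5), (x_1, x_2) = (3.0, 4.0).
example = linear_prediction(1.0, [2.0, -0.5], [3.0, 4.0])
print(example)
```

With a single predictor, as in this lesson, the coefficient and predictor lists each hold one value.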

For our data, `y` is a binary variable, equal to either `1` (passing) or `0` (failing). We have only one predictor (`x_1`): `num_hours_studied`. Below we’ve fitted a linear regression model to our data and plotted the results. The best fit line is in red.

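We see that the linear model does not fit the data well. Our goal is to predict whether a student passes or fails, so the only meaningful outcomes are `0` and `1`. A best fit line, however, is unbounded: its predictions can fall anywhere between negative and positive infinity, including values below `0` or above `1` that correspond to neither outcome.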
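
The sketch below illustrates this problem on a small made-up data set. The `hours` and `passed` lists are hypothetical stand-ins for the lesson's data, and `fit_line` is an illustrative helper that computes an ordinary least-squares line by hand instead of using the lesson's provided code.

```python
# Hypothetical data: hours studied and whether each student passed (1) or failed (0).
hours = [1, 2, 4, 5, 7, 8, 10, 12, 14, 15, 17, 19]
passed = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(hours, passed)

# Evaluate the line at a few study times, including one far outside the data.
predictions = {h: intercept + slope * h for h in (0, 10, 30)}
for h, value in predictions.items():
    print(f"{h} hours -> {value:.2f}")
```

On this made-up data, the line predicts a negative value for a student who studies 0 hours and a value greater than 1 for a student who studies 30 hours, neither of which can be read directly as pass or fail.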

### Instructions

**1.**

We’ve provided you with the code to train a linear regression model on the Codecademy University data and plot the regression line. Run the code and observe the plot. Expand the plot to fullscreen for a larger view.

Using the regression line, estimate the predicted outcomes (given by the line) for students who study `0` hours, `10` hours, and `30` hours, respectively. Save the results to `slacker`, `average`, and `studious`.

How would you use these numerical outcomes to determine whether a student is predicted to pass or fail? Can you think of a threshold you might use?
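
One possible approach, sketched below, is to pick a cutoff and predict a pass whenever the line's output reaches it. The `INTERCEPT` and `SLOPE` values are hypothetical placeholders (read the real ones off your own fitted line), and the `0.5` threshold is an assumption, not a value prescribed by the exercise.

```python
# Hypothetical line parameters; replace with the values from your fitted model.
INTERCEPT = -0.2
SLOPE = 0.08
THRESHOLD = 0.5  # assumed cutoff between "fail" and "pass"


def predicted_outcome(hours_studied):
    """Return the raw value of the regression line at the given study time."""
    return INTERCEPT + SLOPE * hours_studied


def predicted_pass(hours_studied, threshold=THRESHOLD):
    """Return True if the line's output is at or above the threshold."""
    return predicted_outcome(hours_studied) >= threshold


slacker = predicted_outcome(0)
average = predicted_outcome(10)
studious = predicted_outcome(30)

for name, hrs in (("slacker", 0), ("average", 10), ("studious", 30)):
    label = "pass" if predicted_pass(hrs) else "fail"
    print(f"{name}: {predicted_outcome(hrs):.2f} -> {label}")
```

A threshold turns the unbounded output into a pass/fail label, but the raw values still cannot be interpreted as probabilities.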