
Let’s return to the logistic regression equation and demonstrate how this works by fitting a model in sklearn. The equation is:

$\ln\left(\frac{p}{1-p}\right) = b_{0} + b_{1}x_{1} + b_{2}x_{2} + \cdots + b_{n}x_{n}$

Suppose that we want to fit a model that predicts whether a visitor to a website will make a purchase. We’ll use the number of minutes they spent on the site as a predictor. The following code fits the model:

from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(min_on_site, purchase)  # fit(X, y): predictors first, then the outcome

Next, just as in linear regression, we can use the right-hand side of the regression equation to make a prediction for each of our original datapoints:

log_odds = model.intercept_ + model.coef_ * min_on_site
print(log_odds)

Output:

[[-3.28394203]
[-1.46465328]
[-0.02039445]
[ 1.22317391]
[ 2.18476234]]

Notice that these predictions can be any real number, from negative to positive infinity, because they are log odds rather than probabilities. For the first datapoint, we have:

$\ln\left(\frac{p}{1-p}\right) = -3.28394203$
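As a cross-check on the hand calculation, here is a minimal sketch that fits a model on a small made-up dataset and compares the manually computed log odds with sklearn's `decision_function`, which for a binary logistic regression returns the same linear score. The values of `min_on_site` and `purchase` below are hypothetical and are not the data behind the output above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data (an assumption for illustration only):
# minutes on site, and whether a purchase was made (1) or not (0).
min_on_site = np.array([1, 2, 4, 5, 7, 9, 11, 14]).reshape(-1, 1)  # 2D: one column per predictor
purchase = np.array([0, 0, 0, 1, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(min_on_site, purchase)  # predictors first, then the outcome

# Log odds computed by hand from the fitted intercept and coefficient.
manual_log_odds = model.intercept_ + model.coef_ * min_on_site

# The same linear score as computed by the library.
library_log_odds = model.decision_function(min_on_site)

print(np.allclose(manual_log_odds.ravel(), library_log_odds))
```

Note that sklearn expects the predictors as a 2D array, which is why the toy `min_on_site` is reshaped into a single column.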

We can turn log odds into a probability as follows:

$$\begin{aligned}
\ln\left(\frac{p}{1-p}\right) &= -3.28 \\
\frac{p}{1-p} &= e^{-3.28} \\
p &= e^{-3.28}(1-p) \\
p &= e^{-3.28} - e^{-3.28}p \\
p + e^{-3.28}p &= e^{-3.28} \\
p\left(1 + e^{-3.28}\right) &= e^{-3.28} \\
p &= \frac{e^{-3.28}}{1 + e^{-3.28}} \\
p &\approx 0.04
\end{aligned}$$
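The arithmetic in this derivation can be checked with Python's standard `math` module. This is only a numeric sanity check, using the rounded log odds of -3.28 from the derivation:

```python
import math

log_odds = -3.28

# Undo the natural log to get the odds p / (1 - p).
odds = math.exp(log_odds)

# Solve for p, as in the last steps of the derivation.
p = odds / (1 + odds)

# Going the other way should recover the log odds we started from.
recovered_log_odds = math.log(p / (1 - p))

print(round(p, 2))
print(round(recovered_log_odds, 2))
```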

In Python, we can do this simultaneously for all of the datapoints using NumPy (loaded as np):

np.exp(log_odds) / (1 + np.exp(log_odds))

Output:

array([[0.0361262 ],
[0.18775665],
[0.49490156],
[0.77262162],
[0.89887279]])
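One way to package this conversion is as a small reusable function. The sketch below uses the helper names `sigmoid` and `logit`, which are illustrative choices and not part of the lesson's workspace, and reuses the log odds printed earlier. It writes the conversion as 1 / (1 + e^(-x)), which is algebraically the same as e^x / (1 + e^x).

```python
import numpy as np

def sigmoid(log_odds):
    """Convert log odds to probabilities (hypothetical helper name)."""
    return 1 / (1 + np.exp(-log_odds))

def logit(p):
    """Convert probabilities back to log odds (hypothetical helper name)."""
    return np.log(p / (1 - p))

# The log odds printed earlier in this lesson.
log_odds = np.array([[-3.28394203],
                     [-1.46465328],
                     [-0.02039445],
                     [ 1.22317391],
                     [ 2.18476234]])

probabilities = sigmoid(log_odds)
print(probabilities)

# Applying logit to the probabilities recovers the original log odds.
print(np.allclose(logit(probabilities), log_odds))
```

If SciPy is available, `scipy.special.expit` computes the same function.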

The calculation that we just did required us to use something called the sigmoid function, which is the inverse of the logit function. The sigmoid function produces the S-shaped curve we saw previously.

### Instructions

1. In the workspace, we’ve fit a logistic regression on the Codecademy University data and saved the intercept and coefficient on hours_studied as intercept and coef, respectively.

For each student in the dataset, use the intercept and coefficient to calculate the log odds of passing the exam. Save the result as log_odds.

2. Now, convert the predicted log odds for each student into a predicted probability of passing the exam. Save the predicted probabilities as pred_probability_passing.