Let’s return to the logistic regression equation and demonstrate how this works by fitting a model in sklearn. The equation is:

$$\ln\left(\frac{p}{1-p}\right) = b_0 + b_1 x$$
Suppose that we want to fit a model that predicts whether a visitor to a website will make a purchase. We’ll use the number of minutes they spent on the site as a predictor. The following code fits the model:
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
# Features (minutes on site) come first, the outcome (purchase) second
model.fit(min_on_site, purchase)
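If you want to run this end to end, here is a minimal, self-contained sketch. The min_on_site values and purchase labels below are made up for illustration; note that sklearn expects the feature array to be two-dimensional, so we reshape it.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: minutes spent on the site (feature) and
# whether a purchase was made (binary outcome)
min_on_site = np.array([2.1, 5.5, 8.2, 10.5, 12.3]).reshape(-1, 1)
purchase = np.array([0, 0, 0, 1, 1])

model = LogisticRegression()
model.fit(min_on_site, purchase)

print(model.intercept_)  # b_0
print(model.coef_)       # b_1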
Next, just like linear regression, we can use the right-hand side of our regression equation to make predictions for each of our original datapoints as follows:
log_odds = model.intercept_ + model.coef_ * min_on_site
print(log_odds)
Output:
[[-3.28394203]
 [-1.46465328]
 [-0.02039445]
 [ 1.22317391]
 [ 2.18476234]]
Notice that these predictions range from negative to positive infinity: these are log odds. In other words, for the first datapoint, we have:

$$\ln\left(\frac{p}{1-p}\right) = -3.28$$
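Incidentally, sklearn can compute these values for us: for a fitted binary LogisticRegression, decision_function returns intercept_ + coef_ * x for each sample, which is exactly the log odds. A quick check, assuming the model and min_on_site array from the sketch above:

# decision_function returns the same log odds directly (as a 1D array,
# whereas our manual calculation produced a 2D column)
print(model.decision_function(min_on_site))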
We can turn log odds into a probability as follows:

$$p = \frac{e^{\text{log odds}}}{1 + e^{\text{log odds}}}$$
In Python, we can do this simultaneously for all of the datapoints using NumPy (loaded as np):
np.exp(log_odds) / (1 + np.exp(log_odds))
Output:
array([[0.0361262 ],
       [0.18775665],
       [0.49490156],
       [0.77262162],
       [0.89887279]])
The calculation that we just did uses the sigmoid function, which is the inverse of the logit function:

$$\sigma(z) = \frac{e^{z}}{1 + e^{z}} = \frac{1}{1 + e^{-z}}$$

The sigmoid function produces the S-shaped curve we saw previously.
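In practice, you rarely need to code the sigmoid by hand: scipy.special.expit is a numerically stable implementation, and a fitted LogisticRegression exposes predict_proba. A small sketch, assuming the model, min_on_site, and log_odds from earlier:

from scipy.special import expit

# expit(z) = 1 / (1 + exp(-z)) is the sigmoid function
print(expit(log_odds))

# predict_proba returns one column per class; the second column is the
# predicted probability of the positive class (purchase = 1)
print(model.predict_proba(min_on_site)[:, 1])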

Instructions
In the workspace, we’ve fit a logistic regression on the Codecademy University data and saved the intercept and the coefficient on hours_studied as intercept and coef, respectively.
For each student in the dataset, use the intercept and coefficient to calculate the log odds of passing the exam. Save the result as log_odds.
Now, convert the predicted log odds for each student into a predicted probability of passing the exam. Save the predicted probabilities as pred_probability_passing.
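If you want to check your work, here is one way these two steps might look. This assumes intercept, coef, and hours_studied are already defined in the workspace as NumPy arrays with compatible shapes:

import numpy as np

# Log odds of passing for each student: intercept + coef * hours studied
log_odds = intercept + coef * hours_studied

# Sigmoid: convert log odds into predicted probabilities
pred_probability_passing = np.exp(log_odds) / (1 + np.exp(log_odds))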