Learn

Now that we’ve learned a little bit about how logistic regression works, let’s fit a model using sklearn.

To do this, we’ll begin by importing the LogisticRegression class from sklearn.linear_model and creating a LogisticRegression object:

from sklearn.linear_model import LogisticRegression
model = LogisticRegression()

After creating the object, we need to fit our model on the data. We can accomplish this using the .fit() method, which takes two arguments: a matrix of features (one row per datapoint, one column per feature) and an array of class labels (the outcome we are trying to predict, one label per datapoint).

model.fit(features, labels)
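To make the expected shapes concrete, here is a minimal sketch of the steps above. The toy values in features and labels are made up for illustration and are not the lesson's dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: 8 datapoints, 2 features each
features = np.array([
    [-1.5, -1.2],
    [-1.0, -0.8],
    [-0.5, -1.0],
    [-0.2,  0.1],
    [ 0.2, -0.1],
    [ 0.6,  0.9],
    [ 1.0,  0.8],
    [ 1.4,  1.3],
])
# One class label per row of features (1 = positive class)
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(features, labels)

print(features.shape)  # (8, 2)
print(labels.shape)    # (8,)
print(model.classes_)  # the class labels the model learned
```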

Now that the model is trained, we can access a few useful attributes:

  • model.coef_ is an array of the coefficients, one for each feature
  • model.intercept_ is the intercept
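As a rough sketch of how these two attributes fit together, the example below refits a model on made-up toy data and rebuilds the log odds by hand from the coefficients and intercept. The data values are assumptions for illustration only; for a two-class problem, coef_ has one row and one column per feature.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: 8 datapoints, 2 features each
features = np.array([
    [-1.5, -1.2], [-1.0, -0.8], [-0.5, -1.0], [-0.2,  0.1],
    [ 0.2, -0.1], [ 0.6,  0.9], [ 1.0,  0.8], [ 1.4,  1.3],
])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(features, labels)

print(model.coef_.shape)       # (1, 2): one row, one coefficient per feature
print(model.intercept_.shape)  # (1,)

# Log odds = intercept + sum of (coefficient * feature value)
manual_log_odds = features @ model.coef_[0] + model.intercept_[0]

# decision_function returns the same log odds computed by sklearn
sklearn_log_odds = model.decision_function(features)
print(np.allclose(manual_log_odds, sklearn_log_odds))
```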

The coefficients can be interpreted as follows:

  • Large positive coefficient: a one unit increase in that feature is associated with a large increase in the log odds (and therefore probability) of a datapoint belonging to the positive class (the outcome group labeled as 1)
  • Large negative coefficient: a one unit increase in that feature is associated with a large decrease in the log odds/probability of belonging to the positive class.
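  • Coefficient of 0: a change in that feature is not associated with any change in the log odds/probability of belonging to the positive class (holding the other features constant).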
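The interpretations above can be checked with a little arithmetic. This sketch uses a single feature with a made-up intercept and coefficient (both hypothetical, not fitted values) to show how a one unit increase changes the log odds, the probability, and the odds.

```python
import math

def sigmoid(log_odds):
    """Convert log odds to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical intercept and coefficient, chosen only for illustration
intercept = -0.5
coefficient = 1.2

probabilities = []
for x in (0.0, 1.0, 2.0):
    log_odds = intercept + coefficient * x
    probability = sigmoid(log_odds)
    probabilities.append(probability)
    print(f"x = {x}: log odds = {log_odds:.2f}, probability = {probability:.3f}")

# Each one unit increase in x multiplies the odds by exp(coefficient)
odds_ratio = math.exp(coefficient)
print(f"odds ratio per one unit increase: {odds_ratio:.2f}")
```

Note that the log odds rise by the same amount (the coefficient) at every step, while the change in probability depends on where the datapoint starts.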

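One important note is that sklearn’s logistic regression implementation applies regularization (an L2 penalty) by default. Because regularization penalizes large coefficients, the features should be standardized before fitting so that every feature is on the same scale and the coefficients can be compared fairly.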
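One common way to standardize features is sklearn's StandardScaler, sketched below. The raw values for hours studied and practice test score are invented for illustration; when a separate test set exists, the scaler would normally be fit on the training data only and then applied to the test data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical raw data: [hours studied, practice test score]
raw_features = np.array([
    [1.0, 45.0], [2.0, 50.0], [3.0, 62.0], [4.0, 58.0],
    [6.0, 70.0], [7.0, 81.0], [8.0, 77.0], [10.0, 90.0],
])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Rescale each column to have mean 0 and standard deviation 1
scaler = StandardScaler()
scaled_features = scaler.fit_transform(raw_features)

print(scaled_features.mean(axis=0))  # approximately 0 for each column
print(scaled_features.std(axis=0))   # approximately 1 for each column

# Fit on the standardized features so the default penalty
# treats both features on the same scale
model = LogisticRegression()
model.fit(scaled_features, labels)
print(model.coef_, model.intercept_)
```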

Instructions

1.

We’ve pre-processed this data and split it into training and test sets as follows:

  • X_train is the feature matrix, containing standardized training data for hours studied and practice test score
  • y_train contains the outcome variable for the training data: whether or not each student passed the final exam (1 indicates passing, 0 indicates failing)

Create a LogisticRegression object named cc_lr and fit it to the provided training data.

2.

Print out the coefficients and intercept for the model. Are the coefficients positive or negative and does this match your expectation? Which feature (hours studied or practice test score) is more strongly associated with students’ probability of passing the final exam?
