Now that we’ve learned a little bit about how logistic regression works, let’s fit a model using sklearn. To do this, we’ll begin by importing the LogisticRegression module and creating a LogisticRegression object:

from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
After creating the object, we need to fit our model on the data. We can accomplish this using the .fit() method, which takes two parameters: a matrix of features and a vector of class labels (the outcome we are trying to predict).
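As a minimal sketch of this step, here is a fit on small made-up data (the feature values and labels below are invented for illustration and are not the lesson’s dataset):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: two features per student (hours studied,
# practice test score) and a binary outcome (1 = passed, 0 = failed).
X = np.array([[1.0, 55.0], [2.0, 60.0], [3.0, 70.0],
              [4.0, 80.0], [5.0, 85.0], [6.0, 95.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)  # features first, then class labels

print(model.predict(X))  # predicted class for each row of X
```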
Now that the model is trained, we can access a few useful attributes:
model.coef_ is a vector of the coefficients of each feature
model.intercept_ is the intercept
The coefficients can be interpreted as follows:
- Large positive coefficient: a one unit increase in that feature is associated with a large increase in the log odds (and therefore probability) of a datapoint belonging to the positive class (the outcome group labeled as 1).
- Large negative coefficient: a one unit increase in that feature is associated with a large decrease in the log odds/probability of belonging to the positive class.
- Coefficient of 0: The feature is not associated with the outcome.
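To see these attributes and the sign interpretation in action, here is a toy one-feature fit (the data is invented so that larger feature values go with the positive class, so we expect a positive coefficient):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: the single feature increases with the outcome,
# so the fitted coefficient should come out positive.
X = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)

print(model.coef_)       # one coefficient per feature, shape (1, n_features)
print(model.intercept_)  # the intercept, shape (1,)
```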
One important note is that sklearn's logistic regression implementation requires the features to be standardized because regularization is implemented by default.
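One common way to standardize is sklearn's StandardScaler, which rescales each feature to mean 0 and standard deviation 1 before fitting (the raw values below are invented to show two features on very different scales):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Hypothetical raw features on very different scales.
X_raw = np.array([[1.0, 400.0], [2.0, 500.0], [3.0, 600.0],
                  [4.0, 700.0], [5.0, 800.0], [6.0, 900.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Standardize each column to mean 0, std 1 so the default
# regularization penalizes all features on a comparable scale.
scaler = StandardScaler()
X_std = scaler.fit_transform(X_raw)

model = LogisticRegression().fit(X_std, y)
print(X_std.mean(axis=0))  # approximately [0, 0]
```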
We’ve pre-processed this data and split it into training and test sets as follows:
X_train is the feature matrix, containing standardized training data for hours studied and practice test score
y_train contains the outcome variable for the training data: whether or not each student passed the final exam (1 if they passed, 0 if not)
Create a LogisticRegression object named cc_lr and fit it to the provided training data.
Print out the coefficients and intercept for the model. Are the coefficients positive or negative and does this match your expectation? Which feature (hours studied or practice test score) is more strongly associated with students’ probability of passing the final exam?
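Since the pre-processed X_train and y_train aren't shown here, the pattern for this exercise can be sketched with stand-in data (the numbers below are invented; only the cc_lr name and the two-feature layout come from the lesson):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in for the provided standardized training data: columns are
# hours studied and practice test score (values invented for illustration).
X_train = np.array([[-1.2, -1.0], [-0.8, -0.6], [-0.3, 0.1],
                    [0.2, 0.4], [0.9, 0.8], [1.2, 1.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# Create the model object and fit it to the training data.
cc_lr = LogisticRegression()
cc_lr.fit(X_train, y_train)

# Inspect the fitted parameters: one coefficient per feature, plus the intercept.
print(cc_lr.coef_)       # [hours studied, practice test score]
print(cc_lr.intercept_)
```

Because the features are standardized, the coefficients are on a comparable scale, so the feature with the larger coefficient magnitude is the one more strongly associated with passing.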