Now that you know the inner workings of Logistic Regression, let’s learn how to easily and quickly create Logistic Regression models with sklearn! sklearn is a Python library that helps build, train, and evaluate Machine Learning models.
To take advantage of sklearn’s abilities, we can begin by creating a LogisticRegression object:

model = LogisticRegression()
After creating the object, we need to fit our model on the data. When we fit the model, sklearn performs gradient descent, repeatedly updating the coefficients of our model in order to minimize the log-loss. We train, or fit, the model using the .fit() method, which takes two parameters: the first is a matrix of features, and the second is a vector of class labels.
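As a minimal sketch of creating and fitting a model (the study-hours numbers below are invented for illustration, not the lesson’s dataset):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented example data: hours studied (a feature matrix with one column)
# and pass/fail labels (a vector of class labels)
hours_studied = np.array([[1], [2], [3], [4], [5], [6]])
passed_exam = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(hours_studied, passed_exam)  # coefficients are chosen to minimize log-loss

print(model.predict([[6]]))
```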
Now that the model is trained, we can access a few useful attributes of the LogisticRegression object:

model.coef_ is a vector of the coefficients of each feature
model.intercept_ is the intercept
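With a fitted model (here trained on invented single-feature data), these attributes look like:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented example data: one feature (hours studied), binary labels
hours_studied = np.array([[1], [2], [3], [4], [5], [6]])
passed_exam = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(hours_studied, passed_exam)

print(model.coef_)       # shape (1, n_features): one coefficient per feature
print(model.intercept_)  # shape (1,): the intercept term
```

Since studying more makes passing more likely in this data, the coefficient comes out positive.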
With our trained model we are able to predict whether new data points belong to the positive class using the .predict() method. .predict() takes a matrix of features as a parameter and returns a vector of labels, 1 or 0, for each sample. In making its predictions, sklearn uses a default classification threshold of 0.5.
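A short sketch of .predict() on invented data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented training data: students who studied more tended to pass
hours_studied = np.array([[1], [2], [3], [4], [5], [6]])
passed_exam = np.array([0, 0, 0, 1, 1, 1])
model = LogisticRegression().fit(hours_studied, passed_exam)

# .predict() labels each sample 1 if its predicted probability is at
# least the 0.5 threshold, and 0 otherwise
new_hours = np.array([[1], [6]])
print(model.predict(new_hours))
```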
If we are more interested in the predicted probability that a data sample belongs to the positive class than in the actual class label, we can use the .predict_proba() method. predict_proba() also takes a matrix of features as a parameter, and returns, for each sample, the probabilities of belonging to each class, ranging from 0 to 1; the second column is the probability of the positive class.
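A short sketch of .predict_proba() on invented data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented training data, as before
hours_studied = np.array([[1], [2], [3], [4], [5], [6]])
passed_exam = np.array([0, 0, 0, 1, 1, 1])
model = LogisticRegression().fit(hours_studied, passed_exam)

# predict_proba() returns one row per sample: [P(class 0), P(class 1)].
# The second column holds the probability of the positive class.
probabilities = model.predict_proba(np.array([[2], [5]]))
print(probabilities[:, 1])
```

Here a student who studied 2 hours gets a pass probability below 0.5, and one who studied 5 hours gets a probability above 0.5, matching the labels .predict() would assign.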
Before proceeding, one important note: sklearn’s Logistic Regression implementation requires feature data to be normalized. Normalization scales all feature data to vary over the same range. sklearn’s Logistic Regression requires normalized feature data because of a technique called Regularization that it uses under the hood. Regularization is out of scope for this lesson, but in order to ensure the best results from our model, we will be using a normalized version of the data from our Codecademy University example.
Let’s build, train, and evaluate a Logistic Regression model in sklearn for our Codecademy University data! We’ve imported sklearn and the LogisticRegression classifier for you. Create a Logistic Regression model named model.
Train the model using hours_studied_scaled as the training features and passed_exam as the training labels.
Save the coefficients of the model to the variable calculated_coefficients, and save the intercept of the model to its own variable.
The next semester, a group of students in the Introductory Machine Learning course want to predict their final exam results based on how much they intend to study for the exam. The number of hours each student thinks they will study, normalized, is given to you. Use model to predict the probability that each student will pass the final exam, and save the probabilities to a new variable.
That same semester, the Data Science department decides to update the final exam passage model to consider two features instead of just one. During the final exam, students were asked to estimate how much time they spent studying, as well as how many previous math courses they had taken. The student responses, along with their exam results, were split into training and test sets. The training features, normalized, are given to you in exam_features_scaled_train, and the students’ results on the final are given in a corresponding array of training labels.
Create a new Logistic Regression model named model_2 and train it on exam_features_scaled_train and the corresponding training labels.
Use the model you just trained to predict whether each student in the test set, exam_features_scaled_test, will pass the exam, and save the predictions to a new variable.
Compare the predictions to the actual student performance on the exam in the test set. How well did your model do?
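The whole exercise can be sketched end to end on invented stand-in data (the array names mirror the lesson where they are known; the numbers, and the names passed_exam_train, passed_exam_test, and passed_predictions, are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented two-feature data: [hours studied (scaled), previous math courses (scaled)]
exam_features_scaled_train = np.array(
    [[-1.2, -0.8], [-0.6, -1.0], [-0.3, 0.2], [0.1, -0.4],
     [0.4, 0.9], [0.8, 0.5], [1.1, 1.2], [1.5, 0.7]]
)
passed_exam_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])

exam_features_scaled_test = np.array([[-0.9, -0.5], [1.0, 0.8]])
passed_exam_test = np.array([0, 1])

# Build and train the two-feature model
model_2 = LogisticRegression()
model_2.fit(exam_features_scaled_train, passed_exam_train)

# Predict pass/fail for the test set
passed_predictions = model_2.predict(exam_features_scaled_test)

# Compare predictions to the actual results: the fraction that match
accuracy = (passed_predictions == passed_exam_test).mean()
print(passed_predictions, accuracy)
```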