Using a trained model, we can predict whether new datapoints belong to the positive class (the group labeled as 1) using the .predict() method. The input is a matrix of features and the output is a vector of predicted labels, 1 or 0.

print(model.predict(features)) # Sample output: [0 1 1 0 0]

If we are more interested in the predicted probability of group membership, we can use the .predict_proba() method. The input to predict_proba() is also a matrix of features and the output is an array of probabilities, ranging from 0 to 1:

print(model.predict_proba(features)[:,1]) # Sample output: [0.32 0.75 0.55 0.20 0.44]

By default, .predict_proba() returns the probability of membership in each of the two possible classes, one column per class. In the example code above, we've only printed out the probability of belonging to the positive class (column 1). Notice that datapoints with predicted probabilities greater than 0.5 (the second and third datapoints in this example) were classified as 1s by the .predict() method. This process is known as thresholding. As we can see here, sklearn sets the default classification threshold at 0.5.
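Putting this together, here is a minimal, self-contained sketch of the relationship between .predict(), .predict_proba(), and the 0.5 threshold. The feature matrix and labels below are made-up stand-ins (they are not the lesson's data), so the exact outputs will differ from the sample outputs above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: one feature (e.g., hours studied) and binary labels.
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# New datapoints to classify.
features = np.array([[2.5], [4.5]])

# Hard class predictions (0 or 1).
labels = model.predict(features)

# Probability of the positive class (column 1 of predict_proba's output).
probs = model.predict_proba(features)[:, 1]

print(labels)
print(probs)
# Thresholding the probabilities at 0.5 reproduces .predict()'s labels.
print((probs > 0.5).astype(int))
```

For binary logistic regression, a predicted probability above 0.5 always corresponds to a predicted label of 1, so the last two printed arrays agree.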



In the workspace, we’ve fit the same logistic regression model on the CodecademyU training data. We’ve also created X_test and y_test, which contain the testing data.

Use the .predict() method to predict whether the students in the test dataset will pass the final exam, then print out the resulting vector of predictions.


Now, use the .predict_proba() method to calculate the predicted probability that each student in the test dataset will pass the exam. Print out the results.


Print out y_test to see whether the students in the test dataset actually passed the exam. Did the model make accurate predictions? Looking at the probabilities, do the misclassification(s) make sense?
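The three steps above can be sketched as follows. Note that the real workspace already provides model, X_test, and y_test; the toy data here is a hypothetical stand-in for the CodecademyU dataset (hours studied vs. passed exam), so the printed values will differ from what you see in the workspace:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: hours studied and whether the student passed.
hours = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
passed = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

X_train, X_test, y_train, y_test = train_test_split(
    hours, passed, test_size=0.3, random_state=27)

model = LogisticRegression()
model.fit(X_train, y_train)

# 1. Predicted classes (pass/fail) for the test set.
y_pred = model.predict(X_test)
print(y_pred)

# 2. Predicted probability of passing for each test student.
y_pred_prob = model.predict_proba(X_test)[:, 1]
print(y_pred_prob)

# 3. Actual outcomes, for comparison with the predictions.
print(y_test)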
