We have examined how changing the threshold can affect the logistic regression predictions, without retraining or changing the coefficients of the model. In essence, a single model offers a continuum of predictions, obtained by varying the threshold incrementally from zero to one. For each of these thresholds, the true-positive and false-positive rates can be calculated and then plotted. The curve these points form is known as the Receiver Operating Characteristic (ROC) curve.
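To make the threshold sweep concrete, here is a minimal sketch in plain Python. The helper name rates_at_threshold and the four-point toy data are assumptions made for illustration; they are not part of the lesson's dataset or model.

```python
def rates_at_threshold(y_true, y_score, threshold):
    """Return (false-positive rate, true-positive rate) at one threshold.

    Assumes y_true contains at least one 0 and at least one 1,
    otherwise one of the rates would divide by zero.
    """
    # Predict the positive class when the score reaches the threshold.
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return fp / (fp + tn), tp / (tp + fn)


# Hypothetical labels and predicted probabilities of the positive class.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# The model is fixed; only the threshold changes from one point to the next.
thresholds = [0.0, 0.25, 0.5, 0.75, 1.0]
points = [rates_at_threshold(y_true, y_score, t) for t in thresholds]

for t, (fpr, tpr) in zip(thresholds, points):
    print(f"threshold={t:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Each (FPR, TPR) pair is one point on the ROC curve: a threshold of zero labels everything positive and lands at (1, 1), while a threshold above every score labels everything negative and lands at (0, 0).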
To plot the ROC curve, we use scikit-learn's roc_curve function, which takes the arrays y_true and y_score as input and outputs arrays of false-positive rates, true-positive rates, and threshold values. Plotting the true-positive rate against the false-positive rate gives us the ROC curve.
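As a rough illustration of the call, the sketch below runs roc_curve on a small hypothetical set of labels and scores (the toy arrays are an assumption, not the lesson's data).

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical true labels and predicted probabilities of the positive class.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# roc_curve returns three arrays of equal length, one entry per threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

print(fpr)  # [0.  0.  0.5 0.5 1. ]
print(tpr)  # [0.  0.5 0.5 1.  1. ]
print(thresholds)
```

With matplotlib available, passing fpr as the x-values and tpr as the y-values to a line plot draws the curve. The first threshold is an artificial value above every score, which produces the starting point (0, 0).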
We’ve plotted the ROC curve for the dataset and model we’ve been working with throughout this lesson. You will notice that the threshold value is not discernible from the curve alone, so we’ve labelled the threshold value at about five points on the curve for clarity. We’ve also plotted the ROC curve of a “dummy classifier” (a DummyClassifier using the most-frequent strategy), which predicts that all the data points belong to the more frequent class.
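The dummy baseline can be sketched as follows, assuming scikit-learn's DummyClassifier with strategy="most_frequent". The ten-point dataset is hypothetical and stands in for the lesson's data.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical toy data in which class 0 is the more frequent class.
X = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 0])

# The dummy classifier ignores X and always predicts the most frequent class.
dummy = DummyClassifier(strategy="most_frequent")
dummy.fit(X, y)

# Column 1 is the predicted probability of the positive class (label 1).
# It is the same for every sample, so no threshold can separate the classes.
dummy_score = dummy.predict_proba(X)[:, 1]

dummy_fpr, dummy_tpr, _ = roc_curve(y, dummy_score)
dummy_auc = roc_auc_score(y, dummy_score)

print(dummy_fpr, dummy_tpr)  # [0. 1.] [0. 1.]
print(dummy_auc)             # 0.5
```

Because every sample receives the same score, the curve has only the endpoints (0, 0) and (1, 1), and the line joining them is the diagonal.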
The area under the curve (AUC) is a single numeric value, from zero to one, that is often used as a metric in evaluating classification models. A value close to one indicates a near-perfect classifier, whereas a value of 0.5 (which corresponds to the identity line, i.e. the dummy classifier!) is equivalent to random guessing (the null model).
Find the ROC AUC score using the roc_auc_score function. The input to this function has to contain two arrays: y_true, the true binary labels, and y_score, the predicted probabilities of the positive class.
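A minimal sketch of the call, again on hypothetical toy arrays rather than the lesson's data. It also checks the score against the area computed from the curve's points with sklearn.metrics.auc, which applies the trapezoidal rule.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Hypothetical true labels and predicted probabilities of the positive class.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Direct computation from labels and scores.
roc_auc = roc_auc_score(y_true, y_score)

# Same quantity obtained by integrating the ROC curve itself.
fpr, tpr, _ = roc_curve(y_true, y_score)
area_from_curve = auc(fpr, tpr)

print(roc_auc)          # 0.75
print(area_from_curve)  # 0.75
```

Note that y_score should hold probabilities (or other continuous scores), not thresholded 0/1 predictions; passing hard predictions collapses the curve to a single operating point.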