One of the defining features of Logistic Regression is the interpretability of its feature coefficients. How you interpret the coefficients depends on the kind of data you are working with (normalized or not) and on the specific implementation of Logistic Regression you are using. We’ll discuss how to interpret the feature coefficients from a model created in sklearn with normalized feature data.
Since our data is normalized, all features vary over the same range. This means we can compare the magnitudes and signs of the feature coefficients to determine which features have the greatest impact on class prediction, and whether that impact is positive or negative.
- Features with large, positive coefficients increase the probability of a data sample belonging to the positive class as their values increase
- Features with large, negative coefficients decrease the probability of a data sample belonging to the positive class as their values increase
- Features with small coefficients, positive or negative, have minimal impact on the probability of a data sample belonging to the positive class
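To make the three cases concrete, here is a minimal sketch that pushes hypothetical coefficients through the sigmoid function. The intercept and coefficient values are invented for illustration and do not come from a fitted model, and we assume every other feature is held at 0.

```python
import math

def sigmoid(z):
    """Map a log-odds value z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values, chosen only for illustration (not from a fitted model).
intercept = 0.0
coefs = {"large_positive": 2.0, "large_negative": -2.0, "small": 0.1}

# Baseline probability when every feature value is 0.
baseline = sigmoid(intercept)

# Raise one feature at a time by one unit and record the new probability.
shifted = {name: sigmoid(intercept + coef * 1.0) for name, coef in coefs.items()}

for name, prob in shifted.items():
    print(f"{name}: {baseline:.3f} -> {prob:.3f}")
```

The large positive coefficient moves the probability well above the baseline, the large negative one moves it well below, and the small one barely changes it.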
Given cancer data, a logistic regression model can tell us which features are most important for predicting survival at, for example, five years after diagnosis. Knowing these features can lead to a better understanding of outcomes, and even to lives saved!
Let’s revisit the sklearn Logistic Regression model we fit to our exam data in the last exercise. Remember, the two features in the new model are the number of hours studied and the number of previous math courses taken.
Using the model, given to you as model_2 in the code editor, save the feature coefficients to the variable coefficients.
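As a rough sketch of this step, the example below fits a stand-in model and reads its coefficients from the coef_ attribute. The data is made up, stand_in_model is a hypothetical substitute for model_2, and using StandardScaler for normalization is an assumption, since the exercise does not say how the data was normalized.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Made-up data for illustration: [hours studied, previous math courses].
X = np.array([[1, 0], [2, 1], [3, 0], [4, 2], [6, 1], [7, 3], [8, 2], [10, 4]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 1 = passed, 0 = failed

# Assumed normalization step: put both features on a comparable scale.
X_scaled = StandardScaler().fit_transform(X)

# stand_in_model plays the role of model_2 from the exercise.
stand_in_model = LogisticRegression()
stand_in_model.fit(X_scaled, y)

# For a two-class problem, coef_ is a 2D array with a single row
# holding one coefficient per feature.
coefficients = stand_in_model.coef_
print(coefficients.shape)
```

Note that coef_ is two-dimensional even though there is only one set of coefficients here.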
In order to visualize the coefficients, let’s pull them out of the numpy array in which they are currently stored. With the tolist() method we can convert the array into a list and grab the values we want to visualize.
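Here is a small sketch of what tolist() returns for an array of this kind. The array below is hypothetical: it only mimics the single-row, two-dimensional shape that sklearn uses for a two-class model, and the numbers are invented.

```python
import numpy as np

# Hypothetical coefficient array with one row and one column per feature;
# the values are invented for illustration.
coef_array = np.array([[1.5, 0.3]])

nested = coef_array.tolist()  # nested list: one inner list per row
values = nested[0]            # inner list of per-feature coefficients

print(nested)
print(values)
```

Because the array is two-dimensional, tolist() produces a list of lists, so we index the first element to get a flat list of values.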
Below your original assignment of coefficients, reassign coefficients to equal the list of values extracted with tolist().
Create a bar graph comparing the feature coefficients with the plt.bar() function. Which feature appears to be more important in determining whether or not a student will pass the Introductory Machine Learning final exam?
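One possible shape for such a bar graph is sketched below. The feature labels and coefficient values are placeholders, not results from the exam model, and the Agg backend and output file name are assumptions made so the script can run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumed non-interactive backend for headless runs
import matplotlib.pyplot as plt

# Placeholder labels and values; substitute the list taken from your model.
feature_names = ["hours studied", "math courses"]
coefficient_values = [1.5, 0.3]

# One bar per feature, with bar height equal to the coefficient.
bars = plt.bar(feature_names, coefficient_values)
plt.axhline(0, color="black", linewidth=0.8)  # reference line at zero
plt.ylabel("coefficient")
plt.title("Logistic Regression feature coefficients")

# Hypothetical file name; in an interactive session plt.show() works too.
plt.savefig("coefficients.png")
```

The taller bar (in absolute value) marks the feature with the greater influence on the predicted probability, provided the features were normalized before fitting.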