Similar to linear regression, the underlying assumption of logistic regression is that each feature is linearly related to the logit of the outcome. To check this visually, we can use seaborn's regplot with the parameter logistic=True and our feature of interest as the x value. If this condition is met, the fitted model will resemble a sigmoidal curve (as is the case when x=radius_mean).

We’ve added code to create another plot using the feature fractal_dimension_mean. Press Run in the workspace. How do the curves compare?
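The workspace code isn't shown here, but a sketch of what it might look like follows. It assumes the Wisconsin breast cancer data with Kaggle-style column names (radius_mean, fractal_dimension_mean, and a 0/1 target); here we load it from scikit-learn and rename the columns to match. The logistic fit requires statsmodels to be installed, and ci=None skips the (slow) bootstrapped confidence band.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer

# Assumed setup: rename scikit-learn's columns to the lesson's convention
# (e.g. "mean radius" -> "radius_mean"); target is 0 = malignant, 1 = benign.
df = load_breast_cancer(as_frame=True).frame.rename(
    columns=lambda c: "_".join(c.split()[1:] + ["mean"]) if c.startswith("mean ") else c
)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# radius_mean: if the linear-logit assumption holds, the fit is sigmoidal.
sns.regplot(x="radius_mean", y="target", data=df,
            logistic=True, ci=None, ax=axes[0])

# fractal_dimension_mean: compare the shape of this curve to the first one.
sns.regplot(x="fractal_dimension_mean", y="target", data=df,
            logistic=True, ci=None, ax=axes[1])
plt.tight_layout()
```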

2. Multicollinearity

As in linear regression, one of the assumptions is that there is no multicollinearity in the data. There are many ways to check this, but the two most common are a correlation matrix of the features and the variance inflation factor (VIF). With a correlation plot, features that are highly correlated can be dropped from the model to reduce redundancy.

We’re going to look at the “mean” features which are highly correlated with each other using a heatmap. Uncomment the relevant lines of code and press Run to see the heatmap. There are two features that are highly positively correlated with radius_mean. Can you spot them?
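The workspace's heatmap code isn't reproduced here, but it might look something like the following sketch, again assuming the "mean" features under the lesson's column names. Annotating each cell makes the highly correlated pairs easy to spot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer

# Assumed setup: the ten "mean" features, renamed to the lesson's convention.
df = load_breast_cancer(as_frame=True).frame
mean_features = df.filter(like="mean").rename(
    columns=lambda c: "_".join(c.split()[1:] + ["mean"])
)

# Pairwise Pearson correlations, annotated so strong pairs stand out.
corr = mean_features.corr()
ax = sns.heatmap(corr, annot=True, fmt=".1f", cmap="coolwarm", vmin=-1, vmax=1)
plt.tight_layout()
```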

The heatmap shows that the radius, perimeter, and area are all highly positively correlated. (Think about the formula for the area of a circle!)

There is another pair of features that is highly correlated too. Create an array named correlated_pair containing the two features.
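Rather than giving the answer away, here is one hedged way to find the remaining pair programmatically: rank all feature pairs by absolute correlation after setting aside the radius/perimeter/area trio identified above. The data loading and column names are the same assumptions as before.

```python
from sklearn.datasets import load_breast_cancer

# Assumed setup: the ten "mean" features, renamed to the lesson's convention.
df = load_breast_cancer(as_frame=True).frame
mean_features = df.filter(like="mean").rename(
    columns=lambda c: "_".join(c.split()[1:] + ["mean"])
)

# Rank feature pairs by absolute correlation, skipping the
# radius/perimeter/area trio already identified above.
corr = mean_features.drop(
    columns=["radius_mean", "perimeter_mean", "area_mean"]
).corr()
pairs = corr.abs().unstack().sort_values(ascending=False)
# Keep each unordered pair once and drop the diagonal.
pairs = pairs[pairs.index.get_level_values(0) < pairs.index.get_level_values(1)]

top_a, top_b = pairs.index[0]
correlated_pair = [top_a, top_b]
print(correlated_pair, round(pairs.iloc[0], 2))
```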
