Python Diagnostic Plots
Diagnostic plots are essential tools for evaluating the assumptions and performance of regression models. In the context of linear regression, these plots help identify potential issues such as non-linearity, non-constant variance, outliers, high leverage points, and collinearity. The statsmodels library in Python provides several functions to generate these diagnostic plots, aiding in assessing model fit and validity.
Common diagnostic plots include:
- Residual plots: Check for homoscedasticity and non-linearity.
- Q-Q plots: Assess the normality of residuals.
- Leverage plots: Identify influential points.
- Scale-location plots: Detect patterns in residual variance.
Syntax
There are several functions for generating diagnostic plots in statsmodels. Two common ones are plot_partregress_grid() and plot_regress_exog(), both available under sm.graphics. Each takes a fitted regression results object as its first argument.
plot_partregress_grid()
The plot_partregress_grid() function generates partial regression plots for all explanatory variables in the model. Each plot shows the relationship between the dependent variable and one explanatory variable after removing the effect of the other predictors, which helps assess whether that variable contributes linearly to the model.
The syntax for using plot_partregress_grid() is:

sm.graphics.plot_partregress_grid(results)

- results refers to the fitted regression results object.
plot_regress_exog()
The plot_regress_exog() function generates a 2x2 grid of diagnostic plots for a single explanatory variable, including plots of the fitted values and residuals against that variable. This helps check the assumption of linearity with respect to a particular predictor.
The syntax for using plot_regress_exog() is:

sm.graphics.plot_regress_exog(results, exog_idx)

- results refers to the fitted regression results object.
- exog_idx is the index (or name) of the explanatory variable whose relationship with the dependent variable you want to plot.
Example
Below is an example demonstrating how to generate diagnostic plots for a linear regression model using statsmodels:
```python
import statsmodels.api as sm
import numpy as np
import matplotlib.pyplot as plt

# Create synthetic data
np.random.seed(0)
X = np.random.rand(100, 2)
X = sm.add_constant(X)  # Add constant (intercept)
y = X[:, 1] + X[:, 2] + np.random.randn(100)  # Dependent variable with some noise

# Fit linear regression model
model = sm.OLS(y, X)
results = model.fit()

# Generate partial regression plots for all explanatory variables
fig = plt.figure(figsize=(10, 8))
sm.graphics.plot_partregress_grid(results, fig=fig)
plt.show()

# Alternatively, generate diagnostic plots for the first explanatory variable
fig = plt.figure(figsize=(10, 8))
sm.graphics.plot_regress_exog(results, 1, fig=fig)
plt.show()
```
The first call displays a grid of partial regression plots, one per explanatory variable; the second displays a 2x2 grid of diagnostic plots for the selected variable.