Diagnostic Plots
Diagnostic plots are essential tools for evaluating the assumptions and performance of regression models. In linear regression, these plots help identify issues such as non-linearity, non-constant variance, outliers, high-leverage points, and collinearity. The `statsmodels` library in Python provides several functions for generating these plots, aiding in assessing model fit and validity.
Common diagnostic plots include:
- Residual plots: Check for homoscedasticity and non-linearity.
- Q-Q plots: Assess the normality of residuals.
- Leverage plots: Identify influential points.
- Scale-location plots: Detect patterns in residual variance.
Syntax
There are several ways to generate diagnostic plots in `statsmodels`. Two common functions are `plot_partregress_grid()` and `plot_regress_exog()`, both available in the `statsmodels.graphics` module. Each takes a fitted regression results object.

`plot_partregress_grid()`

The `plot_partregress_grid()` function generates partial regression plots for all explanatory variables in the model. It helps assess the relationship between the dependent variable and each independent variable after accounting for the other predictors.

The syntax for using `plot_partregress_grid()` is:

```py
sm.graphics.plot_partregress_grid(results)
```

Here, `results` refers to the fitted regression results object.
`plot_regress_exog()`

The `plot_regress_exog()` function generates a grid of regression diagnostic plots (including residual and partial regression plots) for a single explanatory variable. This helps check the linearity assumption with respect to a particular predictor.

The syntax for using `plot_regress_exog()` is:

```py
sm.graphics.plot_regress_exog(results, exog_idx)
```

- `results` refers to the fitted regression results object.
- `exog_idx` is the index of the explanatory variable whose relationship with the dependent variable you want to plot.
Example
Below is an example demonstrating how to generate diagnostic plots for a linear regression model using `statsmodels`:

```py
import statsmodels.api as sm
import numpy as np
import matplotlib.pyplot as plt

# Create synthetic data
np.random.seed(0)
X = np.random.rand(100, 2)
X = sm.add_constant(X)  # Add constant (intercept)
y = X[:, 1] + X[:, 2] + np.random.randn(100)  # Dependent variable with some noise

# Fit linear regression model
model = sm.OLS(y, X)
results = model.fit()

# Generate diagnostic plots for all variables
fig = plt.figure(figsize=(10, 8))
sm.graphics.plot_partregress_grid(results, fig=fig)
plt.show()

# Alternatively, generate diagnostic plots for the first independent variable
fig = plt.figure(figsize=(10, 8))
sm.graphics.plot_regress_exog(results, 1, fig=fig)
plt.show()
```
The output shows a grid of partial regression plots for all explanatory variables, followed by the diagnostic plots for the first independent variable.