Once we’ve calculated the fitted values and residuals for a model, we can check the normality and homoscedasticity assumptions of linear regression.
The normality assumption states that the residuals should be normally distributed. To check this assumption, we can inspect a histogram of the residuals and make sure that the distribution looks approximately normal (no skew or multiple “humps”):
These residuals appear normally distributed, leading us to conclude that the normality assumption is satisfied.
If the plot instead looked something like the distribution below (which is skewed right), we would be concerned that the normality assumption is not met:
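A right skew like this can also be flagged numerically, as a rough complement to the histogram rather than a replacement for it. The sketch below uses only the standard library. The helper name `sample_skewness` and both residual lists are invented for illustration and are not taken from the lesson's data. A value near 0 suggests a symmetric distribution, while a clearly positive value suggests a right skew.

```python
from statistics import mean, pstdev

def sample_skewness(values):
    """Mean of the cubed standardized deviations (0 for a symmetric sample)."""
    m = mean(values)
    s = pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

# Hypothetical residuals, made up for this example only
symmetric_residuals = [-3, -2, -1, -1, 0, 0, 1, 1, 2, 3]
right_skewed_residuals = [-2, -2, -1, -1, -1, 0, 0, 1, 2, 9]

print(sample_skewness(symmetric_residuals))     # close to 0: no skew
print(sample_skewness(right_skewed_residuals))  # clearly positive: skewed right
```

The histogram remains the primary check. A single number like this cannot reveal multiple humps, for instance.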
Homoscedasticity is a fancy way of saying that the residuals have equal variation (constant variance) across all values of the predictor variable. A common way to check this is by plotting the residuals against the fitted values.
plt.scatter(fitted_values, residuals)
plt.show()
If the homoscedasticity assumption is met, then this plot will look like a random splatter of points, centered around y=0 (as in the example above).
If there are any patterns or asymmetry, that would indicate the assumption is NOT met and linear regression may not be appropriate. For example:
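One common pattern of this kind is a funnel shape, where the residuals spread out as the fitted values grow. As a minimal sketch of how that could be quantified, the code below compares the residual variance in the upper half of the fitted values with the variance in the lower half. The helper `spread_ratio` and all of the numbers are hypothetical, chosen only to illustrate the idea. A ratio near 1 is consistent with equal variance, and a ratio far from 1 is a warning sign.

```python
from statistics import pvariance

def spread_ratio(fitted_values, residuals):
    """Residual variance for the upper half of fitted values divided by the lower half."""
    pairs = sorted(zip(fitted_values, residuals))  # order points by fitted value
    half = len(pairs) // 2
    lower = [r for _, r in pairs[:half]]
    upper = [r for _, r in pairs[-half:]]
    return pvariance(upper) / pvariance(lower)

# Hypothetical data, made up for this example only
fitted = [50, 55, 60, 65, 70, 75, 80, 85]
even_residuals = [2, -2, 1, -1, 2, -2, 1, -1]     # similar spread everywhere
funnel_residuals = [1, -1, 1, -1, 6, -6, 8, -8]   # spread grows with fitted value

print(spread_ratio(fitted, even_residuals))    # 1.0
print(spread_ratio(fitted, funnel_residuals))  # 50.0
```

This only detects a change in spread between two halves. Curved or asymmetric patterns still need to be checked on the scatter plot itself.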
The code to calculate the residuals and fitted values for the model of score predicted by hours studied is provided for you in script.py. Plot a histogram of the residuals to check the normality assumption. Is this assumption met?
Now, check the homoscedasticity assumption by plotting the residuals against the fitted values (fitted_values on the x-axis and residuals on the y-axis). Is this assumption met?