Let’s return to a plot we saw in Exercise #1.

Scatter plot showing happiness level on the y-axis against stress level on the x-axis. Points for the exercise group are shown as orange crosses and those for the non-exercise group as blue circles. Two negatively sloped lines start at different intercepts and intersect: a solid orange line for the exercise group and a dotted blue line for the non-exercise group.

The data for this plot is from a fictional study on happiness that measures the following variables about its participants:

  • happy – their happiness level on a quantitative scale of 1 to 10
  • stress – their stress level on a quantitative scale of 1 to 10
  • exercise – whether they exercise regularly, where 1 = yes and 0 = no

We have drawn in a line estimating the relationship between stress and happiness for each exercise group. The line for the group that exercises appears flatter than that for the non-exercise group.

This indicates that exercise might modify the relationship between stress and happiness. Perhaps regular exercise buffers the effects of stress on happiness. Or perhaps people who exercise are also likely to do stress-reducing activities like meditation. While we don’t know the exact reason, we do see a potential difference when we examine the exercise groups separately.

If we fit a regression modeling happy from the quantitative predictor stress and the binary predictor exercise, we get the following results:

import statsmodels.api as sm

model = sm.OLS.from_formula('happy ~ stress + exercise', data=happiness).fit()
print(model.params)
# Output:
# Intercept    10.256296
# stress       -0.707925
# exercise     -0.894058

Using these coefficients, we can plot two lines with differing intercepts for each exercise group.
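To see where the two lines come from, the sketch below plugs the printed coefficients into the model equation for each value of exercise. The helper name line_for_group is hypothetical and used only for illustration; the coefficient values are copied from the output above.

```python
# Coefficients copied from the model output above
intercept = 10.256296
b_stress = -0.707925
b_exercise = -0.894058


def line_for_group(exercise):
    """Return (intercept, slope) of the fitted line for one exercise group.

    Hypothetical helper for illustration; exercise is 1 (yes) or 0 (no).
    """
    # The exercise coefficient only shifts the intercept; the slope is shared.
    return intercept + b_exercise * exercise, b_stress


no_ex = line_for_group(0)
ex = line_for_group(1)
print(f"non-exercise: happy = {no_ex[0]:.3f} + ({no_ex[1]:.3f}) * stress")
print(f"exercise:     happy = {ex[0]:.3f} + ({ex[1]:.3f}) * stress")
```

Both groups share the same slope on stress. The exercise coefficient only lowers the intercept for the exercise group, which is why the two lines are parallel.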

Scatter plot showing happy versus stress with two parallel lines: a lower one for the exercise group and a higher one for the non-exercise group.

Our lines have different intercepts, but seem to be missing the steeper slope of the points for the non-exercise group. Since a model for happy with just stress and exercise as predictors only allows for the intercepts to differ, we must add an interaction term to our model to capture the difference in slopes.
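As a rough sketch of what an interaction term does, the example below fits a model with an extra stress × exercise column using plain NumPy least squares. The data are synthetic and noise-free, with slopes and intercepts chosen arbitrarily for illustration; they are not the study's data. In a statsmodels formula, such a term is usually written as stress:exercise.

```python
import numpy as np

# Synthetic, noise-free data (NOT the study's data): each group has its own slope.
stress = np.tile(np.arange(1.0, 11.0), 2)   # stress levels 1..10 for each group
exercise = np.repeat([0.0, 1.0], 10)        # first 10 rows: no exercise
happy = np.where(
    exercise == 1,
    8.0 - 0.3 * stress,     # assumed flatter line for the exercise group
    10.0 - 0.8 * stress,    # assumed steeper line for the non-exercise group
)

# Design matrix: intercept, stress, exercise, and the interaction stress * exercise
X = np.column_stack([np.ones_like(stress), stress, exercise, stress * exercise])
b0, b_stress, b_exercise, b_inter = np.linalg.lstsq(X, happy, rcond=None)[0]

slope_no_exercise = b_stress            # slope when exercise = 0
slope_exercise = b_stress + b_inter     # slope when exercise = 1
print(round(slope_no_exercise, 3), round(slope_exercise, 3))
```

The interaction coefficient is the difference between the two groups' slopes. Without it, the model has no way to give each group its own slope.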



The plants dataset has been loaded for you in script.py. Fit a multiple regression predicting height with weight and species as predictors and save the results as model.


Print the intercept and coefficients of model. What do the coefficients tell us about the relationships between the variables?


Remove the # symbols to uncomment the code and run the plot of height and weight colored by species. Does the coefficient on weight seem to describe the slope for both species of plant?
