Let’s return to a plot we saw in Exercise #1.

The data for this plot is from a fictional study on happiness that measures the following variables about its participants:

- `happy`: their happiness level on a quantitative scale of 1 to 10
- `stress`: their stress level on a quantitative scale of 1 to 10
- `exercise`: whether they exercise regularly, where `1 = yes` and `0 = no`

We have drawn in a line estimating the relationship between stress and happiness for each exercise group. The line for the group that exercises appears flatter than that for the non-exercise group.

This indicates that `exercise` might modify the relationship between stress and happiness. Perhaps regular exercise buffers the effects of stress on happiness. Or perhaps people who exercise are also likely to do stress-reducing activities like meditation. While we don’t know the exact reason, we do see a potential difference when we examine the exercise groups separately.

If we fit a regression modeling `happy` from the quantitative predictor `stress` and the binary predictor `exercise`, we get the following results:

```py
import statsmodels.api as sm

# Fit happiness on stress and exercise (no interaction term)
model = sm.OLS.from_formula('happy ~ stress + exercise', data=happiness).fit()
print(model.params)

# Output:
# Intercept    10.256296
# stress       -0.707925
# exercise     -0.894058
```

Using these coefficients, we can plot two lines with differing intercepts for each exercise group.
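To make the "two lines" concrete, here is a minimal sketch that plugs the coefficients printed above into the regression equation. The function and variable names are illustrative and not part of the lesson's code; only the three coefficient values come from the model output.

```python
# Coefficients copied from the model output above
intercept = 10.256296
b_stress = -0.707925
b_exercise = -0.894058

def predicted_happy(stress, exercise):
    """Predicted happiness from the additive model happy ~ stress + exercise."""
    return intercept + b_stress * stress + b_exercise * exercise

# Intercept of each group's line: the prediction at stress = 0
intercept_no_exercise = predicted_happy(0, 0)  # exercise = 0
intercept_exercise = predicted_happy(0, 1)     # exercise = 1

# Slope of each group's line: change in prediction per one-unit increase in stress
slope_no_exercise = predicted_happy(1, 0) - predicted_happy(0, 0)
slope_exercise = predicted_happy(1, 1) - predicted_happy(0, 1)

print(f"no exercise: happy = {intercept_no_exercise:.3f} + ({slope_no_exercise:.3f}) * stress")
print(f"exercise:    happy = {intercept_exercise:.3f} + ({slope_exercise:.3f}) * stress")
```

The coefficient on `exercise` only shifts the intercept for the exercise group, while both lines share the single coefficient on `stress` as their slope.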

Our lines have different intercepts, but seem to be missing the steeper slope of the points for the non-exercise group. Since a model for `happy` with just `stress` and `exercise` as predictors only allows for the intercepts to differ, we must add an interaction term to our model to capture the difference in slopes.
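As a rough illustration of why an interaction term helps, the sketch below adds a `stress * exercise` product to the prediction equation. The coefficient values are hypothetical, chosen only to show the mechanics, and are not estimates from the happiness data. In a formula-based fit, this kind of term is typically written as `stress:exercise`.

```python
def predicted_happy(stress, exercise, b0, b1, b2, b3):
    """Prediction from a model with a stress-by-exercise interaction term."""
    return b0 + b1 * stress + b2 * exercise + b3 * stress * exercise

# Hypothetical coefficients for illustration only (not fitted values)
b0, b1, b2, b3 = 10.0, -1.0, -2.0, 0.5

def slope_for_group(exercise):
    """Change in predicted happiness per one-unit increase in stress."""
    at_one = predicted_happy(1, exercise, b0, b1, b2, b3)
    at_zero = predicted_happy(0, exercise, b0, b1, b2, b3)
    return at_one - at_zero

print(slope_for_group(0))  # slope for exercise = 0 is b1
print(slope_for_group(1))  # slope for exercise = 1 is b1 + b3
```

With the interaction coefficient `b3` in the model, the slope on `stress` is `b1` for the non-exercise group and `b1 + b3` for the exercise group, so the two lines are no longer forced to be parallel.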

### Instructions

**1.**

The `plants` dataset has been loaded for you in **script.py**. Fit a multiple regression predicting `height` with `weight` and `species` as predictors and save the results as `model`.

**2.**

Print the intercept and coefficients of `model`. What do the coefficients tell us about the relationships between the variables?

**3.**

Remove the `#` symbols to uncomment the code and run the plot of `height` and `weight` colored by `species`. Does the coefficient on `weight` seem to describe the slope for both species of plant?