Building on the last exercise, let’s fit a model predicting happy from stress and freetime, with an interaction term between the two predictors.

import statsmodels.api as sm
modelQ = sm.OLS.from_formula('happy ~ stress + freetime + stress:freetime', data=happiness).fit()
print(modelQ.params)

# Output:
# Intercept          7.731785
# stress            -0.551098
# freetime           0.187882
# stress:freetime    0.040401

We form the regression equation from the coefficients just as we did for an interaction term with a binary variable.

$\text{happy} = 7.73 - 0.55*\text{stress} + 0.19*\textbf{freetime} + 0.04*\text{stress}*\textbf{freetime}$
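To make the equation concrete, here is a minimal sketch that evaluates it for a single participant. The function name `predict_happy` and the example inputs are illustrative assumptions, not part of the lesson's code; the coefficients are the rounded values from the output above.

```python
# Rounded coefficients copied from the model output above.
INTERCEPT = 7.73
B_STRESS = -0.55
B_FREETIME = 0.19
B_INTERACTION = 0.04  # coefficient on stress:freetime


def predict_happy(stress, freetime):
    """Evaluate the fitted regression equation for one participant."""
    return (INTERCEPT
            + B_STRESS * stress
            + B_FREETIME * freetime
            + B_INTERACTION * stress * freetime)


# Hypothetical participant: stress score of 10 with 2 hours of free time.
example = predict_happy(stress=10, freetime=2)
print(round(example, 2))
```

Because the interaction term multiplies stress by freetime, the same change in stress shifts the prediction by a different amount at each level of freetime.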

We can write a new equation for participants with differing amounts of daily free time.

For participants with 0 hours of free time, the equation is:

$$
\begin{aligned}
\text{happy} &= 7.73 - 0.55*\text{stress} + 0.19*\bm{0} + 0.04*\text{stress}*\bm{0} \\
\text{happy} &= 7.73 - 0.55*\text{stress} + 0 + 0 \\
\text{happy} &= 7.73 - 0.55*\text{stress}
\end{aligned}
$$

For participants with 1 hour of free time, the equation is:

$$
\begin{aligned}
\text{happy} &= 7.73 - 0.55*\text{stress} + 0.19*\bm{1} + 0.04*\text{stress}*\bm{1} \\
\text{happy} &= 7.73 - 0.55*\text{stress} + 0.19 + 0.04*\text{stress} \\
\text{happy} &= (7.73+0.19) + (- 0.55 + 0.04)*\text{stress} \\
\text{happy} &= 7.92 - 0.51*\text{stress}
\end{aligned}
$$

When we simplify and combine terms, we see that the intercept increases by 0.19 and the slope on stress increases by 0.04 (becoming less negative) compared to participants with 0 hours of free time. Another 0.19 and 0.04 get added to the intercept and slope, respectively, when we increase freetime to 2 hours:

$$
\begin{aligned}
\text{happy} &= 7.73 - 0.55*\text{stress} + 0.19*\bm{2} + 0.04*\text{stress}*\bm{2} \\
\text{happy} &= (7.73+0.19*2) + (- 0.55 + 0.04*2)*\text{stress} \\
\text{happy} &= 8.11 - 0.47*\text{stress}
\end{aligned}
$$

The 0.19 that gets added to the intercept with each one-hour increase in freetime is the coefficient on freetime from our regression. The 0.04 that gets added to the slope on stress with each one-hour increase in freetime is the coefficient on stress:freetime from our regression.
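The pattern can be sketched in a few lines of Python. The helper `line_for_freetime` is a hypothetical name introduced for illustration; its default arguments are the rounded coefficients from the regression output.

```python
def line_for_freetime(freetime, b0=7.73, b_stress=-0.55,
                      b_freetime=0.19, b_interaction=0.04):
    """Return (intercept, slope on stress) for a fixed value of freetime."""
    # The freetime coefficient shifts the intercept once per hour.
    intercept = b0 + b_freetime * freetime
    # The interaction coefficient shifts the slope on stress once per hour.
    slope = b_stress + b_interaction * freetime
    return intercept, slope


for hours in (0, 1, 2):
    intercept, slope = line_for_freetime(hours)
    print(f"freetime={hours}: happy = {intercept:.2f} + ({slope:.2f})*stress")
```

Each pass through the loop reproduces one of the simplified equations: the intercept moves in steps of 0.19 and the slope in steps of 0.04.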

### Instructions

1. The plants dataset has been loaded for you in script.py. Fit a model predicting growth with water, fertilizer, and an interaction between them as predictors, and save this model as model3.

2. Print the intercept and coefficients from model3. Are they what you expected?

3. Save the value of the coefficient that represents the difference in the slope on water as fertilizer increases by one as slopeDiff. What do we learn about the relationship between plant growth and number of times watered as amount of fertilizer increases?

4. Using the model coefficients, write the regression equation for when fertilizer = 3. In script.py, save the intercept of the equation as intercept3 and the coefficient on water as slope3. Do the same for when fertilizer = 5, and save the intercept as intercept5 and the coefficient on water as slope5. Why are these slopes farther apart than 0.774034?
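As a hedged illustration of the reasoning behind the last step, the sketch below compares slopes at two values of the second predictor that are two units apart. It uses the happiness coefficients printed earlier in this lesson as a stand-in, since the plants coefficients are not shown here. The helper `conditional_line` is a hypothetical name, and it assumes the coefficient labels follow the pattern seen in the output above (`Intercept`, each predictor, and `x:moderator` for the interaction).

```python
def conditional_line(params, x, moderator, moderator_value):
    """Intercept and slope on `x` when `moderator` is held at a fixed value.

    `params` is any mapping keyed like the output shown in the lesson:
    'Intercept', the two predictor names, and 'x:moderator'.
    """
    intercept = params["Intercept"] + params[moderator] * moderator_value
    slope = params[x] + params[f"{x}:{moderator}"] * moderator_value
    return intercept, slope


# Coefficients printed for the happiness model earlier in this lesson
# (not the plants model, whose values are not shown here).
happy_params = {
    "Intercept": 7.731785,
    "stress": -0.551098,
    "freetime": 0.187882,
    "stress:freetime": 0.040401,
}

_, slope_at_3 = conditional_line(happy_params, "stress", "freetime", 3)
_, slope_at_5 = conditional_line(happy_params, "stress", "freetime", 5)

# The gap between the two slopes equals 2 times the interaction
# coefficient, because the moderator moved by 2 units.
print(slope_at_5 - slope_at_3)
print(2 * happy_params["stress:freetime"])
```

The same helper should also accept the pandas Series that statsmodels exposes as `.params`, since it only indexes by label. For the plants exercise, that would mean passing `model3.params` with `"water"` and `"fertilizer"`, assuming the interaction is labeled `water:fertilizer` in the fitted model.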