At the beginning of this lesson, we mentioned that the relationship between variables often does not appear to be a straight line and may instead be somewhat curved. In this case, we may produce a better fitting regression model by adding a polynomial term that raises our predictor variable to a higher exponent to better account for the curving.

In the Python library `statsmodels.api`

, polynomial terms can be added to a multiple regression model formula by adding a term with the predictor of interest raised to a higher power. This may be done using the NumPy function `np.power`

and specifying the predictor name and degree.

For example, the plot above suggests a curvilinear relationship between happiness (`happy`

) and hours slept on average (`sleep`

). We can add a quadratic (squared) term for the variable `sleep`

by doing the following:

import statsmodels.api as sm import numpy as np modelP = sm.OLS.from_formula('happy ~ sleep + np.power(sleep,2)', data=happiness).fit() print(modelP.params) # Output: # Intercept -0.058995 # sleep 1.320429 # np.power(sleep, 2) -0.061827

This creates a second predictor term in the model with an additional coefficient. Correspondingly, the new term shows up in our model equation as `sleep`

squared.

*happy = -.06 + 1.32*sleep - .06*sleep ^{2}*

We can check happiness scores by substituting in different values of sleep.

For 2 hours of sleep:

*happy = -.06 + 1.32*2 - .06*2*^{2}= 2.34For 10 hours of sleep:

*happy = -.06 + 1.32*10 - .06*10*^{2}= 7.14For 14 hours of sleep:

*happy = -.06 + 1.32*14 - .06*14*^{2}= 6.66

Our curved model picks up on the pattern that beyond about 10 hours of sleep, more sleep is not associated with greater happiness. Perhaps the people who sleep a lot are ill so they need more sleep and are less happy, or perhaps sleeping too much causes problems in other parts of their lives that takes away from their happiness. We can’t know the exact cause from our model, just the association.

Note that we can use Python and our model results to perform the above computations for us. Below is code to compute `happy`

when `sleep`

is 10 and 14. Since we use exact rather than rounded numbers here, the results are slightly different but more accurate than our original work.

print(modelP.params[0] + modelP.params[1]*10 + modelP.params[2]*np.power(10,2)) # Output: # 6.9626120786831445 print(modelP.params[0] + modelP.params[1]*14 + modelP.params[2]*np.power(14,2)) # Output: # 6.308953414816589

### Instructions

**1.**

The `plants`

dataset has been loaded for you in **script.py**. Run a regression predicting number of dead leaves (`dead`

) from amount of light (`light`

) with an additional squared term for `light`

. Save the results as `model`

.

**2.**

Print the coefficients of `model`

. Are you surprised that the terms with `light`

in them have differing signs?

**3.**

Use the model coefficients to find the predicted number of dead leaves for `light`

= 10. Instead of using rounded versions of the coefficients from the model output, call the coefficients directly using the index numbers of `model.params`

and save this computation in **script.py** as `numDead10`

. Repeat this process for when `light`

= 18 and save the computation as `numDead18`

.

Here is the code to do this for when `light`

= 5 and save the computation as `numDead5`

:

numDead5 = model.params[0] + model.params[1]*5 + model.params[2]*np.power(5,2)

**4.**

We can print these values to inspect the pattern of dead leaves as light increases. Here’s the code to print `numDead5`

:

print(numDead5)

Print `numDead10`

and `numDead18`

. Based on these three predicted values of `dead`

, what does the pattern of outcomes say about what might be the best amount of light to keep leaves from dying?