Sometimes we use regression to understand the relationship between two variables while controlling for potential confounders. For example, using the `survey` dataset, we may be primarily interested in how time spent studying (`hours_studied`) is related to test score (`score`); however, to understand this relationship properly, we may want to control for additional student attributes, such as whether the student ate breakfast (`breakfast`).

If we perform a simple linear regression predicting `score` from `hours_studied`, we get the following results:

```py
import statsmodels.api as sm

model0 = sm.OLS.from_formula('score ~ hours_studied', data=survey).fit()
print(model0.params)

# Output:
# Intercept        34.990700
# hours_studied    11.881045
```

However, if we add `breakfast` to the model and inspect the new coefficients, we’ll find that the intercept and slope on `hours_studied` have changed:

```py
import statsmodels.api as sm

model1 = sm.OLS.from_formula('score ~ hours_studied + breakfast', data=survey).fit()
print(model1.params)

# Output:
# Intercept        32.665570
# hours_studied     8.540499
# breakfast        22.495615
```

Note that the coefficient on `hours_studied` drops from 11.9 to 8.5. Why does this happen? Perhaps students who eat breakfast also tend to study longer and to score higher on the exam. When `breakfast` is left out of the model, some of the relationship between `score` and `breakfast` is attributed to `hours_studied` instead.

### Instructions

**1.**

The `student` dataset has been loaded for you in **script.py**. Run a regression model predicting final Portuguese score (`port3`) from first math score (`math1`). Save the fitted model as `simple`.

**2.**

Now fit a multiple regression model predicting final Portuguese score (`port3`) from first math score (`math1`) AND first Portuguese score (`port1`). Save the fitted model as `multiple`.

**3.**

Print the resulting intercept and coefficients from the `simple` model using `.params`. What is the apparent relationship between final Portuguese score and first math score?

**4.**

Print the resulting coefficients from the `multiple` model using `.params`. How did the coefficient on `math1` change when `port1` was added to the model?