While we can view a binary categorical variable as a way of creating two regression equations with different intercepts, we don’t need to write out both equations every time we want to interpret a binary predictor in a multiple regression equation.

In the `survey` dataset, `breakfast` is a binary variable that is equal to `1` for students who ate breakfast on test day and `0` for those who didn’t. For predicting `score` based on `hours_studied` and `breakfast`, the multiple regression equation is:

`$\text{score} = 32.7 + 8.5*\text{hours\_studied} + 22.5*\text{breakfast}$`
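As a quick worked substitution (using only the coefficients in the equation above), plugging each value of `breakfast` into the equation gives two lines with the same slope and different intercepts:

```latex
\begin{aligned}
\text{breakfast} = 0:&\quad \text{score} = 32.7 + 8.5 \cdot \text{hours\_studied} \\
\text{breakfast} = 1:&\quad \text{score} = (32.7 + 22.5) + 8.5 \cdot \text{hours\_studied}
                      = 55.2 + 8.5 \cdot \text{hours\_studied}
\end{aligned}
```

The intercepts differ by exactly the `breakfast` coefficient, 22.5, while the slope on `hours_studied` stays at 8.5.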

Take a look at the scatter plot with regression lines on top:

We can interpret the regression coefficients as follows:

The `breakfast` variable has a coefficient of 22.5. The interpretation is: holding all other variables constant, students who ate breakfast scored 22.5 points higher, on average, than students who did not. “Holding all other variables constant” means that we’re comparing breakfast groups among students who studied the same number of hours. Visually, this means that the vertical distance between the two regression lines is always 22.5 for any value of `hours_studied` (the dotted lines in the picture above are all the same length).

The intercept (32.7) is the average value of the response variable when all predictors in the equation are equal to 0. According to our full regression equation, this means that students who didn’t study (`hours_studied = 0`) and didn’t eat breakfast (`breakfast = 0`) earned an average score of 32.7 (the y-intercept for the blue line).
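To check both interpretations numerically, here is a minimal sketch that hard-codes the coefficients from the equation above rather than refitting the model. The function name `predict_score` and the sample values of `hours_studied` are illustrative assumptions, not part of the lesson's code.

```python
# Coefficients copied from the lesson's regression equation.
INTERCEPT = 32.7
B_HOURS = 8.5
B_BREAKFAST = 22.5


def predict_score(hours_studied, breakfast):
    """Predicted score; breakfast is 1 (ate breakfast) or 0 (did not).

    Hypothetical helper for illustration only.
    """
    return INTERCEPT + B_HOURS * hours_studied + B_BREAKFAST * breakfast


# Gap between the two regression lines at a few arbitrary values of
# hours_studied: it should equal the breakfast coefficient every time.
for hours in [0, 2, 5]:
    gap = predict_score(hours, 1) - predict_score(hours, 0)
    print(f"hours_studied={hours}: gap = {gap:.1f}")

# With both predictors at 0, the prediction is just the intercept.
print(f"no studying, no breakfast: {predict_score(0, 0):.1f}")
```

Because `breakfast` only ever adds 0 or 22.5 to the prediction, the gap does not depend on the value chosen for `hours_studied`.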

### Instructions

**1.**

Suppose that we fit a model to predict `port3` (final Portuguese score) with predictors `math1` (first semester math score) and `address` (urban or rural residence). The coefficients are printed below.

    # Output:
    # Intercept       3.234071
    # address[T.U]    0.557631
    # math1           0.475892

In the file **interpretations.txt** write a one-sentence interpretation for the intercept. Does this interpretation make practical sense?

**2.**

Add a one-sentence interpretation to **interpretations.txt** for the coefficient on `address` in terms of the average Portuguese scores (`port3`) of students from rural areas (`R` or `address = 0`) and students from urban areas (`U` or `address = 1`). Check your solution against the sample solutions in **solutions.txt**.