
Congratulations! In this lesson you’ve learned to:

- Fit a multiple linear regression model in Python
- Write and interpret a multiple regression model
- Understand what binary and quantitative predictor coefficients mean visually and in context
- Check the assumption that multicollinearity isn’t present
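
As a recap of these steps, here is a minimal sketch on synthetic data, assuming `numpy`, `pandas`, and `statsmodels` are installed. The column names `y`, `x1`, `x2`, and `group` and all of the numbers are made up for illustration; they are not part of the lesson's dataset.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (illustrative only): two quantitative predictors and one binary predictor.
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "x1": rng.uniform(0, 10, n),
    "x2": rng.uniform(0, 5, n),
    "group": rng.choice(["a", "b"], n),
})
# Known "true" relationship plus noise, so the fitted coefficients can be sanity-checked.
demo["y"] = (
    2.0
    + 1.5 * demo["x1"]
    - 0.5 * demo["x2"]
    + 3.0 * (demo["group"] == "b")
    + rng.normal(0, 0.5, n)
)

# Fit the multiple regression and print only the coefficients.
model = smf.ols("y ~ x1 + x2 + group", data=demo).fit()
print(model.params)

# Pairwise correlations between the quantitative predictors; a heat map is a
# colored view of this same matrix.
corr = demo[["x1", "x2"]].corr()
print(corr)
```

In the printed coefficients, the binary predictor appears as `group[T.b]`, which indicates that `b` is coded as `1` and `a` is the reference level coded as `0`. A correlation near 1 or -1 between two predictors would be a warning sign of multicollinearity.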

### Instructions

A new dataset called `family` has been loaded for you. The data is modified from the Family Income and Expenditure Survey (FIES) of the Philippine Statistics Authority (PSA), a survey taken every three years on family income and expenditure in the Philippines. We’ll work with the following variables from this dataset:

- income (`income`)
- total food expenditure (`food`)
- total housing and water expenditure (`housing`)
- source of income (`source`)

The income and expenditure variables are measured in thousands of Philippine pesos. Try practicing multiple regression in **script.py** using the following instructions. Sample solutions are provided in **solutions.py**.

1. Create a heat map of the quantitative variables in the `family` dataset. Do any pairs have high correlations?
2. Fit a model for `income` using `food`, `housing`, and `source` as predictors and inspect a summary of the results. The binary variable `source` has values `Entrepreneurial Activities` and `Wage/Salaries`. According to the summary, which value of `source` is coded as `1` and which is coded as `0`?
3. Write out the regression equation from the coefficients. Did you remember that you can print just the coefficients using `.params`?
4. Interpret the intercept of the equation. Is this interpretation practical?
5. Interpret the coefficient on the variable `source` in terms of expected income. How is the intercept different between groups?
6. Interpret the coefficient on `food`. Is there an increase or decrease in income associated with an increase in food expenditure?
7. Interpret the coefficient on `housing`. Is there an increase or decrease in income associated with an increase in housing expenditure?
8. Create a scatter plot of `housing` on the x-axis and `income` on the y-axis, colored by `source`. Looking only at the `Wage/Salaries` group, use the regression equation from step 3 to add three lines to the plot for when food expenditure is 10,000, 100,000, and 200,000 pesos, giving each line a different color. Remember that `food` is measured in thousands of pesos, so 10,000 pesos is `food = 10`. Why did we have to look at only one value of `source` to produce these lines?
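
The idea behind the final plotting step can be sketched in plain Python. Once `food` and `source` are held fixed, the regression equation reduces to a straight line in `housing`. The coefficient values below are hypothetical placeholders, not results from the `family` data, and `source_flag=1` is only an assumed coding; substitute the values and coding reported by your own fitted model.

```python
# Hypothetical coefficients for illustration only; replace them with the
# values printed by your fitted model's .params.
b_intercept = 50.0   # assumed intercept
b_source = -20.0     # assumed coefficient on the source indicator (0 or 1)
b_food = 1.2         # assumed coefficient on food
b_housing = 2.5      # assumed coefficient on housing


def line_in_housing(food, source_flag):
    """Return (intercept, slope) of income vs. housing for fixed food and source."""
    # Everything that does not involve housing folds into the intercept.
    intercept = b_intercept + b_source * source_flag + b_food * food
    slope = b_housing
    return intercept, slope


# food is in thousands of pesos, so 10, 100, and 200 stand for
# 10,000, 100,000, and 200,000 pesos.
lines = {food: line_in_housing(food, source_flag=1) for food in (10, 100, 200)}
for food, (intercept, slope) in lines.items():
    print(f"food = {food}: income = {intercept:.1f} + {slope:.1f} * housing")
```

Each `(intercept, slope)` pair describes one line that can be drawn over the scatter plot. All three lines share the same slope and differ only in intercept. Changing `source_flag` shifts every intercept by the `source` coefficient, which is why one value of `source` has to be fixed before the lines can be drawn.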
