Learn

To run a multiple linear regression in Python, we can use the OLS.from_formula() method from statsmodels.api. For example, to predict score from hours_studied and breakfast (both columns in a dataset named survey), we can fit the model as follows:

import statsmodels.api as sm
model = sm.OLS.from_formula('score ~ hours_studied + breakfast', data=survey).fit()

To view the results, we can print a summary of the fitted model to the console:

print(model.summary())

Rather than printing the entire summary table, we can access the model coefficients directly using model.params. We can even access a specific coefficient by its position in the table. For instance:

print(model.params)
# Output:
# Intercept        32.665570
# hours_studied     8.540499
# breakfast        22.495615

# .iloc selects by position; plain [0] on a labeled Series is deprecated in recent pandas
print(model.params.iloc[0])
# Output:
# 32.66556979549575

From the coefficient table, we can see the intercept is approximately 32.7, the coefficient on hours_studied is 8.5, and the coefficient on breakfast is 22.5.
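To see what these coefficients mean in practice, they can be plugged into the regression equation score = b0 + b1 * hours_studied + b2 * breakfast. The sketch below uses the coefficient values printed above; the function name predict_score, the example input of 2 hours, and the assumption that breakfast is coded 1 for yes and 0 for no are illustrative and not taken from the lesson.

```python
# Coefficients copied from the model.params output above.
b0 = 32.665570   # intercept
b1 = 8.540499    # coefficient on hours_studied
b2 = 22.495615   # coefficient on breakfast


def predict_score(hours_studied, breakfast):
    """Predicted score from the fitted equation.

    breakfast is assumed to be coded 1 (ate breakfast) or 0 (did not).
    """
    return b0 + b1 * hours_studied + b2 * breakfast


# Hypothetical student who studied 2 hours, with and without breakfast.
with_breakfast = predict_score(2, 1)
without_breakfast = predict_score(2, 0)

print(round(with_breakfast, 2))                      # 72.24
print(round(without_breakfast, 2))                   # 49.75
print(round(with_breakfast - without_breakfast, 2))  # 22.5
```

Holding hours_studied fixed, the gap between the two predictions equals the breakfast coefficient, which is how a coefficient in a multiple regression is usually read.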

Instructions

1. Using the student dataset, fit a multiple regression model for the response variable port3 with quantitative predictor math1 and binary predictor address. Save the results as model1.

2. Print the intercept and coefficients from model1 using .params. Are they listed in the order you thought they’d be?

3. Using model1.params, save the intercept as b0, the coefficient for math1 as b1, and the coefficient for address as b2. If we added students’ first semester Portuguese score (port1) as another predictor to the model, what index would it be in model1.params?
