To run a multiple linear regression in Python, we can use the OLS.from_formula()
function from statsmodels.api. For example, if we want to run a regression to predict
score using hours_studied and breakfast as predictors (contained in a dataset named
survey), we can fit the model as follows:
import statsmodels.api as sm
model = sm.OLS.from_formula('score ~ hours_studied + breakfast', data=survey).fit()
To view the results, we can print a summary of them to the console using print(model.summary()).
Rather than printing the entire summary table, we can access the model coefficients directly using
model.params. We can even access a specific coefficient by its position in the table. For instance:
print(model.params)
# Output:
# Intercept        32.665570
# hours_studied     8.540499
# breakfast        22.495615

print(model.params.iloc[0])  # first coefficient in the table (the intercept), selected by position
# Output:
# 32.66556979549575
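When a model is fit from a formula and a DataFrame, model.params is a pandas Series, so a coefficient can also be looked up by its label rather than by its position. The sketch below uses a hand-built Series as a stand-in for model.params, filled with the values printed above, so it runs without the survey data.

```python
import pandas as pd

# Hand-built stand-in for model.params, using the values printed in the lesson.
params = pd.Series(
    {"Intercept": 32.665570, "hours_studied": 8.540499, "breakfast": 22.495615}
)

by_position = params.iloc[0]          # first row of the table
by_label = params["hours_studied"]    # lookup by coefficient name

print(by_position)
print(by_label)
```

Label-based lookup keeps working if the order of the rows changes.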
From the coefficient table, we can see the intercept is approximately 32.7, the coefficient on
hours_studied is 8.5, and the coefficient on
breakfast is 22.5.
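Together these coefficients give the fitted equation score = 32.7 + 8.5 * hours_studied + 22.5 * breakfast. As a rough sketch, the helper below plugs values into that equation. The name predict_score is hypothetical (it is not part of statsmodels), and the code assumes breakfast is coded as 1 for students who ate breakfast and 0 otherwise.

```python
def predict_score(hours_studied, breakfast):
    """Evaluate the fitted equation using the coefficients from the lesson output."""
    b0 = 32.665570   # intercept
    b1 = 8.540499    # coefficient on hours_studied
    b2 = 22.495615   # coefficient on breakfast (assumed 0/1 coding)
    return b0 + b1 * hours_studied + b2 * breakfast


# Predicted score for a student who studied 2 hours and ate breakfast.
print(predict_score(2, 1))
# Same hours of study, no breakfast: the prediction drops by b2.
print(predict_score(2, 0))
```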
Using the student dataset, fit a multiple regression model for the response variable
port3 with quantitative predictor math1 and binary predictor address. Save the results as
model1.

Print the intercept and coefficients from model1 using
.params. Are they listed in the order you thought they’d be?

Using model1.params, save the intercept as
b0 and the coefficients for the two predictors as
b1 and
b2. If we added students’ first semester Portuguese score (
port1) as another predictor to the model, what index would it have in
.params?