In the previous exercises, we looked at regression models with one quantitative predictor and one binary predictor, but we can also have models with multiple quantitative predictors. For example, consider the following model using the survey dataset (assignments is the number of homework assignments the student has completed):
```python
import statsmodels.api as sm

model = sm.OLS.from_formula('score ~ hours_studied + assignments', data=survey).fit()
print(model.params)

# Output:
# Intercept        16.676498
# hours_studied     6.273886
# assignments       4.687796
```
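From the coefficients above (rounded to two decimal places), our regression equation is:

$$\text{score} = 16.68 + 6.27 \cdot \text{hours\_studied} + 4.69 \cdot \text{assignments}$$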
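As a quick check on reading the equation, the sketch below hard-codes the three coefficients printed by model.params and computes the predicted score for a student who studied 3 hours and completed 5 assignments. The function name predict_score is a hypothetical helper used only for illustration.

```python
# Coefficients copied from the model.params output above
INTERCEPT = 16.676498
B_HOURS = 6.273886        # change in predicted score per additional hour studied
B_ASSIGNMENTS = 4.687796  # change in predicted score per additional completed assignment


def predict_score(hours_studied, assignments):
    """Predicted score from the fitted equation (hypothetical helper)."""
    return INTERCEPT + B_HOURS * hours_studied + B_ASSIGNMENTS * assignments


# 16.676498 + 6.273886*3 + 4.687796*5
print(round(predict_score(3, 5), 2))  # 58.94
```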
We can still think of multiple regression as creating a separate regression line (of score on hours_studied) for each value of the other quantitative predictor. However, this is challenging to visualize, because there is now a different regression line for every possible value of assignments. To visualize the regression output, it helps to choose a few sample values: for example, 1, 5, and 10 assignments.
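Plugging each sample value of assignments into the regression equation turns it into an ordinary line in hours_studied: the assignments term folds into the intercept, while the slope stays equal to the hours_studied coefficient. A small sketch using the printed coefficients (the names lines, INTERCEPT, B_HOURS, and B_ASSIGNMENTS are just for illustration):

```python
INTERCEPT = 16.676498
B_HOURS = 6.273886
B_ASSIGNMENTS = 4.687796

lines = {}
for a in (1, 5, 10):
    # Holding assignments fixed at a adds B_ASSIGNMENTS * a to the intercept;
    # the slope on hours_studied does not change
    lines[a] = (INTERCEPT + B_ASSIGNMENTS * a, B_HOURS)
    intercept, slope = lines[a]
    print(f"assignments={a}: score = {intercept:.2f} + {slope:.2f} * hours_studied")

# assignments=1: score = 21.36 + 6.27 * hours_studied
# assignments=5: score = 40.12 + 6.27 * hours_studied
# assignments=10: score = 63.55 + 6.27 * hours_studied
```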
We can add these lines to a scatter plot of score against hours_studied, as before, with the points colored by assignments:
```python
import seaborn as sns
import matplotlib.pyplot as plt

# Create scatter plot of hours_studied and score
sns.lmplot(x='hours_studied', y='score', hue='assignments',
           palette='Blues', fit_reg=False, data=survey)
```
This time, we will put the model coefficients directly into each regression equation by accessing them individually from model.params. The code for 1, 5, and 10 assignments is given below.
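```python
# Add regression line for 1 assignment
line1, = plt.plot(survey.hours_studied,
                  model.params['Intercept'] + model.params['hours_studied']*survey.hours_studied + model.params['assignments']*1,
                  color='lightblue', linewidth=5, label='assignments=1')
# Add regression line for 5 assignments
line5, = plt.plot(survey.hours_studied,
                  model.params['Intercept'] + model.params['hours_studied']*survey.hours_studied + model.params['assignments']*5,
                  color='blue', linewidth=5, label='assignments=5')
# Add regression line for 10 assignments
line10, = plt.plot(survey.hours_studied,
                   model.params['Intercept'] + model.params['hours_studied']*survey.hours_studied + model.params['assignments']*10,
                   color='darkblue', linewidth=5, label='assignments=10')

# Show plot with a legend for the three regression lines only
# (passing the handles keeps the legend from labeling the scatter points instead)
plt.legend(handles=[line1, line5, line10])
plt.show()
```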
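The three plt.plot calls repeat the same expression with a different fixed value. As a sketch of one way to factor that out, the hypothetical helper fixed_value_line below computes a line's y-values while one predictor is held constant. It looks coefficients up only by name, so it should accept model.params (a pandas Series) or a plain dict with the same keys; the example uses a dict holding the coefficients printed earlier.

```python
def fixed_value_line(params, x_name, x_values, fixed_name, fixed_value):
    """Predicted y along x_values, holding the predictor fixed_name at fixed_value.

    Hypothetical helper: assumes params maps 'Intercept' and each predictor
    name to its coefficient, as model.params does for a formula-based fit.
    """
    # Holding fixed_name constant folds its term into the intercept
    intercept = params['Intercept'] + params[fixed_name] * fixed_value
    slope = params[x_name]
    return [intercept + slope * x for x in x_values]


# Coefficients from the model.params output, stored in a plain dict
params = {'Intercept': 16.676498, 'hours_studied': 6.273886, 'assignments': 4.687796}
ys = fixed_value_line(params, 'hours_studied', [0, 1, 2], 'assignments', 5)
print(ys)
```

With the fitted model, a call such as plt.plot(survey.hours_studied, fixed_value_line(model.params, 'hours_studied', survey.hours_studied, 'assignments', 5), color='blue', linewidth=5) should draw the same line as the assignments=5 call above. The same pattern would apply to the exercise below with fixed_name='port1' and x_name set to whichever other predictor model2 uses.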
In the plot, all three lines have the same slope (the hours_studied coefficient, about 6.27) but different intercepts. As the number of completed assignments increases, the intercept of the corresponding regression line also increases, by about 4.69 points for each additional assignment.
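Because the assignments term does not involve hours_studied, the vertical gap between any two of these lines equals the assignments coefficient times the difference in assignments, at every value of hours_studied. The sketch below checks this for the assignments=5 and assignments=1 lines at a few arbitrarily chosen hours values (predicted is a hypothetical helper):

```python
INTERCEPT = 16.676498
B_HOURS = 6.273886
B_ASSIGNMENTS = 4.687796


def predicted(hours, assignments):
    """Predicted score from the fitted equation."""
    return INTERCEPT + B_HOURS * hours + B_ASSIGNMENTS * assignments


# Gap between the assignments=5 and assignments=1 lines at several hours values
gaps = [predicted(h, 5) - predicted(h, 1) for h in (0, 2.5, 10)]
print([round(g, 4) for g in gaps])  # [18.7512, 18.7512, 18.7512], i.e. 4 * B_ASSIGNMENTS
```

A constant gap is exactly what "same slope, different intercepts" means: the lines are parallel.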
In script.py, we’ve fit a regression predicting port3 based on port1 and saved the fitted model as model2. Print model2.params and inspect the coefficients. What is the relationship between final Portuguese score (port3) and first Portuguese score (port1)?
We’ve already provided you with code to plot two regression lines for students with first-semester Portuguese scores (port1) of 4 and 6. Using the results from the regression model, write a line of code to add a third regression line to the plot for port1 = 8, and make the color of the line darkblue. How do the lines on the plot match up with the results from the model?