
In the previous exercises, we looked at regression models with one quantitative predictor and one binary predictor, but we can also have models with multiple quantitative predictors. For example, consider the following model using the survey dataset (assignments is the number of homework assignments the student has completed):

import statsmodels.api as sm
model = sm.OLS.from_formula('score ~ hours_studied + assignments', data=survey).fit()
print(model.params)

# Output:
# Intercept        16.676498
# hours_studied     6.273886
# assignments       4.687796

From the coefficients above, our regression equation is:

$\text{score} = 16.7 + 6.3 \times \text{hours\_studied} + 4.7 \times \text{assignments}$
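To make the equation concrete, here is a minimal sketch that plugs values into it, using the unrounded coefficients printed above. The function name predict_score and the example student (2 hours studied, 5 assignments completed) are made up for illustration and do not come from the survey dataset.

```python
# Coefficients copied from the model.params output above
INTERCEPT = 16.676498
B_HOURS = 6.273886
B_ASSIGNMENTS = 4.687796

def predict_score(hours_studied, assignments):
    """Return the predicted score from the fitted regression equation."""
    return INTERCEPT + B_HOURS * hours_studied + B_ASSIGNMENTS * assignments

# Hypothetical student: 2 hours studied, 5 assignments completed
print(round(predict_score(2, 5), 2))  # 52.66
```

With a fitted statsmodels model available, model.predict on a DataFrame of new values should give the same kind of result without copying coefficients by hand.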

We can still think of multiple regression as creating a new regression line for each value of the second predictor. However, this is challenging to visualize because assignments is quantitative, so there is a different regression line for every possible value of assignments. To visualize the regression output, it is helpful to choose a few sample values: for example, 1, 5, and 10 assignments.
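Holding assignments fixed folds its term into the intercept, which leaves an ordinary line in hours_studied. The sketch below works this out for the three sample values, using the coefficients printed earlier; the helper name line_for_assignments is hypothetical.

```python
# Coefficients copied from the model.params output above
INTERCEPT = 16.676498
B_HOURS = 6.273886
B_ASSIGNMENTS = 4.687796

def line_for_assignments(assignments):
    """Return (intercept, slope) of the score vs. hours_studied line
    when assignments is held fixed at the given value."""
    return INTERCEPT + B_ASSIGNMENTS * assignments, B_HOURS

for a in (1, 5, 10):
    intercept, slope = line_for_assignments(a)
    print(f"assignments={a}: score = {intercept:.1f} + {slope:.1f}*hours_studied")
# assignments=1: score = 21.4 + 6.3*hours_studied
# assignments=5: score = 40.1 + 6.3*hours_studied
# assignments=10: score = 63.6 + 6.3*hours_studied
```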

We can add these lines to our scatter plot of score vs. hours_studied as before:

import seaborn as sns
import matplotlib.pyplot as plt

# Create scatter plot of hours_studied and score
sns.lmplot(x='hours_studied', y='score', hue='assignments', palette='Blues', fit_reg=False, data=survey)

This time we will put the model coefficients directly into each regression equation by accessing each one individually from model.params. The code for 1, 5, and 10 assignments is given below.

# Add regression line for 1 assignment
# (coefficients are accessed by label; integer keys on a labeled Series are deprecated in recent pandas)
plt.plot(survey.hours_studied, model.params['Intercept'] + model.params['hours_studied']*survey.hours_studied + model.params['assignments']*1, color='lightblue', linewidth=5)

# Add regression line for 5 assignments
plt.plot(survey.hours_studied, model.params['Intercept'] + model.params['hours_studied']*survey.hours_studied + model.params['assignments']*5, color='blue', linewidth=5)

# Add regression line for 10 assignments
plt.plot(survey.hours_studied, model.params['Intercept'] + model.params['hours_studied']*survey.hours_studied + model.params['assignments']*10, color='darkblue', linewidth=5)

# Show plot with legend
plt.legend(['assignments=1','assignments=5', 'assignments=10'])
plt.show()

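We can see in the plot that the slopes of all three lines are the same (about 6.3, the hours_studied coefficient), but the intercepts differ. As the number of completed assignments increases, the intercept of the corresponding regression line also increases, by about 4.7 points per additional assignment (the assignments coefficient).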
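One way to check numerically that the lines are parallel is to confirm that the vertical gap between two of them is the same at every value of hours_studied. The sketch below does this for the 10-assignment and 1-assignment lines, using the coefficients printed earlier; the hours values 0, 2, and 4 are arbitrary choices for illustration.

```python
# Coefficients copied from the model.params output above
INTERCEPT = 16.676498
B_HOURS = 6.273886
B_ASSIGNMENTS = 4.687796

def predict_score(hours_studied, assignments):
    """Return the predicted score from the fitted regression equation."""
    return INTERCEPT + B_HOURS * hours_studied + B_ASSIGNMENTS * assignments

# Vertical gap between the 10-assignment and 1-assignment lines
# at a few arbitrary values of hours_studied
gaps = [predict_score(h, 10) - predict_score(h, 1) for h in (0, 2, 4)]
print([round(g, 2) for g in gaps])  # [42.19, 42.19, 42.19]
```

The gap is always 9 times the assignments coefficient, because the hours_studied term cancels when the two predictions are subtracted.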

### Instructions

1.

In script.py we’ve fit a regression predicting port3 based on math1 and port1 and saved the fitted model as model2. Print model2.params and inspect the coefficients. What is the relationship between final Portuguese score (port3) and first Portuguese score (port1)?

2.

We’ve already provided you with code to plot two regression lines for students with first-semester Portuguese scores (port1) of 4 and 6. Using the results from the regression model, write a line of code to add a third regression line to the plot for port1 = 8. Make the color of the line darkblue. How do the lines on the plot match the results from the model?