
In the previous exercises, we looked at regression models with one quantitative predictor and one binary predictor, but we can also have models with multiple quantitative predictors. For example, consider the following model using the survey dataset (assignments is the number of homework assignments the student has completed):

import statsmodels.api as sm

model = sm.OLS.from_formula('score ~ hours_studied + assignments', data=survey).fit()
print(model.params)

# Output:
# Intercept        16.676498
# hours_studied     6.273886
# assignments       4.687796

From the coefficients above, our regression equation is:

\text{score} = 16.7 + 6.3*\text{hours\_studied} + 4.7*\text{assignments}
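To see how the equation turns into a prediction, the sketch below plugs in one hypothetical student who studied for 2 hours and completed 5 assignments. The coefficient values are copied from the printed `model.params` output, and `predict_score` is a hypothetical helper written for illustration, not part of statsmodels.

```python
# Coefficients copied from the model.params output above
params = {
    "Intercept": 16.676498,
    "hours_studied": 6.273886,
    "assignments": 4.687796,
}

def predict_score(hours_studied, assignments):
    """Predicted score from the fitted equation (hypothetical helper)."""
    return (params["Intercept"]
            + params["hours_studied"] * hours_studied
            + params["assignments"] * assignments)

# 16.676498 + 6.273886*2 + 4.687796*5
print(round(predict_score(2, 5), 2))  # 52.66
```

With the real fitted model, `model.predict(pd.DataFrame({'hours_studied': [2], 'assignments': [5]}))` should return essentially the same value (assuming pandas is imported as `pd`).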

We can still think of this model as defining a separate regression line (score vs. hours_studied) for each value of the other quantitative predictor, assignments. However, this is hard to visualize because there is a different line for every possible value of assignments. To visualize the regression output, it helps to choose a few sample values: for example, 1, 5, and 10 assignments.
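A small sketch, assuming the coefficients printed above, makes the three sample lines concrete: fixing assignments turns the model into an ordinary line in hours_studied whose intercept is `Intercept + assignments_coef * assignments` and whose slope is always the hours_studied coefficient. `line_for` is a hypothetical helper name used only for this illustration.

```python
# Coefficients copied from the model.params output above
INTERCEPT = 16.676498
B_HOURS = 6.273886
B_ASSIGNMENTS = 4.687796

def line_for(assignments):
    """Return (intercept, slope) of the score-vs-hours_studied line
    for a fixed number of completed assignments (hypothetical helper)."""
    intercept = INTERCEPT + B_ASSIGNMENTS * assignments
    slope = B_HOURS  # the hours_studied slope does not depend on assignments
    return intercept, slope

for a in (1, 5, 10):
    intercept, slope = line_for(a)
    print(f"assignments={a}: score = {intercept:.2f} + {slope:.2f}*hours_studied")
```

Only the intercept changes from line to line (about 21.36, 40.12, and 63.55), while every slope is 6.27.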

As before, we start with a scatter plot of score vs. hours_studied, this time coloring the points by assignments:

import seaborn as sns
import matplotlib.pyplot as plt

# Create scatter plot of hours_studied and score, colored by assignments
sns.lmplot(x='hours_studied', y='score', hue='assignments', palette='Blues', fit_reg=False, data=survey)

This time, we plug the model coefficients directly into each regression equation by pulling them individually from model.params. The code for 1, 5, and 10 assignments is given below.

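# Add regression line for 1 assignment
# (coefficients are looked up by name; positional access such as
# model.params[0] is deprecated for label-indexed Series in recent pandas)
line1, = plt.plot(survey.hours_studied, model.params['Intercept'] + model.params['hours_studied']*survey.hours_studied + model.params['assignments']*1, color='lightblue', linewidth=5, label='assignments=1')
# Add regression line for 5 assignments
line5, = plt.plot(survey.hours_studied, model.params['Intercept'] + model.params['hours_studied']*survey.hours_studied + model.params['assignments']*5, color='blue', linewidth=5, label='assignments=5')
# Add regression line for 10 assignments
line10, = plt.plot(survey.hours_studied, model.params['Intercept'] + model.params['hours_studied']*survey.hours_studied + model.params['assignments']*10, color='darkblue', linewidth=5, label='assignments=10')
# Show plot with a legend for the three regression lines
# (passing the handles explicitly keeps the labels from attaching to the scatter points)
plt.legend(handles=[line1, line5, line10])
plt.show()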

Scatter plot showing hours studied on the x-axis and score on the y-axis. Three parallel lines, for 1, 5, and 10 assignments, each show a positive relationship between score and hours studied. Lines for more assignments have higher intercepts.

We can see in the plot that the slopes of all three lines are the same, but the intercepts differ. As the number of completed assignments increases, the intercept of the corresponding regression line also increases.
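A short derivation shows why the lines must be parallel. Writing h as shorthand for hours_studied and a for assignments (notation introduced here only for brevity), the vertical gap between the lines for two assignment counts a_1 and a_2 at the same h is:

```latex
\text{score}(h, a) = 16.7 + 6.3*h + 4.7*a

\text{score}(h, a_2) - \text{score}(h, a_1)
  = (16.7 + 6.3*h + 4.7*a_2) - (16.7 + 6.3*h + 4.7*a_1)
  = 4.7*(a_2 - a_1)
```

The h terms cancel, so the gap does not depend on hours studied. For example, the lines for 10 and 1 assignments are 4.7*9 = 42.3 points apart everywhere (about 42.19 using the unrounded coefficient 4.687796).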

Instructions

1.

In script.py we’ve fit a regression predicting port3 based on math1 and port1 and saved the fitted model as model2. Print model2.params and inspect the coefficients. What is the relationship between final Portuguese score (port3) and first Portuguese score (port1)?

2.

We’ve already provided you with code to plot two regression lines for students with first semester Portuguese scores (port1) of 4 and 6. Using the results from the regression model, write a line of code to add a third regression line to the plot for port1 = 8. Make the color of the line darkblue. How do the lines on the plot match with the results from the model?
