Linear regression is a powerful modeling technique that can be used to understand the relationship between a quantitative variable and one or more other variables, sometimes with the goal of making predictions. For example, linear regression can help us answer questions like:
- What is the relationship between apartment size and rental price for NYC apartments?
- Is a mother’s height a good predictor of their child’s adult height?
The first step before fitting a linear regression model is exploratory data analysis and data visualization: is there a relationship that we can model? For example, suppose we collect heights (in cm) and weights (in kg) for 9 adults and inspect a plot of height vs. weight:
plt.scatter(data.height, data.weight) plt.xlabel('height (cm)') plt.ylabel('weight (kg)') plt.show()
When we look at this plot, we see that there is some evidence of a relationship between height and weight: people who are taller tend to weigh more. In the following exercises, we’ll learn how to model this relationship with a line. If you were to draw a line through these points to describe the relationship between height and weight, what line would you draw?
Instructions
A dataset has been loaded for you in script.py containing fictional data from a group of students who were surveyed about their studying and breakfast choices prior to a math test. The data is loaded as a variable named students
.
Create a scatter plot with hours_studied
on the x-axis and score
on the y-axis.
Note that the code to show the plot (plt.show()
) is already provided for you, so you do not need to add it!
If you had to draw a line on top of this plot to describe the relationship between hours studied and math score, what would that line look like?
Uncomment the code for the line plot (plt.plot(students.hours_studied, y)
). Does this line look correct?