Linear regression is a powerful modeling technique that can be used to understand the relationship between a quantitative variable and one or more other variables, sometimes with the goal of making predictions. For example, linear regression can help us answer questions like:
- What is the relationship between apartment size and rental price for NYC apartments?
- Is a mother’s height a good predictor of their child’s adult height?
The first step before fitting a linear regression model is exploratory data analysis and data visualization: is there a relationship that we can model? For example, suppose we collect heights (in cm) and weights (in kg) for 9 adults and inspect a plot of height vs. weight:
plt.scatter(data.height, data.weight) plt.xlabel('height (cm)') plt.ylabel('weight (kg)') plt.show()
When we look at this plot, we see that there is some evidence of a relationship between height and weight: people who are taller tend to weigh more. In the following exercises, we’ll learn how to model this relationship with a line. If you were to draw a line through these points to describe the relationship between height and weight, what line would you draw?
A dataset has been loaded for you in script.py containing fictional data from a group of students who were surveyed about their studying and breakfast choices prior to a math test. The data is loaded as a variable named
Create a scatter plot with
hours_studied on the x-axis and
score on the y-axis.
Note that the code to show the plot (
plt.show()) is already provided for you, so you do not need to add it!
If you had to draw a line on top of this plot to describe the relationship between hours studied and math score, what would that line look like?
Uncomment the code for the line plot (
plt.plot(students.hours_studied, y)). Does this line look correct?