Introduction to Linear Regression

Linear regression is a powerful modeling technique that can be used to understand the relationship between a quantitative variable and one or more other variables, sometimes with the goal of making predictions. For example, linear regression can help us answer questions like:

• What is the relationship between apartment size and rental price for NYC apartments?
• Is a mother’s height a good predictor of their child’s adult height?

Before fitting a linear regression model, the first step is exploratory data analysis and data visualization: is there a relationship that we can model? For example, suppose we collect heights (in cm) and weights (in kg) for 9 adults and inspect a plot of weight against height:

```python
import matplotlib.pyplot as plt

# `data` is assumed to be an already-loaded table with `height` and `weight` columns
plt.scatter(data.height, data.weight)
plt.xlabel('height (cm)')
plt.ylabel('weight (kg)')
plt.show()
```

When we look at this plot, we see some evidence of a relationship between height and weight: taller people tend to weigh more. In the following exercises, we’ll learn how to model this relationship with a line. If you were to draw a line through these points to describe the relationship between height and weight, what line would you draw?
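One way to experiment with that question is to pick a slope and intercept by eye and draw the resulting line over the scatter plot. The sketch below is only an illustration: the nine height and weight values are made up (the data behind the plot above is not listed here), and the slope and intercept are guesses, not fitted values.

```python
import matplotlib.pyplot as plt

# Made-up values for 9 adults, for illustration only
heights = [150, 155, 160, 165, 170, 175, 180, 185, 190]  # cm
weights = [52, 58, 57, 63, 68, 70, 77, 80, 86]           # kg

# A line guessed by eye (not fitted): weight = slope * height + intercept
slope = 0.8
intercept = -68
line_weights = [slope * h + intercept for h in heights]

plt.scatter(heights, weights)        # the observed points
plt.plot(heights, line_weights)      # the candidate line
plt.xlabel('height (cm)')
plt.ylabel('weight (kg)')
plt.show()
```

Changing `slope` and `intercept` and re-running the code is a quick way to see how each one moves the line relative to the points.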

### Instructions

1. A dataset has been loaded for you in script.py containing fictional data from a group of students who were surveyed about their studying and breakfast choices prior to a math test. The data is loaded as a variable named `students`.

   Create a scatter plot with `hours_studied` on the x-axis and `score` on the y-axis.

   Note that the code to show the plot (`plt.show()`) is already provided for you, so you do not need to add it!

2. If you had to draw a line on top of this plot to describe the relationship between hours studied and math score, what would that line look like?

   Uncomment the code for the line plot (`plt.plot(students.hours_studied, y)`). Does this line look correct?
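
For reference, the two steps might look roughly like the sketch below. The `students` values here are hypothetical stand-ins for the data loaded in script.py, and the slope and intercept used to build `y` are guesses chosen for illustration; the exercise's own definition of `y` is not shown in these instructions.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the `students` data loaded in script.py
students = pd.DataFrame({
    'hours_studied': [0, 1, 2, 3, 4, 5],
    'score': [48, 55, 61, 70, 74, 83],
})

# Step 1: scatter plot of score against hours studied
plt.scatter(students.hours_studied, students.score)

# Step 2: a candidate line; slope (7) and intercept (48) are guessed, not fitted
y = 7 * students.hours_studied + 48
plt.plot(students.hours_studied, y)

plt.show()
```

If the line passes through the middle of the cloud of points, with roughly as many points above it as below, it is a reasonable first description of the relationship.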