As the name implies, LINEar regression involves fitting a line to a set of data points. To fit a line, it’s helpful to understand the equation for a line, which is often written as y = mx + b. In this equation:
- x and y represent variables, such as height and weight or hours of studying and quiz scores.
- b represents the y-intercept of the line. This is where the line intersects with the y-axis (a vertical line located at x = 0).
- m represents the slope. This controls how steep the line is. If we choose any two points on a line, the slope is the ratio of the vertical distance to the horizontal distance between those points; this is often written as rise/run.
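To make rise/run concrete, here is a minimal sketch that recovers the slope and y-intercept of the line y = 2x + 12 from two of its points. The function name line and the two x values are chosen only for illustration; any two distinct points on the line would give the same slope.

```python
def line(x, m=2, b=12):
    """Return y for the line y = m*x + b (defaults give y = 2x + 12)."""
    return m * x + b

# Pick any two points on the line (these x values are arbitrary).
x1, x2 = 1, 4
y1, y2 = line(x1), line(x2)

rise = y2 - y1      # vertical distance between the two points
run = x2 - x1       # horizontal distance between the two points
slope = rise / run  # rise/run

y_intercept = line(0)  # the y-value where the line crosses x = 0

print(slope, y_intercept)  # prints: 2.0 12
```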
The following plot shows a line with the equation y = 2x + 12:
Note that we can also have a line with a negative slope. For example, the following plot shows the line with the equation y = -2x + 8:
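As a quick check on the two equations above, the sketch below evaluates each line at a few x values. The function names and the chosen x values are illustrative assumptions. With a positive slope, y goes up by 2 for each step of 1 in x; with a negative slope, y goes down by 2.

```python
def positive_line(x):
    """y = 2x + 12: slope 2, y-intercept 12."""
    return 2 * x + 12

def negative_line(x):
    """y = -2x + 8: slope -2, y-intercept 8."""
    return -2 * x + 8

xs = [0, 1, 2, 3, 4]  # arbitrary x values for illustration

rising = [positive_line(x) for x in xs]   # [12, 14, 16, 18, 20]
falling = [negative_line(x) for x in xs]  # [8, 6, 4, 2, 0]

for x, up, down in zip(xs, rising, falling):
    print(x, up, down)
```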
Instructions
In script.py, we’ve again plotted score (as the y-variable) against hours_studied (the x-variable), with a line going through the points. Let’s see if we can improve this line so that it better fits the data. To start, the line appears to be too steep. In script.py, edit the equation of the line so that the slope is 10, then press “Run” to see the new line.
This should make the line less steep (because we are decreasing the slope). Does this fit the data better or worse?
The line now appears to be parallel to the points but still sits below them! Leaving the slope of the line equal to 10, edit the equation of the line so that the y-intercept is 45, then press “Run” to see the new line.
This should move the line upward (because we are increasing the y-intercept). Does this new line fit the data well?
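The effect of each edit can also be seen numerically. The sketch below is not the contents of script.py: the helper predict_score, the sample hours values, and the starting slope and intercept (15 and 10) are all hypothetical, chosen only to show the pattern. Lowering the slope shrinks the change in score per extra hour, while raising the y-intercept shifts every point on the line up by the same amount.

```python
def predict_score(hours_studied, slope, intercept):
    """Return the y-value of the line: slope * hours_studied + intercept."""
    return slope * hours_studied + intercept

hours = [1, 2, 3, 4]  # hypothetical x values, not the lesson's data

# Hypothetical starting line (the real starting values are in script.py).
steep = [predict_score(h, slope=15, intercept=10) for h in hours]

# Step 1: lower the slope to 10, keeping the hypothetical intercept.
less_steep = [predict_score(h, slope=10, intercept=10) for h in hours]

# Step 2: keep the slope at 10 and raise the y-intercept to 45.
shifted = [predict_score(h, slope=10, intercept=45) for h in hours]

# Change in score per additional hour equals the slope.
print(steep[1] - steep[0], less_steep[1] - less_steep[0])

# Raising the intercept moves every point up by the same amount.
print([new - old for new, old in zip(shifted, less_steep)])
```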