When we think about how we can assign a slope and intercept to fit a set of points, we have to define what the best fit is.
For each data point, we calculate loss, a number that measures how bad the model’s (in this case, the line’s) prediction was. You may have seen this referred to as error.
We can think about loss as the squared distance from the point to the line, measured vertically as the difference between the point’s actual y-value and the y-value the line predicts. We use the squared distance (instead of just the distance) so that points above and below the line both contribute to total loss in the same way.
In this example:
- For point A, the squared distance is 9 (3²)
- For point B, the squared distance is 1 (1²)
So the total loss, with this model, is 10. If we found a line that had less loss than 10, that line would be a better model for this data.
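To see why squaring matters, here is a minimal sketch of the arithmetic above. It assumes point A is 3 units from the line on one side and point B is 1 unit away on the other side; the signs are an assumption for illustration, since only the distances 3 and 1 are given.

```python
# Assumed signed vertical distances from each point to the line.
# The magnitudes (3 and 1) come from the example; the signs are hypothetical.
distances = [3, -1]

# Without squaring, distances on opposite sides of the line partly cancel.
raw_sum = sum(distances)

# With squaring, every point adds a non-negative amount to the total loss.
squared_loss = sum(d ** 2 for d in distances)

print(raw_sum)       # 2
print(squared_loss)  # 10
```

Under that assumption, the unsquared sum would understate how far the points are from the line, while the squared sum matches the total loss of 10.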
Instructions
We have three points, (1, 5), (2, 1), and (3, 3). We are trying to find the line that produces the lowest loss.
We have provided you with the list of x-values, x, and y-values, y, for these points.
Find the y-values that the line with weights m1 and b1 would predict for the x-values given. Store these in a list called y_predicted1.
Find the y-values that the line with weights m2 and b2 would predict for the x-values given. Store these in a list called y_predicted2.
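One way to compute a line's predictions is to apply y = m*x + b to every x-value. The sketch below uses placeholder weights chosen only for illustration; the values of m1, b1, m2, and b2 here are assumptions, so substitute the weights you were actually given.

```python
# The three points from the instructions: (1, 5), (2, 1), (3, 3).
x = [1, 2, 3]
y = [5, 1, 3]

# Placeholder weights, assumed for this sketch only.
m1, b1 = 1, 0
m2, b2 = 0.5, 1

# Each prediction is slope * x + intercept.
y_predicted1 = [m1 * x_value + b1 for x_value in x]
y_predicted2 = [m2 * x_value + b2 for x_value in x]

print(y_predicted1)  # [1, 2, 3]
print(y_predicted2)  # [1.5, 2.0, 2.5]
```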
Create a variable called total_loss1 and set it equal to zero.
Then, find the sum of the squared distances between the actual y-values of the points and the y_predicted1 values by looping through the list:
- Calculating the difference between y and y_predicted1
- Squaring the difference
- Adding it to total_loss1
Create a variable called total_loss2 and set it equal to zero.
Find the sum of the squared distances between the actual y-values of the points and the y_predicted2 values by looping through the list:
- Calculating the difference between y and y_predicted2
- Squaring the difference
- Adding it to total_loss2
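The three bulleted steps can be sketched as a single loop. The helper name total_squared_loss is hypothetical, and the predicted values below are assumed for illustration (they correspond to placeholder lines y = x and y = 0.5x + 1, not necessarily the weights you were given).

```python
def total_squared_loss(actual, predicted):
    """Sum the squared differences between actual and predicted y-values."""
    total = 0
    for actual_value, predicted_value in zip(actual, predicted):
        difference = actual_value - predicted_value  # step 1: difference
        total += difference ** 2                     # steps 2 and 3: square, then add
    return total

# Actual y-values for the points (1, 5), (2, 1), (3, 3).
y = [5, 1, 3]

# Assumed predictions from two placeholder lines.
y_predicted1 = [1, 2, 3]
y_predicted2 = [1.5, 2.0, 2.5]

total_loss1 = total_squared_loss(y, y_predicted1)
total_loss2 = total_squared_loss(y, y_predicted2)

print(total_loss1)  # 17
print(total_loss2)  # 13.5
```

Writing the loop once as a function avoids duplicating it for each line; the exercise itself asks for two explicit loops, which work the same way.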
Print out total_loss1 and total_loss2. Out of these two lines, which would you use to model the points?
Create a variable called better_fit and assign it to 1 if line 1 fits the data better and 2 if line 2 fits the data better.
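Since lower loss means a better fit, the choice comes down to comparing the two totals. A minimal sketch, using placeholder loss values that are assumed here rather than taken from your own results:

```python
# Placeholder totals, assumed for illustration; use your computed values instead.
total_loss1 = 17
total_loss2 = 13.5

# The line with the smaller total loss is the better model for the data.
better_fit = 1 if total_loss1 < total_loss2 else 2

print(better_fit)  # 2
```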