Learn
Linear Regression
Loss

When we think about how we can assign a slope and intercept to fit a set of points, we have to define what the best fit is.

For each data point, we calculate loss, a number that measures how bad the model’s (in this case, the line’s) prediction was. You may have seen this being referred to as error.

We can think about loss as the squared distance from the point to the line. We do the squared distance (instead of just the distance) so that points above and below the line both contribute to total loss in the same way:

In this example:

  • For point A, the squared distance is 9 (3²)
  • For point B, the squared distance is 1 (1²)

So the total loss, with this model, is 10. If we found a line that had less loss than 10, that line would be a better model for this data.

Instructions

1.

We have three points, (1, 5), (2, 1), and (3, 3). We are trying to find a line that produces lowest loss.

We have provided you the list of x-values, x, and y-values, y, for these points.

Find the y-values that the line with weights m1 and b1 would predict for the x-values given. Store these in a list called y_predicted1.

2.

Find the y values that the line with weights m2 and b2 would predict for the x-values given. Store these in a list called y_predicted2.

3.

Create a variable called total_loss1 and set it equal to zero.

Then, find the sum of the squared distance between the actual y-values of the points and the y_predicted1 values by looping through the list:

  • Calculating the difference between y and y_predicted1
  • Squaring the difference
  • Adding it to total_loss1
4.

Create a variable called total_loss2 and set it equal to zero.

Find the sum of the squared distance between the actual y-values of the points and the y_predicted2 values by looping through the list:

  • Calculating the difference between y and y_predicted2
  • Squaring the difference
  • Adding it to total_loss2
5.

Print out total_loss1 and total_loss2. Out of these two lines, which would you use to model the points?

Create a variable called better_fit and assign it to 1 if line 1 fits the data better and 2 if line 2 fits the data better.

Folder Icon

Sign up to start coding

Already have an account?