As we try to minimize loss, we take each parameter we are changing and move it as long as doing so decreases the loss. It's like we are moving down a hill and stopping once we reach the bottom.
The process by which we do this is called gradient descent: we move in the direction that decreases our loss the most. The gradient is the slope of the loss curve at our current point.
For example, let's say we are trying to find the intercept for a line. We currently have a guess of 10 for the intercept. At the point of 10 on the curve, the slope is downward. Therefore, if we increase the intercept, we should be lowering the loss. So we follow the gradient downwards.
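To make that concrete, here is a minimal sketch of the idea in Python. The one-point dataset, the fixed slope of 2, the squared-error loss, and the step size are all made up for illustration; they are not part of the exercise.

```python
# Hypothetical squared-error loss for a single point (x = 1, y = 14)
# on a line with a fixed slope of 2 and a variable intercept.
def loss(intercept):
    return (14 - (2 * 1 + intercept)) ** 2

current = 10   # our current intercept guess
step = 0.1     # how far we nudge the intercept

# Nudge the intercept in whichever direction lowers the loss.
if loss(current + step) < loss(current):
    current += step   # increasing the intercept lowers the loss, as in the example
else:
    current -= step

print(current)  # 10.1
```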

We derive these gradients using calculus. It is not crucial to understand how we arrive at the gradient equation. To find the gradient of the loss as the intercept changes, the formula comes out to be:

\[
\frac{\partial\,\text{Loss}}{\partial b} = \frac{-2}{N} \sum_{i=1}^{N} \bigl( y_i - (m x_i + b) \bigr)
\]

- `N` is the number of points we have in our dataset
- `m` is the current slope guess
- `b` is the current intercept guess
Basically:
- we find the sum of `y_value - (m*x_value + b)` for all the `y_value`s and `x_value`s we have
- and then we multiply the sum by a factor of `-2/N`. `N` is the number of points we have.
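As a quick worked example of that formula, the snippet below computes the gradient by hand for a small made-up dataset; the particular values of `x`, `y`, `m`, and `b` are only illustrative.

```python
# Made-up data: three points, with a current guess of m = 1 and b = 0.
x = [1, 2, 3]
y = [2, 0, 3]
m, b = 1, 0
N = len(x)

# Sum of y_value - (m*x_value + b) over every point:
# (2 - 1) + (0 - 2) + (3 - 3) = -1
total = sum(yi - (m * xi + b) for xi, yi in zip(x, y))

# Multiply the sum by -2/N:
gradient = -2 / N * total
print(gradient)  # 0.666..., positive, so lowering b would lower the loss here
```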
Instructions
Define a function called `get_gradient_at_b()` that takes in a set of x values, `x`, a set of y values, `y`, a slope `m`, and an intercept value `b`. For now, have it return `b`, unchanged.
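A minimal sketch of this first step might look like the stub below; the body is just a placeholder for now.

```python
def get_gradient_at_b(x, y, m, b):
    # Placeholder: return the intercept unchanged for now
    return b
```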
In the `get_gradient_at_b()` function, we want to go through all of the `x` values and all of the `y` values and compute `(y - (m*x + b))` for each of them. Create a variable called `diff` that has the sum of all of these values.
Instead of returning `b` from the `get_gradient_at_b()` function, return `diff`.
Still in the `get_gradient_at_b()` function, define a variable called `b_gradient` and set it equal to `-2/N` multiplied by `diff`. Note: `N` is the number of points, i.e. the length of the `x` list or the `y` list.
Instead of returning `diff`, return `b_gradient`.
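Putting all of the steps together, one possible version of the finished function is sketched below; the data in the example call at the end is made up.

```python
def get_gradient_at_b(x, y, m, b):
    N = len(x)  # number of points
    diff = 0
    for i in range(N):
        diff += y[i] - (m * x[i] + b)
    # Scale the summed differences by -2/N to get the gradient with respect to b
    b_gradient = -2 / N * diff
    return b_gradient

# Example call with made-up data: three points, slope guess 1, intercept guess 0
print(get_gradient_at_b([1, 2, 3], [2, 0, 3], 1, 0))  # 0.666...
```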