Linear Regression

Gradient Descent for Intercept

As we try to minimize loss, we take each parameter we are changing and keep moving it for as long as the loss decreases. It’s like walking down a hill and stopping once we reach the bottom.

The process by which we do this is called **gradient descent**. We move in the direction that decreases our loss the most. *Gradient* refers to the slope of the curve at any point.

For example, let’s say we are trying to find the intercept for a line. We currently have a guess of `10` for the intercept. At the point of `10` on the curve, the slope is downward. Therefore, if we increase the intercept, we should be lowering the loss. So we follow the gradient downwards.
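To make the "downhill" idea concrete, here is a minimal sketch that compares the loss at an intercept of `10` with the loss at a slightly larger intercept. It assumes the loss is the mean squared error, and the data, the slope, and the helper name `mean_squared_loss` are made up for illustration rather than taken from the lesson.

```python
# Hypothetical helper: mean squared error of the line y = m*x + b
def mean_squared_loss(x, y, m, b):
    total = 0
    for x_value, y_value in zip(x, y):
        total += (y_value - (m * x_value + b)) ** 2
    return total / len(x)

# Made-up data that lies on the line y = 2x + 11
x = [1, 2, 3]
y = [13, 15, 17]
m = 2

loss_at_10 = mean_squared_loss(x, y, m, 10)
loss_at_10_5 = mean_squared_loss(x, y, m, 10.5)

# For this data, nudging the intercept upward from 10 lowers the loss
print(loss_at_10_5 < loss_at_10)
```

With this data the best intercept is above `10`, so the loss curve slopes downward at `10` and increasing the intercept moves us toward the bottom.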

We derive these gradients using calculus. It is not crucial to understand how we arrive at the gradient equation. To find the gradient of loss as intercept changes, the formula comes out to be:

$$\frac{2}{N}\sum_{i=1}^{N}-(y_i-(mx_i+b))$$

- `N` is the number of points we have in our dataset
- `m` is the current slope guess
- `b` is the current intercept guess

Basically:

- we find the sum of `y_value - (m*x_value + b)` for all the `y_value`s and `x_value`s we have
- and then we multiply the sum by a factor of `-2/N`. `N` is the number of points we have.
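As a worked illustration of these two steps, take a hypothetical dataset (not from the lesson) with the points $(1, 13)$, $(2, 15)$, and $(3, 17)$, a slope guess of $m = 2$, and an intercept guess of $b = 10$:

$$
\begin{aligned}
\sum_{i=1}^{3}\bigl(y_i-(mx_i+b)\bigr) &= (13-12)+(15-14)+(17-16) = 3 \\
-\frac{2}{N}\cdot 3 &= -\frac{2}{3}\cdot 3 = -2
\end{aligned}
$$

The gradient is negative, so for this made-up data, increasing the intercept from `10` would lower the loss.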

Define a function called `get_gradient_at_b()` that takes in a set of x values, `x`, a set of y values, `y`, a slope `m`, and an intercept value `b`.

For now, have it return `b`, unchanged.

In the `get_gradient_at_b()` function, we want to go through all of the `x` values and all of the `y` values and compute `(y - (m*x+b))` for each of them.

Create a variable called `diff` that has the sum of all of these values.

Instead of returning `b` from the `get_gradient_at_b()` function, return `diff`.
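One possible shape for the function at this intermediate step is sketched below. It assumes `x` and `y` are Python lists of equal length, and the sample values in the call are invented for illustration.

```python
def get_gradient_at_b(x, y, m, b):
    # Running total of y - (m*x + b) over every point
    diff = 0
    for i in range(len(x)):
        diff += y[i] - (m * x[i] + b)
    return diff

# Made-up data: each point sits 1 above the line y = 2x + 10
print(get_gradient_at_b([1, 2, 3], [13, 15, 17], 2, 10))
```

At this stage the return value is only the sum of the differences, not yet the gradient.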

Still in the `get_gradient_at_b()` function, define a variable called `b_gradient` and set it equal to `-2/N` multiplied by `diff`.

**Note:** `N` is the number of points, i.e. the length of the `x` list or the `y` list.

Instead of returning `diff`, return `b_gradient`.
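Putting the steps together, the finished function might look like the sketch below. It assumes `x` and `y` are non-empty Python lists of the same length, and the values in the example call are made up.

```python
def get_gradient_at_b(x, y, m, b):
    # N is the number of points in the dataset
    N = len(x)
    # Sum of y - (m*x + b) over every point
    diff = 0
    for i in range(N):
        diff += y[i] - (m * x[i] + b)
    # Scale the sum by -2/N to get the gradient of loss with respect to b
    b_gradient = -2 / N * diff
    return b_gradient

# Made-up data: each point sits 1 above the line y = 2x + 10
gradient = get_gradient_at_b([1, 2, 3], [13, 15, 17], 2, 10)
print(gradient)
```

A negative result here means the loss falls as the intercept grows, so the intercept guess should move upward.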