We want our program to be able to iteratively *learn* what the best `m` and `b` values are. So for each `m` and `b` pair that we guess, we want to step them against the gradients we’ve calculated, which is the direction that lowers the loss. But how far do we move in that direction?

We have to choose a **learning rate**, which will determine how far down the loss curve we go with each step.
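As a minimal sketch of how the learning rate enters the update, the step below scales each gradient by the learning rate and subtracts it from the current guess. The function name `step` and the example numbers are hypothetical, chosen only for illustration; the lesson's own code may be organized differently.

```python
def step(m, b, m_gradient, b_gradient, learning_rate):
    """Return the updated (m, b) after one gradient descent step."""
    # The gradient points toward increasing loss, so we subtract it.
    # The learning rate scales how far we move.
    new_m = m - learning_rate * m_gradient
    new_b = b - learning_rate * b_gradient
    return new_m, new_b


# Hypothetical gradients, just to show the arithmetic.
new_m, new_b = step(m=0.0, b=0.0, m_gradient=-4.0, b_gradient=-2.0, learning_rate=0.5)
print(new_m, new_b)
```

Doubling `learning_rate` here would double the size of the move for the same gradients.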

A small learning rate will take a long time to converge, so you might run out of time or cycles before getting an answer. A large learning rate might skip over the best value. It might *never* converge! Oh no!

Finding the absolute best learning rate is not necessary for training a model. You just have to find a learning rate large enough that gradient descent converges with the efficiency you need, and not so large that convergence never happens.
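To make the trade-off concrete, here is a small sketch that runs gradient descent on a made-up one-variable loss, `(b - 3)**2`, whose best value is `b = 3`. The loss, the function name `descend`, and the three learning rates are assumptions picked for illustration and are not the data used in this lesson.

```python
def descend(learning_rate, num_iterations=100, start=0.0):
    """Run gradient descent on the toy loss (b - 3)**2; return every b visited."""
    b = start
    bs = [b]
    for _ in range(num_iterations):
        gradient = 2 * (b - 3)  # derivative of (b - 3)**2 with respect to b
        b = b - learning_rate * gradient
        bs.append(b)
    return bs


for learning_rate in (0.001, 0.1, 1.1):
    final_b = descend(learning_rate)[-1]
    print(f"learning rate {learning_rate}: b after 100 steps = {final_b:.4g}")
```

On this toy loss, the smallest rate is still far from 3 after 100 steps, the middle rate lands on it, and the largest rate overshoots further on every step, so `b` grows without bound.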

### Instructions

**1.**

We have imported two new lists representing how the `b` value changed with different learning rates:

- `bs_000000001`: 1400 iterations of gradient descent on `b` with a learning rate of 0.000000001
- `bs_01`: 100 iterations of gradient descent on `b` with a learning rate of 0.01

Change the plot to plot `bs_000000001` instead of `bs`.

Does the gradient descent algorithm still converge to the same `b` value? Does it converge at all? Look at the values on the y-axis!

**2.**

Change the plot to plot `bs_01` instead of `bs_000000001`. Unfortunately, our computers blew up after 100 iterations of this, so you’ll also have to change the number of iterations to `100` instead of `1400`:

    iterations = range(100)

Does the gradient descent algorithm still converge to the same `b` value? Does it converge at all?
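If you would rather check convergence numerically than read it off the plot, one rough approach is to see whether the last few values have stopped moving. The helper `has_converged`, its `window` and `tolerance` defaults, and the two example lists below are all hypothetical; the lists stand in for the imported `b` histories and are not their real contents.

```python
def has_converged(values, window=10, tolerance=1e-3):
    """Return True if the last `window` values stay within `tolerance` of each other."""
    tail = values[-window:]
    return max(tail) - min(tail) < tolerance


# Made-up histories: one settles toward 10, the other swings wider each step.
settling = [10 * (1 - 0.5 ** i) for i in range(30)]
diverging = [(-2) ** i for i in range(30)]

print(has_converged(settling))
print(has_converged(diverging))
```

A check like this only says the values have flattened out within the chosen tolerance. A run with a very small learning rate can look flat while still being far from the best value, which is why the y-axis scale matters.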