Learn

We want our program to iteratively learn the best m and b values. So for each m and b pair that we guess, we want to move them in the direction of the gradients we've calculated. But how far do we move in that direction?

We have to choose a learning rate, which determines how far down the loss curve we move on each iteration.

With a small learning rate, gradient descent may take so long to converge that you run out of time or cycles before getting an answer. With a large learning rate, it might skip right over the best value and never converge at all. Oh no! Finding the absolute best learning rate is not necessary for training a model. You just have to find a learning rate large enough that gradient descent converges with the efficiency you need, and not so large that convergence never happens.
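To make the role of the learning rate concrete, here is a minimal sketch of a gradient descent step, assuming the standard mean squared error gradients for a line y = m*x + b. The function and variable names are placeholders for illustration, not the exercise's own code.

```python
# A sketch of one gradient descent update, assuming standard
# mean-squared-error gradients for a line y = m*x + b.

def step_gradient(x, y, m, b, learning_rate):
    n = len(x)
    # Gradients of mean squared error with respect to b and m
    b_gradient = (-2 / n) * sum(y[i] - (m * x[i] + b) for i in range(n))
    m_gradient = (-2 / n) * sum(x[i] * (y[i] - (m * x[i] + b)) for i in range(n))
    # The learning rate scales how far we step along each gradient
    new_b = b - learning_rate * b_gradient
    new_m = m - learning_rate * m_gradient
    return new_m, new_b

# Points lying exactly on y = 2x + 1, so the best m is 2 and the best b is 1
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

m, b = 0.0, 0.0
for _ in range(1000):
    m, b = step_gradient(x, y, m, b, learning_rate=0.01)
print(m, b)  # close to 2 and 1
```

Each iteration moves m and b a distance proportional to the learning rate, so that single number controls how quickly (or whether) the guesses settle near the best values.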

### Instructions

1.

We have imported two new lists representing how the b value changed with different learning rates:

• bs_000000001: 1400 iterations of gradient descent on b with a learning rate of 0.000000001
• bs_01: 100 iterations of gradient descent on b with a learning rate of 0.01

Change the plot to plot bs_000000001 instead of bs.

Does the gradient descent algorithm still converge to the same b value? Does it converge at all? Look at the values on the y-axis!
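To get a feel for what you should see, here is a toy simulation with made-up data (not the exercise's actual bs_000000001 list); for simplicity it descends on b alone, holding m fixed at its best value. With a learning rate of 0.000000001, b has barely moved from its starting point even after 1400 iterations.

```python
# Toy illustration of a tiny learning rate -- illustrative data only.
# We hold m at its best value and take 1400 gradient steps on b alone.

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]   # points on y = 2x + 1, so the best b is 1
n = len(x)
m = 2.0                # held fixed for this sketch
b = 0.0
learning_rate = 0.000000001

bs = [b]
for _ in range(1400):
    # Mean-squared-error gradient with respect to b
    b_gradient = (-2 / n) * sum(y[i] - (m * x[i] + b) for i in range(n))
    b -= learning_rate * b_gradient
    bs.append(b)

print(bs[-1])  # still essentially 0 -- nowhere near the best value of 1
```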

2.

Change the plot to plot bs_01 instead of bs_000000001. Unfortunately, our computers blew up after 100 iterations of this, so you’ll also have to change the number of iterations to 100 instead of 1400:

```python
iterations = range(100)
```

Does the gradient descent algorithm still converge to the same b value? Does it converge at all?
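For comparison, the same toy sketch (again made-up data, not the exercise's actual bs_01 list) with a learning rate of 0.01 carries b most of the way to its best value in only 100 iterations:

```python
# Toy illustration of a larger learning rate -- illustrative data only.
# Same setup as before: m held fixed, gradient steps on b alone.

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]   # points on y = 2x + 1, so the best b is 1
n = len(x)
m = 2.0                # held fixed for this sketch
b = 0.0
learning_rate = 0.01

bs = [b]
for _ in range(100):
    # Mean-squared-error gradient with respect to b
    b_gradient = (-2 / n) * sum(y[i] - (m * x[i] + b) for i in range(n))
    b -= learning_rate * b_gradient
    bs.append(b)

print(bs[-1])  # most of the way to 1 after just 100 iterations
```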