We have seen how to implement a linear regression algorithm in Python, and how to use the linear regression model from scikit-learn. We learned:

  • We can measure how well a line fits by measuring loss.
  • The goal of linear regression is to minimize loss.
  • To find the line of best fit, we try to find the b value (intercept) and the m value (slope) that minimize loss.
  • Convergence refers to when the parameters stop changing with each iteration.
  • Learning rate refers to how much the parameters are changed on each iteration.
  • We can use Scikit-learn’s LinearRegression() model to perform linear regression on a set of points.

These are important tools to have in your toolkit as you continue your exploration of data science.


Find another dataset, maybe in scikit-learn’s example datasets. Or on Kaggle, a great resource for tons of interesting data.

Try to perform linear regression on your own! If you find any cool linear correlations, make sure to share them!

As a starter, we’ve loaded in the Boston housing dataset. We made the X values the nitrogen oxides concentration (parts per 10 million), and the y values the housing prices. See if you can perform regression on these houses!

Sign up to start coding

By signing up for Codecademy, you agree to Codecademy's Terms of Service & Privacy Policy.
Already have an account?