Linear regression is a powerful modeling technique that can be used to understand the relationship between a quantitative variable and one or more other variables, sometimes with the goal of making predictions. For example, linear regression can help us answer questions like:

  • What is the relationship between apartment size and rental price for NYC apartments?
  • Is a mother’s height a good predictor of their child’s adult height?

Before fitting a linear regression model, the first step is exploratory data analysis and data visualization: is there a relationship that we can model? For example, suppose we collect heights (in cm) and weights (in kg) for 9 adults and inspect a plot of weight against height:

plt.scatter(data.height, data.weight)
plt.xlabel('height (cm)')
plt.ylabel('weight (kg)')
plt.show()

[Figure: scatter plot showing a positive linear relationship between height and weight (people who are taller tend to weigh more)]

When we look at this plot, we see that there is some evidence of a relationship between height and weight: people who are taller tend to weigh more. In the following exercises, we’ll learn how to model this relationship with a line. If you were to draw a line through these points to describe the relationship between height and weight, what line would you draw?
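The plotting snippet above assumes that `plt` and `data` already exist. To try it outside the course environment, a minimal self-contained sketch might look like the following. The nine height and weight values are made up for illustration and are not the data behind the plot described above.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up heights (cm) and weights (kg) for 9 adults.
# These values are illustrative only, not the course's dataset.
data = pd.DataFrame({
    "height": [150, 155, 160, 165, 170, 175, 180, 185, 190],
    "weight": [52, 58, 55, 63, 68, 66, 75, 80, 84],
})

# Height on the x-axis, weight on the y-axis, one point per person
plt.scatter(data.height, data.weight)
plt.xlabel("height (cm)")
plt.ylabel("weight (kg)")
# plt.show()  # uncomment to display the figure
```

With values like these, the points drift upward from left to right, which is the kind of pattern that suggests a line may be a reasonable model.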



A dataset has been loaded for you in script.py containing fictional data from a group of students who were surveyed about their studying and breakfast choices prior to a math test. The data is loaded as a variable named students.

Create a scatter plot with hours_studied on the x-axis and score on the y-axis.

Note that the code to show the plot (plt.show()) is already provided for you, so you do not need to add it!


If you had to draw a line on top of this plot to describe the relationship between hours studied and math score, what would that line look like?

Uncomment the code for the line plot (plt.plot(students.hours_studied, y)). Does this line look correct?
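As a rough sketch of what these two steps could look like together, the example below draws the scatter plot and then overlays a candidate line. The `students` values, the slope, and the intercept are all hypothetical stand-ins: the course's actual dataset and its definition of `y` are not shown here, so treat the numbers as assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the course's `students` data
students = pd.DataFrame({
    "hours_studied": [0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5],
    "score": [48, 55, 57, 66, 68, 74, 79, 83, 90, 94],
})

# Candidate line chosen by eye (assumed values, not a fitted model)
slope, intercept = 10, 45
y = slope * students.hours_studied + intercept

# Scatter plot of the raw data: hours studied on x, score on y
plt.scatter(students.hours_studied, students.score)

# Overlay the candidate line on the same axes
plt.plot(students.hours_studied, y)
# plt.show()  # uncomment to display the figure
```

Changing `slope` and `intercept` and re-running is a quick way to see how different candidate lines sit relative to the points.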
