Several Python libraries can fit a linear regression. In this course, we will use the OLS.from_formula() function from statsmodels.api because its formula syntax is simple and it provides comprehensive model summaries.

Suppose we have a dataset named body_measurements with columns height and weight. If we want to fit a model that can predict weight based on height, we can create the model as follows:

import statsmodels.api as sm
model = sm.OLS.from_formula('weight ~ height', data=body_measurements)

We used the formula 'weight ~ height' because we want to predict weight (it is the outcome variable) using height as a predictor. Then, we can fit the model using .fit():

results = model.fit()

Finally, we can inspect a summary of the results using print(results.summary()). For now, we’ll only look at the coefficients using results.params, but the full summary table is useful because it contains other important diagnostic information.



Intercept   -21.67
height        0.50
dtype: float64

This tells us that the best-fit intercept is -21.67, and the best-fit slope is 0.50.
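To make these coefficients concrete, the fitted line can be written as predicted weight = intercept + slope × height. The sketch below plugs in the two values reported above. The function name predict_weight and the example height of 160 are hypothetical, chosen only for illustration.

```python
# Coefficients copied from the results.params output above
intercept = -21.67
slope = 0.50


def predict_weight(height):
    """Return the weight predicted by the fitted line for a given height."""
    return intercept + slope * height


# Hypothetical height, in whatever units the dataset uses
predicted = predict_weight(160)
print(round(predicted, 2))
```

For a height of 160, the arithmetic is -21.67 + 0.50 × 160 = 58.33. Each additional unit of height raises the predicted weight by 0.50, and the intercept is the predicted weight at a height of zero.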



Using the students dataset that has been loaded in script.py, create a linear regression model that predicts student score using hours_studied as a predictor and save the result as a variable named model.


Fit the model using the .fit() method on model (created in the previous step), and save the fitted model as results.


Print out the model coefficients using .params.
