Several Python libraries can fit a linear regression, but in this course we will use the OLS.from_formula() function from statsmodels.api (referred to as sm in the code below) because it uses simple syntax and provides comprehensive model summaries.
Suppose we have a dataset named body_measurements with columns height and weight. If we want to fit a model that can predict weight based on height, we can create the model as follows:
model = sm.OLS.from_formula('weight ~ height', data = body_measurements)
We used the formula 'weight ~ height' because we want to predict weight (it is the outcome variable) using height as a predictor. Then, we can fit the model using .fit():
results = model.fit()
Finally, we can inspect a summary of the results using print(results.summary()). For now, we’ll only look at the coefficients using results.params, but the full summary table is useful because it contains other important diagnostic information.
print(results.params)
Output:
Intercept   -21.67
height        0.50
dtype: float64
This tells us that the best-fit intercept is -21.67, and the best-fit slope is 0.50.
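To see what these two numbers mean in practice, the short sketch below plugs them into the equation of a line (predicted weight = intercept + slope * height). The helper predict_weight and the height value 160 are hypothetical, chosen only to illustrate the arithmetic.

```python
# Coefficients copied from the output above
intercept = -21.67
slope = 0.50

def predict_weight(height):
    """Return the predicted weight on the fitted line for a given height."""
    return intercept + slope * height

# 160 is a hypothetical height value chosen for illustration
print(round(predict_weight(160), 2))  # prints 58.33
```

Each additional unit of height raises the predicted weight by the slope, 0.50, while the intercept is the predicted weight when height is 0.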
Instructions
Using the students dataset that has been loaded in script.py, create a linear regression model that predicts student score using hours_studied as a predictor and save the result as a variable named model.
Fit the model using the .fit() method on model (created in the previous step), and save the fitted model as results.
Print out the model coefficients using .params.