Now that we’ve seen what a regression model with a binary predictor looks like visually, we can actually fit the model using statsmodels.api.OLS.from_formula(), the same way we did for a quantitative predictor:

    import statsmodels.api as sm

    model = sm.OLS.from_formula('height ~ play_bball', data)
    results = model.fit()
    print(results.params)


    Intercept     169.016
    play_bball     14.628
    dtype: float64

Note that this works whether the play_bball variable is coded with 0s and 1s, with True and False, or with strings like 'yes' and 'no'. With string coding, the coefficient label in the params output will look something like play_bball[T.yes], indicating that 'yes' corresponds to 1.

To interpret this output, we first need to remember that the intercept is the expected value of the outcome variable when the predictor is equal to zero. In this case, the intercept is therefore the mean height of non-basketball players.

The slope is the expected difference in the outcome variable for a one unit difference in the predictor variable. In this case, a one unit difference in play_bball is the difference between not being a basketball player and being a basketball player. Therefore, the slope is the difference in mean heights for basketball players and non-basketball players.
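To check this interpretation numerically, here is a minimal sketch that computes the least-squares intercept and slope from the closed-form formulas for a single predictor (rather than calling statsmodels) and compares them with the group means. The heights are made-up values chosen for illustration, and the helper and variable names are assumptions:

```python
# Hypothetical toy data: 0 = does not play basketball, 1 = plays basketball.
play_bball = [0, 0, 0, 0, 1, 1, 1, 1]
height = [165.0, 170.0, 172.0, 169.0, 180.0, 185.0, 183.0, 186.0]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


x_bar = mean(play_bball)
y_bar = mean(height)

# Closed-form least-squares estimates for a one-predictor regression.
numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(play_bball, height))
denominator = sum((x - x_bar) ** 2 for x in play_bball)
slope = numerator / denominator
intercept = y_bar - slope * x_bar

# Group means computed directly from the data.
mean_no = mean([h for h, p in zip(height, play_bball) if p == 0])
mean_yes = mean([h for h, p in zip(height, play_bball) if p == 1])

print('intercept:', intercept, '| mean of non-players:', mean_no)
print('slope:', slope, '| difference in means:', mean_yes - mean_no)
```

On this toy data, the intercept matches the mean height of the non-players and the slope matches the difference between the two group means, up to floating-point rounding.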



The students dataset has been loaded for you in script.py. Create and fit a regression model of score predicted by breakfast using sm.OLS.from_formula() and print out the coefficients.


Code has been provided for you in script.py to calculate the mean test score for students who ate breakfast (saved as mean_score_breakfast) and the mean score for students who did not eat breakfast (saved as mean_score_no_breakfast). Calculate and print the difference in mean scores. Can you find how this number relates to the regression output?
