Learn

While R-squared is useful for comparing models with different sets of predictors, we saw that it can lead to overfitting when choosing between nested models, because R-squared never decreases when another predictor is added.

To address this issue, we can instead use adjusted R-squared, which gives a small penalty for each additional predictor in a model. For example:

model1 = sm.OLS.from_formula('rent ~ bedrooms + size_sqft + borough', data=rentals).fit()
model2 = sm.OLS.from_formula('rent ~ bedrooms + size_sqft + borough + has_doorman', data=rentals).fit()

print(model1.rsquared) # Output: 0.72761
print(model2.rsquared) # Output: 0.72765
print(model1.rsquared_adj) # Output: 0.72739
print(model2.rsquared_adj) # Output: 0.72738

Note that the second model (with an additional predictor) has a slightly larger R-squared, but a slightly smaller adjusted R-squared, compared to the first model. Based on the adjusted R-squared, we would choose the smaller model.
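
For reference, the adjustment depends only on R-squared, the number of observations, and the number of predictors. The lines below are a minimal sketch recreating it from the fitted rentals model above (assuming the model includes an intercept):

n = model1.nobs          # number of observations
p = model1.df_model      # number of fitted coefficients other than the intercept

adj_rsq = 1 - (1 - model1.rsquared) * (n - 1) / (n - p - 1)
print(adj_rsq)           # matches model1.rsquared_adj for a model with an intercept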

Instructions

1.

Using the bikes dataset, fit a model to predict the number of bike rentals (cnt) with the following predictors: temp, hum, windspeed, season, holiday, and weekday. Save the fitted model as model1.
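
One possible approach, assuming bikes is a loaded DataFrame with those column names and statsmodels.api is imported as sm (if season, holiday, or weekday are stored as numeric codes, you may prefer to wrap them in C() to treat them as categorical):

model1 = sm.OLS.from_formula(
    'cnt ~ temp + hum + windspeed + season + holiday + weekday',
    data=bikes).fit()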

2.

Now fit a second model with cnt as the outcome variable and all the same predictors as in model1 plus the “feels like” temperature (atemp). Save the fitted model as model2.
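
Following the same pattern, with atemp added to the formula:

model2 = sm.OLS.from_formula(
    'cnt ~ temp + hum + windspeed + season + holiday + weekday + atemp',
    data=bikes).fit()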

3.

Print out the R-squared for both models.
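
For example:

print(model1.rsquared)
print(model2.rsquared)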

4.

Print out the adjusted R-squared for both models.
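
For example:

print(model1.rsquared_adj)
print(model2.rsquared_adj)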

5.

Based on the R-squared values, which model would you choose? Indicate your answer by setting a variable named which_model_rsq equal to 1 if you would choose model1 and equal to 2 if you would choose model2.
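
Because model2 contains every predictor in model1 plus one more, its R-squared cannot be lower, so this comparison will generally favor the larger model. To avoid hard-coding a value before seeing the output, the choice can also be written programmatically:

which_model_rsq = 1 if model1.rsquared >= model2.rsquared else 2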

6.

Based on the adjusted R-squared values, which model would you choose? Indicate your answer by setting a variable named which_model_adj_rsq equal to 1 if you would choose model1 and equal to 2 if you would choose model2.
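
Adjusted R-squared penalizes the extra predictor, so it may favor the smaller model even when raw R-squared does not. Again, the choice can be expressed without hard-coding the result:

which_model_adj_rsq = 1 if model1.rsquared_adj >= model2.rsquared_adj else 2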
