Learn

R-squared is useful for comparing models with different sets of predictors, but we saw that it can lead to overfitting when choosing between nested models: adding a predictor never lowers R-squared, so the larger model always looks at least as good.
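The overfitting risk with nested models can be checked with a small simulation. The sketch below is not part of the lesson's data: it uses synthetic values, a hypothetical helper named `r_squared`, and NumPy's least-squares solver instead of `statsmodels`. It fits one model with a real predictor and a second, larger model that also includes a predictor made of pure noise.

```python
import numpy as np

def r_squared(X, y):
    """R-squared of a least-squares fit of y on X (X must include an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    ss_res = np.sum(residuals ** 2)          # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)     # total variation
    return 1 - ss_res / ss_tot

# Synthetic data (an assumption for illustration only)
rng = np.random.default_rng(0)               # fixed seed so the run is repeatable
n = 200
x = rng.normal(size=n)
noise_predictor = rng.normal(size=n)         # unrelated to y by construction
y = 2.0 * x + rng.normal(size=n)

intercept = np.ones(n)
X_small = np.column_stack([intercept, x])
X_large = np.column_stack([intercept, x, noise_predictor])

r2_small = r_squared(X_small, y)
r2_large = r_squared(X_large, y)

print(r2_small, r2_large)
print(r2_large >= r2_small)                  # the larger nested model is never worse on R-squared
```

Because the larger model can always set the extra coefficient to zero, its R-squared cannot fall below that of the smaller model, even when the extra predictor carries no information.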

To address this issue, we can instead use adjusted R-squared, which gives a small penalty for each additional predictor in a model. For example:

```
import statsmodels.api as sm

model1 = sm.OLS.from_formula('rent ~ bedrooms + size_sqft + borough', data=rentals).fit()

model2 = sm.OLS.from_formula('rent ~ bedrooms + size_sqft + borough + has_doorman', data=rentals).fit()

print(model1.rsquared) #Output: 0.72761
print(model2.rsquared) #Output: 0.72765

# Adjusted R-squared for each model
print(model1.rsquared_adj)
print(model2.rsquared_adj)
```

Note that the second model (with an additional predictor) has a slightly larger R-squared, but a slightly smaller adjusted R-squared, compared to the first model. Based on the adjusted R-squared, we would choose the smaller model.
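To see where the penalty comes from, it can help to compute adjusted R-squared by hand. The sketch below uses the standard formula, `1 - (1 - R²)(n - 1) / (n - p - 1)`, where `n` is the number of observations and `p` is the number of predictor columns, not counting the intercept (a categorical predictor contributes one column per dummy variable). The function name `adjusted_r_squared` and all of the numbers are hypothetical, chosen only to illustrate the pattern; they are not the values from the `rentals` models.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Standard adjusted R-squared for a model with an intercept."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical values: the larger model gains only 0.0001 in R-squared
adj_small = adjusted_r_squared(0.7000, n_obs=100, n_predictors=3)
adj_large = adjusted_r_squared(0.7001, n_obs=100, n_predictors=4)

print(round(adj_small, 4))  # 0.6906
print(round(adj_large, 4))  # 0.6875
```

In this made-up case the tiny gain in R-squared does not offset the penalty for the extra predictor, so the adjusted value favors the smaller model.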

### Instructions

1. Using the `bikes` dataset, fit a model to predict the number of bike rentals (`cnt`) with the following predictors: `temp`, `hum`, `windspeed`, `season`, `holiday`, and `weekday`. Save the fitted model as `model1`.

2. Now fit a second model with `cnt` as the outcome variable and all the same predictors as in `model1` plus the “feels like” temperature (`atemp`). Save the fitted model as `model2`.

3. Print out the R-squared for both models.

4. Print out the adjusted R-squared for both models.

5. Based on the R-squared values, which model would you choose? Indicate your answer by setting a variable named `which_model_rsq` equal to `1` if you would choose `model1` and equal to `2` if you would choose `model2`.

6. Based on the adjusted R-squared values, which model would you choose? Indicate your answer by setting a variable named `which_model_adj_rsq` equal to `1` if you would choose `model1` and equal to `2` if you would choose `model2`.