R-squared is one of the most common metrics for evaluating linear regression models. It can be interpreted as the proportion of variation in the outcome variable that the model explains, so a value of 0.70 means the model accounts for about 70% of that variation. A higher R-squared generally indicates a better fit to the data.
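To make the "proportion of variation explained" idea concrete, here is a minimal sketch that computes R-squared by hand as one minus the ratio of unexplained variation to total variation. The function name `r_squared` and the toy rent values are hypothetical and used only for illustration; the interpretation as a proportion assumes an ordinary least squares model that includes an intercept.

```python
def r_squared(actual, predicted):
    """Return 1 - SS_res / SS_tot for paired observed and predicted values."""
    mean_actual = sum(actual) / len(actual)
    # Total variation of the outcome around its mean
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    # Variation left unexplained by the model's predictions
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1 - ss_res / ss_tot

# Hypothetical toy values, not taken from any dataset in this lesson
rents = [2000.0, 2500.0, 3000.0, 3500.0]
perfect_fit = list(rents)                           # predictions match exactly
mean_only = [sum(rents) / len(rents)] * len(rents)  # always predict the mean

print(r_squared(rents, perfect_fit))  # 1.0
print(r_squared(rents, mean_only))    # 0.0
```

A model that predicts every value exactly explains all of the variation, while a model that only ever predicts the mean explains none of it.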

For example, suppose we have a dataset of apartment rentals in NYC. We can build two different models to predict rental price and print the R-squared for each one as follows:

```python
import statsmodels.api as sm

# Assumes `rentals` is a pandas DataFrame that has already been loaded

# Create and fit the first model to predict rent
model1 = sm.OLS.from_formula('rent ~ bedrooms + size_sqft + min_to_subway', data=rentals).fit()

# Create and fit the second model
model2 = sm.OLS.from_formula('rent ~ bathrooms + building_age_yrs + borough', data=rentals).fit()

# Print out R-squared for both models
print(model1.rsquared)  # Output: 0.664
print(model2.rsquared)  # Output: 0.596
```

This tells us that the first model (using bedrooms, square footage, and minutes to the subway) explains about 66.4% of the variation in rental prices, whereas the second model explains only about 59.6%. Based on R-squared alone, we would choose the first model over the second.
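When more than two candidate models are being compared, the same choice can be made programmatically. The sketch below assumes the R-squared values have been collected into a dictionary; the name `r_squared_by_model` is hypothetical, and the values are the two reported for the rental models. With fitted models in hand, the dictionary could instead be filled from `model1.rsquared` and `model2.rsquared`.

```python
# R-squared values reported for the two rental models in this lesson
r_squared_by_model = {"model1": 0.664, "model2": 0.596}

# Pick the model whose R-squared is largest
best_model = max(r_squared_by_model, key=r_squared_by_model.get)

print(best_model)  # model1
```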

### Instructions

1.

Using the `bikes` dataset, fit a model to predict `cnt` (the number of bike rentals) based on the temperature (`temp`), windspeed (`windspeed`), and whether or not it is a holiday (`holiday`). Save the fitted model as `model1`.

2.

Using the `bikes` dataset, fit a second model to predict `cnt` (the number of bike rentals) based on humidity (`hum`), season (`season`), and the day of the week (`weekday`). Save the fitted model as `model2`.

3.

Print out the R-squared for both models.

4.

Based on the R-squared values, which model would you choose? Indicate your answer by setting a variable named `which_model` equal to `1` if you would choose `model1` and equal to `2` if you would choose `model2`.