So far, we’ve used R-squared, adjusted R-squared, and an F-test to compare models. These criteria are most useful for finding a model that best fits an observed set of data. They are often used when our goal is interpreting a model to understand relationships between variables.

If our goal is to choose the best model for making predictions on new, unobserved data, we may want to use a likelihood-based criterion instead.

The *log-likelihood* of a linear regression model is the logarithm of the likelihood: roughly, how probable the observed data are under the fitted model. A higher log-likelihood is better.
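To make the definition concrete, here is the standard expression for the log-likelihood of a linear model, under the assumption that the errors are independent and normally distributed. The symbols are introduced here for illustration and do not appear elsewhere in the lesson: $n$ is the number of observations, $y_i$ and $\hat{y}_i$ are the observed and fitted values, $\sigma^2$ is the error variance, and $\mathrm{SSR}$ is the sum of squared residuals.

$$
\ell(\sigma^2) = -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2
$$

Substituting the maximum-likelihood variance estimate $\hat{\sigma}^2 = \mathrm{SSR}/n$, where $\mathrm{SSR} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, gives

$$
\ell = -\frac{n}{2}\left[\log(2\pi) + \log\left(\frac{\mathrm{SSR}}{n}\right) + 1\right]
$$

For a fixed dataset, a smaller sum of squared residuals therefore gives a higher log-likelihood. As far as we know, this second form is what the `.llf` attribute of a fitted OLS model reports, but treat that as an assumption rather than a guarantee.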

For example, we can compare two models based on log-likelihood as follows:

    import statsmodels.api as sm

    # Fit two candidate models on the rentals data
    model1 = sm.OLS.from_formula('rent ~ bedrooms + size_sqft + min_to_subway', data=rentals).fit()
    model2 = sm.OLS.from_formula('rent ~ bathrooms + building_age_yrs + borough', data=rentals).fit()

    # Print the log-likelihood of each fitted model
    print(model1.llf) #Output: -44282.327
    print(model2.llf) #Output: -44740.623

Because model 1 has the higher log-likelihood (-44282.327 is greater than -44740.623, since the value closer to zero is the larger one), we would choose model 1 over model 2.
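The same comparison can be sketched without any libraries, to show what is being compared. This is a minimal illustration and not the lesson's method: it assumes normally distributed errors, uses a single predictor per model, and runs on a tiny made-up dataset. The helper names `fit_simple_ols` and `gaussian_log_likelihood` are hypothetical.

```python
import math


def fit_simple_ols(x, y):
    """Return the least-squares (intercept, slope) for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def gaussian_log_likelihood(x, y):
    """Log-likelihood of a one-predictor linear model, assuming normal errors.

    Uses the maximum-likelihood variance estimate SSR / n.
    """
    intercept, slope = fit_simple_ols(x, y)
    n = len(y)
    ssr = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return -n / 2 * (math.log(2 * math.pi) + math.log(ssr / n) + 1)


# Made-up toy data, for illustration only (not the lesson's rentals dataset)
rent = [2100, 2500, 3200, 3900, 4600, 5200]
size_sqft = [450, 600, 750, 900, 1100, 1250]
building_age_yrs = [30, 5, 60, 12, 45, 20]

llf_size = gaussian_log_likelihood(size_sqft, rent)
llf_age = gaussian_log_likelihood(building_age_yrs, rent)

# Choose the model with the higher (less negative) log-likelihood
better = "size_sqft" if llf_size > llf_age else "building_age_yrs"
print(llf_size, llf_age, better)
```

In this toy data, rent rises almost linearly with size and is nearly unrelated to building age, so the size model leaves much smaller residuals and ends up with the higher log-likelihood.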

### Instructions

**1.**

Using the `bikes` dataset, fit a model to predict the number of bike rentals (`cnt`) with the following predictors: temperature (`temp`), windspeed (`windspeed`), and whether or not it is a holiday (`holiday`). Save the fitted model as `model1`.

**2.**

Now fit a second model to predict `cnt` using the following predictors: humidity (`hum`), season (`season`), and the day of the week (`weekday`). Save the fitted model as `model2`.

**3.**

Print out the log-likelihood for both models.

**4.**

Based on the log-likelihood values, which model would you choose? Indicate your answer by setting a variable named `which_model` equal to `1` if you would choose `model1` and equal to `2` if you would choose `model2`.