So far, we’ve used R-squared, adjusted R-squared, and an F-test to compare models. These criteria are most useful for finding a model that best fits an observed set of data. They are often used when our goal is interpreting a model to understand relationships between variables.

If our goal is to choose the best model for making predictions on new or unobserved data, we may want to use a likelihood-based criterion instead.

The log-likelihood of a linear regression model is the log of the probability (more precisely, the probability density) of observing our data under the fitted model. A higher log-likelihood is better.
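To make the definition concrete, here is a minimal sketch that computes the log-likelihood by hand for a one-predictor regression with normally distributed errors. The helper name `gaussian_log_likelihood` and the toy `size_sqft` and `rent` values are hypothetical and chosen only for illustration. For a model fit by least squares with $n$ observations and sum of squared residuals $SSR$, the maximized log-likelihood is $-\frac{n}{2}\left(\log(2\pi) + \log(SSR/n) + 1\right)$, so a model with smaller residuals gets a higher value.

```python
import math

def gaussian_log_likelihood(x, y):
    """Fit y = b0 + b1 * x by least squares and return the
    maximized Gaussian log-likelihood of the fitted line."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n

    # Closed-form least-squares slope and intercept
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean

    # Sum of squared residuals and the maximum-likelihood error variance
    ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2 = ssr / n

    return -n / 2 * (math.log(2 * math.pi) + math.log(sigma2) + 1)

# Hypothetical toy data, not taken from the lesson's datasets
size_sqft = [450, 600, 750, 900, 1100, 1300]
rent = [2100, 2500, 3100, 3300, 4200, 4600]

print(gaussian_log_likelihood(size_sqft, rent))
```

The value is negative here because the density of each observation is below 1, which is typical for real-world data measured on large scales.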

For example, we can compare two models based on log-likelihood as follows:

```python
import statsmodels.api as sm

# `rentals` is assumed to be a pandas DataFrame loaded earlier in the lesson
model1 = sm.OLS.from_formula('rent ~ bedrooms + size_sqft + min_to_subway', data=rentals).fit()
model2 = sm.OLS.from_formula('rent ~ bathrooms + building_age_yrs + borough', data=rentals).fit()

print(model1.llf)  # Output: -44282.327
print(model2.llf)  # Output: -44740.623
```

Because model 1 has the higher log-likelihood (-44282.327 is closer to zero than -44740.623, so it is the larger value), we would choose model 1 over model 2.
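When the values are negative, it is easy to misread which one is larger. As a small sketch, the comparison can be left to Python's `max`, here using the two log-likelihood values reported above. The dictionary name `log_likelihoods` is a placeholder introduced for illustration.

```python
# Log-likelihood values reported for the two rental models above
log_likelihoods = {"model1": -44282.327, "model2": -44740.623}

# max() compares the numeric values, so the value closest to zero wins
best_model = max(log_likelihoods, key=log_likelihoods.get)
print(best_model)
```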

### Instructions

1. Using the `bikes` dataset, fit a model to predict the number of bike rentals (`cnt`) with the following predictors: temperature (`temp`), windspeed (`windspeed`), and whether or not it is a holiday (`holiday`). Save the fitted model as `model1`.

2. Now fit a second model to predict `cnt` using the following predictors: humidity (`hum`), season (`season`), and the day of the week (`weekday`). Save the fitted model as `model2`.

3. Print out the log-likelihood for both models.

4. Based on the log-likelihood values, which model would you choose? Indicate your answer by setting a variable named `which_model` equal to `1` if you would choose `model1` and equal to `2` if you would choose `model2`.