Learn

Like R-squared, log-likelihood never decreases when we add predictors to a model, even uninformative ones. Just as adjusted R-squared penalizes R-squared for each extra predictor, there are criteria that penalize the log-likelihood for each extra predictor.

The two most commonly used are the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Both are built from the negative log-likelihood plus a penalty for the number of parameters, so we want the model with the lowest AIC or BIC.
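For reference, the standard definitions of the two criteria are given below. The symbols are not from the lesson text: \hat{L} is the maximized likelihood (so \ln\hat{L} is the log-likelihood), k is the number of estimated parameters (in statsmodels this count appears to include the intercept), and n is the number of observations.

```latex
\[
\mathrm{AIC} = 2k - 2\ln\hat{L},
\qquad
\mathrm{BIC} = k\ln(n) - 2\ln\hat{L}
\]
```

Each added parameter costs 2 under AIC and \ln(n) under BIC, while the -2\ln\hat{L} term rewards a better fit.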

AIC and BIC are similar, but BIC's penalty for each additional predictor grows with the sample size and is larger than AIC's whenever there are 8 or more observations, so BIC tends to favor simpler models. A simpler model is often easier to interpret. For example:

import statsmodels.api as sm

# rentals is assumed to be a pandas DataFrame that has already been loaded
model1 = sm.OLS.from_formula('rent ~ bedrooms + size_sqft + borough', data=rentals).fit()
model2 = sm.OLS.from_formula('rent ~ bedrooms + size_sqft + borough + has_doorman', data=rentals).fit()

# Log-likelihood: higher is better
print(model1.llf) #Output: -43756.418
print(model2.llf) #Output: -43756.017

# AIC: lower is better
print(model1.aic) #Output: 87522.837
print(model2.aic) #Output: 87524.034

# BIC: lower is better
print(model1.bic) #Output: 87555.423
print(model2.bic) #Output: 87563.137

We see that the log-likelihood for model 2 is slightly larger (better, by about 0.4), but its AIC is also slightly larger (worse, by about 1.2), and its BIC is larger still (worse, by about 7.7). Both AIC and BIC would lead us to choose model 1, whereas log-likelihood alone would lead us to choose model 2.
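To see where these numbers come from, here is a minimal sketch that recomputes AIC and BIC by hand from the printed log-likelihoods, using the standard formulas AIC = 2k - 2*llf and BIC = k*ln(n) - 2*llf. The helper names `aic` and `bic` are hypothetical. The parameter counts (5 and 6) and the sample size (5000) are assumptions: the lesson does not state them, and they were inferred by back-solving from the printed AIC and BIC values.

```python
import math


def aic(llf, k):
    """AIC from a log-likelihood and a parameter count (lower is better)."""
    return 2 * k - 2 * llf


def bic(llf, k, n_obs):
    """BIC from a log-likelihood, parameter count, and sample size (lower is better)."""
    return k * math.log(n_obs) - 2 * llf


# Assumption: sample size inferred from the printed BIC values, not stated in the lesson.
N_OBS = 5000

# Log-likelihoods are copied from the lesson output.
# Assumption: parameter counts (including the intercept) inferred from the printed AIC values.
models = {
    "model1": {"llf": -43756.418, "k": 5},
    "model2": {"llf": -43756.017, "k": 6},
}

results = {
    name: {
        "llf": m["llf"],
        "aic": aic(m["llf"], m["k"]),
        "bic": bic(m["llf"], m["k"], N_OBS),
    }
    for name, m in models.items()
}

for name, r in results.items():
    print(f"{name}: llf={r['llf']:.3f}  aic={r['aic']:.3f}  bic={r['bic']:.3f}")

# Highest log-likelihood wins; lowest AIC or BIC wins.
best_by_llf = max(results, key=lambda name: results[name]["llf"])
best_by_aic = min(results, key=lambda name: results[name]["aic"])
best_by_bic = min(results, key=lambda name: results[name]["bic"])
print(best_by_llf, best_by_aic, best_by_bic)
```

Under these assumptions the hand-computed values agree with the printed ones to within rounding, and the extra predictor costs model 2 more under BIC (about ln(5000), or 8.5) than under AIC (2).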

Instructions

1. Two different models have been fit for you in script.py and saved as model1 and model2, respectively. Print out the log-likelihood for both models.

2. Based on the log-likelihood values, which model would you choose? Indicate your answer by setting a variable named which_model_loglik equal to 1 if you would choose model1 and equal to 2 if you would choose model2.

3. Print out the AIC for model1 and model2.

4. Based on the AIC values, which model would you choose? Indicate your answer by setting a variable named which_model_aic equal to 1 if you would choose model1 and equal to 2 if you would choose model2.

5. Print out the BIC for model1 and model2.

6. Based on the BIC values, which model would you choose? Indicate your answer by setting a variable named which_model_bic equal to 1 if you would choose model1 and equal to 2 if you would choose model2.
