Congratulations! In this lesson, you’ve learned a number of different methods for model comparison:
- For choosing a model that best represents the data we have:
  - Adjusted R-squared
  - F-test (for nested models)
- For choosing a model for accurate out-of-sample prediction:
  - Log likelihood, AIC, and BIC
  - Training/test sets
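Adjusted R-squared, log likelihood, AIC, and BIC can all be computed from a single ordinary least squares fit. The sketch below is a minimal NumPy-only version that assumes Gaussian errors and counts only the regression coefficients (including the intercept) as parameters in AIC/BIC, which is the convention statsmodels uses for the `rsquared_adj`, `llf`, `aic`, and `bic` attributes of its fitted `OLS` results. The helper name `ols_metrics` is hypothetical, and the data are synthetic with made-up coefficients, not the StreetEasy file.

```python
import numpy as np

def ols_metrics(X, y):
    """Fit y ~ intercept + X by least squares and return model-comparison metrics.

    X is an (n, p) array of predictors without an intercept column; y has length n.
    """
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])      # prepend the intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    ssr = float(resid @ resid)                     # residual sum of squares
    sst = float(((y - y.mean()) ** 2).sum())       # total sum of squares
    r2 = 1 - ssr / sst
    # Adjusted R-squared penalizes each extra predictor through the degrees of freedom
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    # Gaussian log likelihood at the maximum-likelihood variance estimate SSR / n
    llf = -n / 2 * (np.log(2 * np.pi) + np.log(ssr / n) + 1)
    k = p + 1                                      # coefficients, including intercept
    aic = -2 * llf + 2 * k
    bic = -2 * llf + k * np.log(n)
    return {"r2": r2, "adj_r2": adj_r2, "llf": llf, "aic": aic, "bic": bic}

# Synthetic data with made-up coefficients (NOT the StreetEasy file)
rng = np.random.default_rng(0)
n = 200
size_sqft = rng.uniform(400, 2000, n)
min_to_subway = rng.uniform(1, 20, n)
rent = 1000 + 2.5 * size_sqft - 20 * min_to_subway + rng.normal(0, 300, n)

small = ols_metrics(size_sqft.reshape(-1, 1), rent)
large = ols_metrics(np.column_stack([size_sqft, min_to_subway]), rent)
for name, m in [("small", small), ("large", large)]:
    print(name, {key: round(float(val), 3) for key, val in m.items()})
```

Higher adjusted R-squared is better, while lower AIC and BIC are better. Because BIC charges log(n) per coefficient and AIC charges 2, BIC's penalty is the stricter one once n is 8 or more, so it tends to favor smaller models.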
Note that we’ve covered many different methods for choosing a model, and they don’t always agree. To choose a method, consider your ultimate goal (analysis vs. prediction) and what you want to prioritize (simplicity and interpretability vs. accuracy).
In this final workspace, we’ve loaded the StreetEasy dataset for you to investigate further. The dataset contains the following columns:
- rent: the monthly rent, in dollars
- bedrooms: the number of bedrooms
- bathrooms: the number of bathrooms
- size_sqft: the floor area, in square feet
- min_to_subway: walking time to the nearest subway station, in minutes
- building_age_yrs: the age of the building, in years
- no_fee: whether the listing has no broker fee
- has_roofdeck: whether there is a roof deck
- has_washer_dryer: whether there is a washer and dryer
- has_doorman: whether there is a doorman
- elevator: whether there is an elevator
- has_dishwasher: whether there is a dishwasher
- has_patio: whether there is a patio
- has_gym: whether there is a gym
- neighborhood: the neighborhood where the apartment is located
- borough: the borough where the apartment is located
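Most of these columns can enter a regression as they are, but `neighborhood` and `borough` hold text labels, so a plain least-squares fit needs them converted to indicator (dummy) columns first. A minimal pandas sketch follows; the rows are made-up toy values that only reuse a few of the column names above.

```python
import pandas as pd

# Made-up toy rows that reuse a few StreetEasy column names
listings = pd.DataFrame({
    "rent": [2550, 3100, 2200, 4100],
    "size_sqft": [480, 700, 550, 1000],
    "borough": ["Brooklyn", "Manhattan", "Queens", "Manhattan"],
})

# One 0/1 column per borough; drop_first=True drops the alphabetically first
# category (Brooklyn), which becomes the baseline absorbed by the intercept
encoded = pd.get_dummies(listings, columns=["borough"], drop_first=True, dtype=int)
print(encoded.columns.tolist())
# ['rent', 'size_sqft', 'borough_Manhattan', 'borough_Queens']
```

Every non-baseline category gets its own coefficient, so a predictor with many levels, such as `neighborhood`, adds many parameters at once; adjusted R-squared, AIC, and BIC all penalize that. If you fit with a statsmodels formula instead (for example `rent ~ size_sqft + borough`), string columns are dummy-coded automatically.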
Which predictors do you think will be most important in predicting the rental price of an apartment in NYC? Using the predictors you think are most relevant:
- Fit a few different models
- Compare the models based on adjusted R-squared. Which would you choose?
- Compare the models using an F-test (this requires nested models, where one model’s predictors are a subset of the other’s). Which would you choose?
- Compare the models using AIC/BIC. Which would you choose?
- Overall, think about which model you would choose based on your analysis. Did these comparison methods agree or disagree in terms of what was considered “best”?
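One way to approach the F-test and training/test comparisons is sketched below, under stated assumptions: the data are synthetic with made-up coefficients (not the StreetEasy file), the models are fit with NumPy least squares rather than statsmodels, and `fit_ols`, `nested_f_test`, and `held_out_mse` are hypothetical helper names. For a smaller model nested inside a larger one with q extra predictors, the statistic is F = [(SSR_small − SSR_large) / q] / [SSR_large / (n − p_large − 1)], compared against an F(q, n − p_large − 1) distribution. With statsmodels, `sm.stats.anova_lm(small_fit, large_fit)` reports the same nested-model test.

```python
import numpy as np
from scipy import stats

def fit_ols(X, y):
    """Least-squares fit of y ~ intercept + X; returns (coefficients, residual sum of squares)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return beta, float(resid @ resid)

def nested_f_test(X_small, X_large, y):
    """F-test of H0: the extra predictors in the larger model all have zero coefficients.

    Assumes the columns of X_small are a subset of the columns of X_large.
    """
    n = len(y)
    _, ssr_small = fit_ols(X_small, y)
    _, ssr_large = fit_ols(X_large, y)
    q = X_large.shape[1] - X_small.shape[1]    # number of extra predictors
    df_resid = n - X_large.shape[1] - 1        # residual degrees of freedom, larger model
    f_stat = ((ssr_small - ssr_large) / q) / (ssr_large / df_resid)
    p_value = stats.f.sf(f_stat, q, df_resid)  # upper-tail probability of F(q, df_resid)
    return f_stat, p_value

def held_out_mse(X, y, train_idx, test_idx):
    """Fit on the training rows and return mean squared error on the test rows."""
    beta, _ = fit_ols(X[train_idx], y[train_idx])
    design_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
    return float(np.mean((y[test_idx] - design_test @ beta) ** 2))

# Synthetic data with made-up coefficients (NOT the StreetEasy file)
rng = np.random.default_rng(1)
n = 300
bedrooms = rng.integers(0, 4, n)
size_sqft = rng.uniform(400, 2000, n)
min_to_subway = rng.uniform(1, 20, n)
rent = 800 + 300 * bedrooms + 2.0 * size_sqft - 25 * min_to_subway + rng.normal(0, 300, n)

X_small = np.column_stack([bedrooms, size_sqft])
X_large = np.column_stack([bedrooms, size_sqft, min_to_subway])

f_stat, p_value = nested_f_test(X_small, X_large, rent)

# 80/20 random split into training and test rows
idx = rng.permutation(n)
train_idx, test_idx = idx[:240], idx[240:]
mse_small = held_out_mse(X_small, rent, train_idx, test_idx)
mse_large = held_out_mse(X_large, rent, train_idx, test_idx)

print(f"F = {f_stat:.2f}, p = {p_value:.3g}")
print(f"held-out MSE: small = {mse_small:.0f}, large = {mse_large:.0f}")
```

A small p-value favors keeping the extra predictor, and a lower held-out MSE suggests the larger model also predicts unseen listings better. On real data, these verdicts and the adjusted R-squared and AIC/BIC verdicts will not necessarily line up.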