In this lesson, we’ll discuss several methods for comparing linear regression models and choosing among them.
For example, suppose that we work at a bike rental company and have a dataset where each row represents a single day of business, with information about the number of bikes rented, the weather, the day of the week, and so on. We might want to use this dataset to predict how many bikes will be rented tomorrow so that we can plan ahead. Alternatively, we might want to use it to understand which factors are most predictive of bike usage.
For either goal, we can use a linear regression model. The problem is that there are many different models we could build. How do we know which one to use?
This lesson will focus on some common ways to compare and choose a linear model, for both prediction and data analysis.
Run the code in script.py to create and fit two different linear regression models. IMPORTANT NOTE: To view the full output (in this exercise and throughout the rest of the lesson), you’ll need to widen the output terminal by clicking its left edge and dragging it further left.
For both models, the outcome variable is the number of bike rentals on a particular day. Inspect the output for each model. Do you have any ideas about how you might compare these models and choose one? Some questions to consider:
- What are the benefits of a complex model (with many predictors, interaction terms, polynomial terms, etc.)?
- What are the benefits of a simple model (with just a few predictors)?
- Are some predictors more important to include than others?
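To make the simple-versus-complex contrast concrete, here is a minimal sketch that fits one model with a single predictor and one with several predictors, a polynomial term, and an interaction term. It does not reproduce script.py: it assumes hypothetical synthetic data (the `temp`, `humidity`, and `windspeed` columns and their effects are invented for illustration) and uses plain NumPy least squares rather than any particular regression library.

```python
import numpy as np

# Hypothetical synthetic data standing in for the bike-rental dataset (NOT the lesson's real data).
rng = np.random.default_rng(0)
n = 200
temp = rng.uniform(0, 35, n)        # hypothetical daily temperature
humidity = rng.uniform(20, 100, n)  # hypothetical relative humidity (%)
windspeed = rng.uniform(0, 30, n)   # hypothetical wind speed
rentals = 500 + 40 * temp - 5 * humidity - 8 * windspeed + rng.normal(0, 150, n)


def fit_ols(X, y):
    """Fit ordinary least squares with an intercept; return coefficients and R-squared."""
    X = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coefs
    ss_res = np.sum(residuals ** 2)         # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares
    return coefs, 1 - ss_res / ss_tot


# Simple model: intercept plus one predictor.
simple_coefs, simple_r2 = fit_ols(temp, rentals)

# Complex model: more predictors, a polynomial term (temp^2), and an interaction (temp * humidity).
X_complex = np.column_stack([temp, humidity, windspeed, temp ** 2, temp * humidity])
complex_coefs, complex_r2 = fit_ols(X_complex, rentals)

print(f"simple:  {len(simple_coefs)} coefficients, R^2 = {simple_r2:.3f}")
print(f"complex: {len(complex_coefs)} coefficients, R^2 = {complex_r2:.3f}")
```

Because every predictor in the simple model also appears in the complex one, the complex model's R-squared on the data it was fit to can only match or exceed the simple model's. A higher R-squared on its own therefore doesn't tell us which model to choose, which is why we need other ways to compare them.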