Congratulations! As a recap, you’ve learned to:
- Fit a simple OLS linear regression model
- Use both quantitative and binary categorical predictors
- Interpret the coefficients of a regression model
- Check the assumptions of a regression model
A new dataset named website has been loaded for you in the workspace containing simulated data for a sample of visitors to a website, including the amount of time in seconds they spent on the website (time_seconds), their age (age), and information about whether they accessed the website using Chrome or Safari.
Try to work through the following steps (solution code is provided in solution.py). Note that after showing each plot with plt.show(), we’ve included the command plt.clf() in the starting code to ensure that plots are not layered on top of each other.
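As a small illustration of the plt.show() / plt.clf() pattern described above, here is a minimal sketch. The values are made up purely so there is something to draw, and the Agg backend line is an assumption added so the sketch runs without a display; it is not part of the lesson's starting code.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt

# Made-up values, only so there is something to plot.
ages = [22, 35, 41, 58]
times = [510, 470, 455, 400]

# First plot: a scatter plot.
plt.scatter(ages, times)
plt.xlabel("age")
plt.ylabel("time_seconds")
plt.show()
plt.clf()  # clear the current figure so the next plot starts from a blank canvas

# Number of axes left on the current figure after clearing.
axes_after_clear = len(plt.gcf().axes)

# Second plot: a histogram, drawn without the scatter plot underneath it.
plt.hist(times)
plt.show()
plt.clf()
```

Without the plt.clf() calls, the histogram could be drawn onto the same figure as the scatter plot.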
- Create a plot of time_seconds (vertical axis) versus age (horizontal axis). Is there a linear relationship between these variables?
- Fit a linear model to predict time_seconds using age.
- Use the coefficients from the linear model to plot the regression line on top of your original plot.
- Calculate the fitted values and residuals
- Check the normality assumption by plotting a histogram of the residuals. Are they approximately normally distributed?
- Check the homoscedasticity assumption by plotting the residuals against the fitted values. Is this assumption satisfied?
- Use your model to predict the amount of time that a 40-year-old person will spend on the website.
- Fit another model that predicts time_seconds based on the browser used.
- Print out the coefficients. What is the difference in average time spent on each browser?
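The steps above could be sketched roughly as follows. This is not the code in solution.py. Since the website dataset is not shown here, the data below is simulated, the column name browser is a guess, and all the numbers are invented. NumPy's polyfit stands in for whichever model-fitting library the lesson itself uses, and the Agg backend line is only there so the sketch runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove in an interactive session
import matplotlib.pyplot as plt

# Stand-in for the website dataset (simulated; names and numbers are assumptions).
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(18, 70, size=n)
browser = rng.choice(["Chrome", "Safari"], size=n)
time_seconds = 500 - 3 * age + 20 * (browser == "Safari") + rng.normal(0, 25, size=n)

# Fit time_seconds ~ age by ordinary least squares.
# polyfit returns coefficients from highest degree down: slope first, then intercept.
slope, intercept = np.polyfit(age, time_seconds, deg=1)

# Scatter plot with the regression line on top.
plt.scatter(age, time_seconds)
plt.plot(age, intercept + slope * age, color="red")
plt.xlabel("age")
plt.ylabel("time_seconds")
plt.show()
plt.clf()

# Fitted values and residuals.
fitted = intercept + slope * age
residuals = time_seconds - fitted

# Normality check: histogram of the residuals.
plt.hist(residuals)
plt.show()
plt.clf()

# Homoscedasticity check: residuals against fitted values.
plt.scatter(fitted, residuals)
plt.axhline(0, color="gray")
plt.xlabel("fitted values")
plt.ylabel("residuals")
plt.show()
plt.clf()

# Predicted time on the site for a 40-year-old visitor.
pred_40 = intercept + slope * 40
print("predicted time at age 40:", pred_40)

# Second model: time_seconds ~ browser, with Safari coded 1 and Chrome coded 0.
is_safari = (browser == "Safari").astype(float)
b_slope, b_intercept = np.polyfit(is_safari, time_seconds, deg=1)
print("intercept (Chrome mean):", b_intercept)
print("slope (Safari minus Chrome):", b_slope)

# Cross-check: difference in group means computed directly.
mean_diff = time_seconds[browser == "Safari"].mean() - time_seconds[browser == "Chrome"].mean()
print("difference in group means:", mean_diff)
```

With a 0/1 predictor, the OLS intercept is the mean of the group coded 0 and the slope is the difference between the two group means, which is why the last two printed values agree.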