
Congratulations! As a recap, you’ve learned to:

• Fit a simple OLS linear regression model
• Use both quantitative and binary categorical predictors
• Interpret the coefficients of a regression model
• Check the assumptions of a regression model
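
As a refresher on reading coefficients with a binary predictor, here is a minimal sketch using NumPy only. The six data points and the names `is_safari`, `mean_chrome`, and `mean_safari` are made up for illustration and are not part of the lesson's data. With a 0/1 predictor, the OLS intercept is the mean of the group coded 0 and the slope is the difference between the two group means.

```python
import numpy as np

# Hypothetical data: time on site (seconds) for two groups coded 0/1.
is_safari = np.array([0, 0, 0, 1, 1, 1])
time_seconds = np.array([480.0, 500.0, 520.0, 540.0, 560.0, 580.0])

# Design matrix: a column of ones (intercept) and the 0/1 indicator.
X = np.column_stack([np.ones_like(time_seconds), is_safari])

# Ordinary least squares fit of time_seconds on the indicator.
(intercept, slope), *_ = np.linalg.lstsq(X, time_seconds, rcond=None)

mean_chrome = time_seconds[is_safari == 0].mean()
mean_safari = time_seconds[is_safari == 1].mean()

# The intercept matches the mean of the group coded 0,
# and the slope matches the difference in group means.
print(intercept, mean_chrome)
print(slope, mean_safari - mean_chrome)
```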

### Instructions

A new dataset named `website` has been loaded into the workspace. It contains simulated data for a sample of visitors to a website: the amount of time in seconds each visitor spent on the website (`time_seconds`), their age (`age`), and whether they accessed the website using Chrome or Safari (`browser`).

Try to work through the following steps (solution code is provided in `solution.py`). Note that after each `plt.show()` call, the starting code includes `plt.clf()` so that plots are not layered on top of each other.

1. Create a plot of `time_seconds` (vertical axis) versus `age` (horizontal axis). Is there a linear relationship between these variables?
2. Fit a linear model to predict `time_seconds` using the `age` variable.
3. Use the coefficients from the linear model to plot the regression line on top of your original plot.
4. Calculate the fitted values and residuals.
5. Check the normality assumption by plotting a histogram of the residuals. Are they approximately normally distributed?
6. Check the homoscedasticity assumption by plotting the residuals against the fitted values. Is this assumption satisfied?
7. Use your model to predict the amount of time that a 40-year-old person will spend on the website.
8. Fit another model that predicts `time_seconds` based on `browser`.
9. Print out the coefficients. What is the difference in average time spent on the website between the two browsers?