
Congratulations! As a recap, you’ve learned to:

- Fit a simple OLS linear regression model
- Use both quantitative and binary categorical predictors
- Interpret the coefficients of a regression model
- Check the assumptions of a regression model

### Instructions

A new dataset named `website` has been loaded for you in the workspace. It contains simulated data for a sample of visitors to a website, including the amount of time in seconds they spent on the website (`time_seconds`), their age (`age`), and whether they accessed the website using Chrome or Safari (`browser`).

Try to work through the following steps (solution code is provided in **solution.py**). Note that after showing each plot with `plt.show()`, we’ve included the command `plt.clf()` in the starting code to ensure that plots are not layered on top of each other.
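As a minimal sketch of the `plt.show()` / `plt.clf()` pattern described above: the data values here are made up, and the `Agg` backend is an assumption chosen only so the snippet runs without a display. `axes_after_clear` is a hypothetical name used to confirm that the figure is empty before the next plot is drawn.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

# First plot: a scatter plot of made-up values.
plt.scatter([1, 2, 3, 4], [2, 4, 5, 8])
plt.show()
plt.clf()  # clear the current figure so the next plot starts from a blank canvas

# After clearing, the current figure has no axes left on it.
axes_after_clear = len(plt.gcf().get_axes())

# Second plot: a histogram drawn on the now-empty figure.
plt.hist([1, 2, 2, 3, 3, 3])
plt.show()
plt.clf()
```

Without the `plt.clf()` call between the two plots, the histogram would be drawn onto the same axes as the scatter plot.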

- Create a plot of `time_seconds` (vertical axis) versus `age` (horizontal axis). Is there a linear relationship between these variables?
- Fit a linear model to predict `time_seconds` using the `age` variable.
- Use the coefficients from the linear model to plot the regression line on top of your original plot.
- Calculate the fitted values and residuals.
- Check the normality assumption by plotting a histogram of the residuals. Are they approximately normally distributed?
- Check the homoscedasticity assumption by plotting the residuals against the fitted values. Is this assumption satisfied?
- Use your model to predict the amount of time that a 40-year-old person will spend on the website.
- Fit another model that predicts `time_seconds` based on `browser`.
- Print out the coefficients. What is the difference in average time spent on each browser?
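The numerical steps above can be sketched as follows. This is not the code in **solution.py**: it is an illustration that fits OLS with NumPy's least-squares solver and uses simulated arrays as a stand-in for the `website` dataset, so every number in it is an assumption. The names `X_age`, `X_browser`, `is_safari`, and `pred_40` are hypothetical.

```python
import numpy as np

# Simulated stand-in for the `website` dataset (all values are made up).
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(18, 70, size=n)
browser = rng.choice(["Chrome", "Safari"], size=n)
time_seconds = 300 + 5 * age + 40 * (browser == "Safari") + rng.normal(0, 30, size=n)

# Model 1: time_seconds ~ age, fit by ordinary least squares.
X_age = np.column_stack([np.ones(n), age])  # intercept column + predictor
coef_age, *_ = np.linalg.lstsq(X_age, time_seconds, rcond=None)
intercept, slope = coef_age

# Fitted values and residuals for the assumption checks.
fitted_values = X_age @ coef_age
residuals = time_seconds - fitted_values

# Predicted time on the website for a 40-year-old visitor.
pred_40 = intercept + slope * 40

# Model 2: time_seconds ~ browser, with Safari coded as 1 and Chrome as 0.
is_safari = (browser == "Safari").astype(float)
X_browser = np.column_stack([np.ones(n), is_safari])
coef_browser, *_ = np.linalg.lstsq(X_browser, time_seconds, rcond=None)

# With one binary predictor, the intercept is the mean of the reference
# group (Chrome) and the slope is the difference in group means.
mean_chrome = time_seconds[browser == "Chrome"].mean()
mean_safari = time_seconds[browser == "Safari"].mean()

print("age model (intercept, slope):", intercept, slope)
print("predicted time at age 40:", pred_40)
print("browser model (intercept, slope):", coef_browser)
print("difference in group means:", mean_safari - mean_chrome)
```

A histogram of `residuals` and a scatter plot of `residuals` against `fitted_values` would then serve as the normality and homoscedasticity checks. The last two printed lines show that the `browser` coefficient matches the difference in average time between the two groups.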
