Learn
Congratulations! As a recap, you’ve learned to:
- Fit a simple OLS linear regression model
- Use both quantitative and binary categorical predictors
- Interpret the coefficients of a regression model
- Check the assumptions of a regression model
Instructions
A new dataset named website has been loaded for you in the workspace. It contains simulated data for a sample of visitors to a website, including the amount of time in seconds they spent on the website (time_seconds), their age (age), and whether they accessed the website using Chrome or Safari (browser).
Try to work through the following steps (solution code is provided in solution.py). Note that after showing each plot with plt.show(), we’ve included the command plt.clf() in the starting code to ensure that plots are not layered on top of each other.
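As a minimal sketch of that show-then-clear pattern, the snippet below draws two unrelated plots in a row. The numbers are made-up placeholder values, not taken from the website dataset, and the sketch assumes plt refers to matplotlib.pyplot.

```python
import matplotlib.pyplot as plt

# Placeholder values purely for illustration (not from the `website` dataset).
plt.scatter([1, 2, 3], [2, 4, 6])
plt.show()
plt.clf()  # clear the current figure so the next plot starts from an empty canvas

# Because of the clf() call above, this histogram is not drawn over the scatter plot.
plt.hist([1, 2, 2, 3, 3, 3])
plt.show()
plt.clf()
```

If a plt.clf() call is left out, whatever is drawn next may end up in the same figure as the previous plot.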
- Create a plot of time_seconds (vertical axis) versus age (horizontal axis). Is there a linear relationship between these variables?
- Fit a linear model to predict time_seconds using the age variable.
- Use the coefficients from the linear model to plot the regression line on top of your original plot.
- Calculate the fitted values and residuals
- Check the normality assumption by plotting a histogram of the residuals. Are they approximately normally distributed?
- Check the homoscedasticity assumption by plotting the residuals against the fitted values. Is this assumption satisfied?
- Use your model to predict the amount of time that a 40-year-old person will spend on the website.
- Fit another model that predicts time_seconds based on browser.
- Print out the coefficients. What is the difference in average time spent on each browser?