Congratulations! As a recap, you’ve learned to:

  • Fit a simple OLS linear regression model
  • Use both quantitative and binary categorical predictors
  • Interpret the coefficients of a regression model
  • Check the assumptions of a regression model

Instructions

A new dataset named website has been loaded into the workspace. It contains simulated data for a sample of visitors to a website: the amount of time in seconds they spent on the website (time_seconds), their age (age), and whether they accessed the website using Chrome or Safari (browser).

Try to work through the following steps (solution code is provided in solution.py). Note that after showing each plot with plt.show(), we’ve included the command plt.clf() in the starting code to ensure that plots are not layered on top of each other.

  1. Create a plot of time_seconds (vertical axis) versus age (horizontal axis). Is there a linear relationship between these variables?
  2. Fit a linear model to predict time_seconds using the age variable.
  3. Use the coefficients from the linear model to plot the regression line on top of your original plot.
  4. Calculate the fitted values and residuals.
  5. Check the normality assumption by plotting a histogram of the residuals. Are they approximately normally distributed?
  6. Check the homoscedasticity assumption by plotting the residuals against the fitted values. Is this assumption satisfied?
  7. Use your model to predict the amount of time that a 40-year-old person will spend on the website.
  8. Fit another model that predicts time_seconds based on browser.
  9. Print out the coefficients. What is the difference in average time spent on each browser?
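
The arithmetic behind steps 2, 4, 7, 8, and 9 can be sketched in plain Python. This is not the solution in solution.py: it is a minimal illustration that computes the OLS intercept and slope by hand instead of using a modeling library, and it omits the plotting steps. The data below are made up as a stand-in for the website dataset (which is only available in the workspace), and the helper fit_simple_ols and the 0/1 coding of browser are assumptions chosen for the example.

```python
import random
from statistics import mean

# Made-up stand-in for the `website` dataset (assumption: the real values
# exist only in the course workspace, so these numbers are illustrative).
random.seed(0)
ages = [random.randint(18, 70) for _ in range(200)]
browsers = [random.choice(["Chrome", "Safari"]) for _ in ages]
time_seconds = [
    500 - 4.0 * age + (30 if b == "Safari" else 0) + random.gauss(0, 40)
    for age, b in zip(ages, browsers)
]


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Steps 2 and 4: fit time_seconds on age, then compute fitted values and residuals.
b0, b1 = fit_simple_ols(ages, time_seconds)
fitted = [b0 + b1 * age for age in ages]
residuals = [y - f for y, f in zip(time_seconds, fitted)]

# Step 7: predicted time on the site for a 40-year-old visitor.
pred_40 = b0 + b1 * 40

# Steps 8 and 9: code browser as 0/1 (Chrome = 0, Safari = 1 is an arbitrary
# choice here) and fit the same kind of model.
is_safari = [1 if b == "Safari" else 0 for b in browsers]
c0, c1 = fit_simple_ols(is_safari, time_seconds)

# Group means, for comparison with the coefficients of the browser model.
chrome_mean = mean(t for t, b in zip(time_seconds, browsers) if b == "Chrome")
safari_mean = mean(t for t, b in zip(time_seconds, browsers) if b == "Safari")

print(f"age model: intercept={b0:.2f}, slope={b1:.2f}")
print(f"predicted time at age 40: {pred_40:.2f} seconds")
print(f"browser model: intercept={c0:.2f}, slope={c1:.2f}")
print(f"difference in group means (Safari - Chrome): {safari_mean - chrome_mean:.2f}")
```

With a binary predictor coded this way, the intercept equals the mean of the group coded 0 and the slope equals the difference between the two group means, which is the quantity step 9 asks about. The residuals from a least-squares fit with an intercept also sum to zero, up to floating-point error; the histogram and residuals-versus-fitted plots in steps 5 and 6 examine their shape and spread.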
