Learn

Time to pull it all together! The interpretation of coefficients in multiple linear regression differs slightly from that of coefficients in simple linear regression. The coefficient of a continuous independent variable, like podcast, represents the difference in the predicted value of sales for each one-dollar increase in podcast, given that all other variables in the model, including TV, are held constant. Given the output of calling summary(model) below, we can say that for every one-dollar increase in podcast advertising spending, while holding TV and newspaper constant, the predicted total sales of the related product increase by about 1.049 dollars.

summary(model)

#output
Call:
lm(formula = sales ~ TV + podcast + newspaper, data = train)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.583386   1.024616   4.473 1.65e-05 ***
TV          3.006340   1.004924   7.380 1.62e-11 ***
podcast     1.049249   1.027665   5.395 3.10e-07 ***
newspaper   1.006340   1.002924   6.380 1.12e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
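To make the "held constant" idea concrete, here is a minimal sketch in R. The lesson's train data frame is not shown, so the data below are simulated and purely hypothetical (newspaper is assumed to be a 0/1 indicator). The sketch fits the same formula and checks that raising podcast by one dollar, with TV and newspaper unchanged, shifts the predicted sales by exactly the podcast coefficient.

```r
# Hypothetical stand-in for the lesson's `train` data (simulated, not the real dataset)
set.seed(42)
n <- 200
train <- data.frame(
  TV        = runif(n, 0, 100),   # assumed: dollars spent on TV ads
  podcast   = runif(n, 0, 50),    # assumed: dollars spent on podcast ads
  newspaper = rbinom(n, 1, 0.5)   # assumed: 1 if print ads ran, 0 otherwise
)
train$sales <- 4.5 + 3.0 * train$TV + 1.05 * train$podcast +
  1.0 * train$newspaper + rnorm(n, sd = 2)

model <- lm(sales ~ TV + podcast + newspaper, data = train)

# Estimated coefficient for podcast
b_podcast <- coef(model)[["podcast"]]

# Two budgets that differ only by one dollar of podcast spending
base   <- data.frame(TV = 50, podcast = 20, newspaper = 1)
bumped <- data.frame(TV = 50, podcast = 21, newspaper = 1)

# Difference in predicted sales between the two budgets
pred_diff <- unname(predict(model, bumped) - predict(model, base))

# The difference matches the podcast coefficient (up to floating-point error)
print(all.equal(pred_diff, b_podcast))
```

Because the model is linear, the same difference appears whatever values TV and newspaper are fixed at.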

In addition, the interpretation of boolean categorical variables differs slightly from that of continuous variables. The coefficient associated with a boolean categorical variable represents the difference in the predicted outcome between one category and the other. For instance, the coefficient value of 1.006 for newspaper tells us that running print advertisements is associated with a 1.006 dollar increase in predicted sales, holding the values of TV and podcast constant.
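The category-to-category reading can be sketched the same way. The data frame ads below is simulated and hypothetical, with newspaper stored as a two-level factor ("no", "yes"), which is an assumption about how such a variable might be encoded. With R's default treatment contrasts, lm() treats the first factor level as the reference and reports one coefficient for the other level.

```r
# Hypothetical data with a two-level categorical predictor (simulated)
set.seed(7)
n <- 200
ads <- data.frame(
  TV        = runif(n, 0, 100),
  podcast   = runif(n, 0, 50),
  newspaper = factor(sample(c("no", "yes"), n, replace = TRUE),
                     levels = c("no", "yes"))   # "no" is the reference level
)
ads$sales <- 4.5 + 3.0 * ads$TV + 1.05 * ads$podcast +
  1.0 * (ads$newspaper == "yes") + rnorm(n, sd = 2)

fit <- lm(sales ~ TV + podcast + newspaper, data = ads)

# The coefficient is named after the non-reference level: "newspaperyes"
b_news <- coef(fit)[["newspaperyes"]]

# Same TV and podcast values, different newspaper category
without_print <- data.frame(TV = 50, podcast = 20,
                            newspaper = factor("no", levels = c("no", "yes")))
with_print    <- data.frame(TV = 50, podcast = 20,
                            newspaper = factor("yes", levels = c("no", "yes")))

# Gap in predicted sales between the two categories
gap <- unname(predict(fit, with_print) - predict(fit, without_print))

# The gap equals the newspaper coefficient (up to floating-point error)
print(all.equal(gap, b_news))
```

Changing the reference level (for example with relevel()) flips the sign of the coefficient but not the size of the gap.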

As we’ve suggested throughout this lesson, data scientists often build many variations of a model with different combinations of independent variables before ultimately committing to the model that best fits test data. Let’s practice building, interpreting, and selecting the best-fitting multiple linear regression model for our convert_clean dataset!

Instructions

1.

Build a multiple linear regression model that regresses total_convert on impressions, clicks, and gender, using our train dataset. Save the result to a variable called model; then call summary(model) to view the model results.

2.

How might we interpret the coefficient estimate for gender? Set the variable gender_coefficient equal to the statement that most correctly interprets the estimate value — either "a", "b", or "c":

A. The coefficient of the gender variable is not statistically significant, so we cannot come to any substantive conclusion from its value.

B. The coefficient of the gender variable is negative, which means that as total_convert, clicks, and impressions increase, men are less likely to purchase an advertised product.

C. The coefficient of the gender variable is negative. This means that men are less likely than women with the same values of clicks and impressions to purchase an advertised product.

3.

Let’s build a second, simpler model so that we can check whether adding gender to our model improves its fit. Build a multiple linear regression model that regresses total_convert on impressions and clicks. Save the result to a variable called model2.

4.

Compute the R-squared value for model and model2, and save the results to rsq_model and rsq_model2 respectively. Call both variables to view their values.

5.

Which model best fits our data? Set the variable best_fit equal to the larger R-squared value.

6.

Set the variable gender_diff equal to the difference between rsq_model and rsq_model2. Uncomment the formatted string at the bottom of the file to see how we would provide a narrative around the effect of gender on interaction with online advertisements.
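
The workflow in these steps can be sketched end to end on simulated data. Everything about train_demo below is hypothetical (the column values, the gender coding as "F"/"M", and the effect sizes); only the variable names come from the instructions. R-squared is read from summary(), which is one common way to obtain it and is assumed here rather than prescribed by the lesson.

```r
# Hypothetical stand-in for the exercise data (simulated, not convert_clean)
set.seed(123)
n <- 300
train_demo <- data.frame(
  impressions = round(runif(n, 1000, 100000)),
  gender      = factor(sample(c("F", "M"), n, replace = TRUE))
)
train_demo$clicks <- round(train_demo$impressions * runif(n, 0.0001, 0.0003))
train_demo$total_convert <- 1 + 0.00002 * train_demo$impressions +
  0.1 * train_demo$clicks - 0.5 * (train_demo$gender == "M") + rnorm(n, sd = 1)

# Full model and the simpler model without gender
model  <- lm(total_convert ~ impressions + clicks + gender, data = train_demo)
model2 <- lm(total_convert ~ impressions + clicks, data = train_demo)

# In-sample R-squared for each model
rsq_model  <- summary(model)$r.squared
rsq_model2 <- summary(model2)$r.squared

# Larger R-squared and the improvement attributable to gender
best_fit    <- max(rsq_model, rsq_model2)
gender_diff <- rsq_model - rsq_model2

cat(sprintf("Adding gender raises R-squared by %.4f (from %.4f to %.4f).\n",
            gender_diff, rsq_model2, rsq_model))
```

On the data a model was fitted to, R-squared cannot decrease when a predictor is added, so summary(model)$adj.r.squared or R-squared computed on held-out test data gives a fairer comparison between the two models.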
