Time to pull it all together! The interpretation of coefficients in multiple linear regression differs slightly from that of coefficients in simple linear regression. The coefficient of a continuous independent variable, like podcast, represents the difference in the predicted value of sales for each one-dollar increase in podcast, given that all other variables in the model, including TV, are held constant. Given the output of calling summary(model) below, we can say that for every one-dollar increase in podcast advertisement spending, while holding TV and newspaper constant, the predicted total sales of the related product increase by about 1.049 dollars.
summary(model)
# output
Call:
lm(formula = sales ~ TV + podcast + newspaper, data = train)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.583386   1.024616   4.473 1.65e-05 ***
TV          3.006340   1.004924   7.380 1.62e-11 ***
podcast     1.049249   1.027665   5.395 3.10e-07 ***
newspaper   1.006340   1.002924   6.380 1.12e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
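To make the "held constant" idea concrete, here is a minimal sketch that plugs the estimates from the output above into the fitted equation by hand. The helper `predict_sales` and the spending values are hypothetical and chosen only for illustration; they are not part of the lesson's workspace.

```r
# Coefficient estimates copied from the summary(model) output above.
coefs <- c(intercept = 4.583386, TV = 3.006340,
           podcast = 1.049249, newspaper = 1.006340)

# Hypothetical helper (not from the lesson): predicted sales for given inputs.
predict_sales <- function(TV, podcast, newspaper) {
  unname(coefs["intercept"] + coefs["TV"] * TV +
           coefs["podcast"] * podcast + coefs["newspaper"] * newspaper)
}

# Arbitrary illustrative inputs; only podcast changes between the two calls.
base     <- predict_sales(TV = 10, podcast = 5, newspaper = 1)
plus_one <- predict_sales(TV = 10, podcast = 6, newspaper = 1)

# Because TV and newspaper are unchanged, the difference in predictions
# equals the podcast coefficient.
plus_one - base
```

Whatever fixed values are chosen for TV and newspaper, the difference between the two predictions stays the same, which is what "holding the other variables constant" means in a model without interaction terms.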
In addition, the interpretation of boolean categorical variables differs slightly from that of continuous variables. The coefficient value associated with a boolean categorical variable represents the difference in the predicted outcome between one category and the other. For instance, the coefficient value of 1.006 for newspaper tells us that running print advertisements is associated with a 1.006 dollar increase in predicted
sales, holding the values of
As we’ve suggested throughout this lesson, data scientists often build many variations of a model with different combinations of independent variables before ultimately committing to the model that best fits the test data. Let’s practice building, interpreting, and selecting the best-fitting multiple linear regression model for our
Build a multiple linear regression model which regresses
total_convert, using our
train dataset. Save the result to a variable called
model; then call
summary(model) to view the model results.
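As a rough sketch of this step, the code below fits a model with `lm()` on simulated data. The lesson's real `train` dataset is not available here, so the data frame, its values, and the choice of `impressions` and `gender` as predictors are all assumptions made for illustration.

```r
set.seed(42)  # make the simulated data reproducible

# Simulated stand-in for the lesson's train data; all values are invented.
n <- 200
train <- data.frame(
  impressions = runif(n, min = 1000, max = 100000),
  gender = factor(sample(c("F", "M"), n, replace = TRUE))
)
train$total_convert <- 1 + 0.00005 * train$impressions -
  0.5 * (train$gender == "M") + rnorm(n, sd = 0.5)

# Regress total_convert on both predictors (assumed formula).
model <- lm(total_convert ~ impressions + gender, data = train)
summary(model)

# R dummy-codes the factor: genderM is the predicted difference for "M"
# relative to the reference level "F", at the same value of impressions.
coef(model)["genderM"]
```

Since R picks the first factor level as the reference by default, the sign of a categorical coefficient always has to be read relative to that reference level.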
How might we interpret the coefficient estimate for
gender? Set the variable
gender_coefficient equal to the statement that most correctly interprets the estimate value — either
A. The coefficient of the
gender variable is not statistically significant, so we cannot come to any substantive conclusion from its value.
B. The coefficient of the
gender variable is negative, which means that as
impressions increases, men are less likely to purchase an advertised product.
C. The coefficient of the gender variable is negative. This means that men are less likely than women, with the same value of
impressions, to purchase an advertised product.
Let’s build a second, simpler model so that we can confirm that adding gender to our model improves its fit. Build a multiple linear regression model which regresses
total_convert. Save the result to a variable called
Compute the R-squared value for
model2, and save the results to
rsq_model2 respectively. Call both variables to view their values.
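One way to pull R-squared out of a fitted model is through the `r.squared` element of its summary. The sketch below again uses simulated data; the column names, the formula for each model, and the name `rsq_model` are assumptions for illustration rather than the lesson's actual setup.

```r
set.seed(7)  # make the simulated data reproducible

# Simulated stand-in for train; invented values, assumed column names.
n <- 200
train <- data.frame(
  impressions = runif(n, min = 1000, max = 100000),
  gender = factor(sample(c("F", "M"), n, replace = TRUE))
)
train$total_convert <- 1 + 0.00005 * train$impressions -
  0.5 * (train$gender == "M") + rnorm(n, sd = 0.5)

model  <- lm(total_convert ~ impressions + gender, data = train)
model2 <- lm(total_convert ~ impressions, data = train)  # simpler model

# summary() of an lm fit stores R-squared in its r.squared element.
rsq_model  <- summary(model)$r.squared
rsq_model2 <- summary(model2)$r.squared

rsq_model
rsq_model2
```

Because the simpler model here is nested inside the larger one, its R-squared on the training data cannot exceed the larger model's, so the size of the gap is more informative than its direction.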
Which model best fits our data? Set the variable
best_fit equal to the larger R-squared value.
Set the variable
gender_diff equal to the difference between
rsq_model2. Uncomment the f-string at the bottom of the file to see how we would provide a narrative around the effect of gender on interaction with online advertisements.