Time to pull it all together! Coefficients in multiple linear regression are interpreted slightly differently from coefficients in simple linear regression. The coefficient of a continuous independent variable, like podcast, represents the difference in the predicted value of sales for each one-dollar increase in podcast, given that all other variables in the model, including TV, are held constant. Given the output of calling summary(model) below, we can say that for every one-dollar increase in podcast advertisement spending, while holding TV and newspaper constant, the predicted total sales of the related product increase by 1.049 dollars.
summary(model)

#output
Call:
lm(formula = sales ~ TV + podcast + newspaper, data = train)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.583386   1.024616   4.473 1.65e-05 ***
TV          3.006340   1.004924   7.380 1.62e-11 ***
podcast     1.049249   1.027665   5.395 3.10e-07 ***
newspaper   1.006340   1.002924   6.380 1.12e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
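To see what "holding the other variables constant" means in practice, it can help to compare two predictions that differ only in podcast spending. The following is a minimal sketch, not the lesson's actual data: the train data frame is simulated, and the slopes used to generate it are assumptions chosen to resemble the output above, so the estimates will not match it exactly.

```r
# Simulated stand-in for the lesson's train data (hypothetical values).
set.seed(42)
n <- 200
train <- data.frame(
  TV        = runif(n, 0, 100),
  podcast   = runif(n, 0, 50),
  newspaper = rbinom(n, 1, 0.5)  # assumed 0/1 indicator for print ads
)
# Generate sales from known slopes plus noise so the fit can be checked.
train$sales <- 4.5 + 3.0 * train$TV + 1.05 * train$podcast +
  1.0 * train$newspaper + rnorm(n, sd = 2)

# Fit the multiple linear regression.
model <- lm(sales ~ TV + podcast + newspaper, data = train)

# The coefficient table is a matrix; pull out one estimate by name.
podcast_slope <- coef(summary(model))["podcast", "Estimate"]

# Two hypothetical campaigns that differ only by one dollar of podcast spend.
base     <- data.frame(TV = 50, podcast = 20, newspaper = 1)
plus_one <- data.frame(TV = 50, podcast = 21, newspaper = 1)

# The difference in predicted sales equals the podcast coefficient.
diff_pred <- unname(predict(model, plus_one) - predict(model, base))
print(podcast_slope)
print(diff_pred)
```

Both printed values are the same number, which is what the interpretation describes: a one-dollar change in podcast shifts the prediction by its coefficient when TV and newspaper are fixed.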
In addition, the interpretation of boolean categorical variables differs slightly from that of continuous variables. The coefficient associated with a boolean categorical variable represents the difference in the predicted outcome between one category and the other. For instance, the coefficient value of 1.006 for newspaper tells us that running print advertisements is associated with a 1.006 dollar increase in predicted sales, holding the values of TV and podcast constant.
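The same comparison works for a boolean variable: predict once with the category switched off and once with it switched on, keeping everything else equal. This is a rough sketch under the assumption that newspaper is stored as a 0/1 indicator (1 meaning print ads were run); the data are simulated for illustration and are not the lesson's.

```r
# Simulated data (hypothetical); newspaper is a 0/1 indicator:
# 1 = print ads were run, 0 = they were not.
set.seed(7)
n <- 200
train <- data.frame(
  TV        = runif(n, 0, 100),
  podcast   = runif(n, 0, 50),
  newspaper = rbinom(n, 1, 0.5)
)
train$sales <- 4.5 + 3.0 * train$TV + 1.05 * train$podcast +
  1.0 * train$newspaper + rnorm(n, sd = 0.5)

model <- lm(sales ~ TV + podcast + newspaper, data = train)

# Same TV and podcast spend, with and without print ads.
without_ads <- data.frame(TV = 50, podcast = 20, newspaper = 0)
with_ads    <- data.frame(TV = 50, podcast = 20, newspaper = 1)

# The gap between the two predictions is the newspaper coefficient.
gap <- unname(predict(model, with_ads) - predict(model, without_ads))
newspaper_coef <- unname(coef(model)["newspaper"])
print(c(gap = gap, newspaper_coef = newspaper_coef))
```

If the variable were stored as a factor or logical instead, R would label the coefficient with the category name appended (for example newspaperTRUE), but the interpretation would be the same.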
As we’ve suggested throughout this lesson, data scientists often build many variations of a model with different combinations of independent variables before ultimately committing to the model that best fits test data. Let’s practice building, interpreting, and selecting the best-fitting multiple linear regression model for our convert_clean dataset!
Instructions
Build a multiple linear regression model which regresses total_convert on impressions, clicks, and gender, using our train dataset. Save the result to a variable called model; then call summary(model) to view the model results.
How might we interpret the coefficient estimate for gender? Set the variable gender_coefficient equal to the statement that most correctly interprets the estimate value, either "a", "b", or "c":
A. The coefficient of the gender variable is not statistically significant, so we cannot come to any substantive conclusion from its value.
B. The coefficient of the gender variable is negative, which means that as total_convert, clicks, and impressions increase, men are less likely to purchase an advertised product.
C. The coefficient of the gender variable is negative. This means that men are less likely than women, with the same value of clicks and impressions, to purchase an advertised product.
Let’s build a second, simpler model so that we can check whether adding gender to our model improves its fit. Build a multiple linear regression model which regresses total_convert on impressions and clicks. Save the result to a variable called model2.
Compute the R-squared value for model and model2, and save the results to rsq_model and rsq_model2, respectively. Call both variables to view their values.
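One way to pull R-squared out of a fitted lm object is through its summary. The sketch below uses simulated data with the same column names as the exercise; the values, the "F"/"M" coding of gender, and the generating coefficients are all assumptions made for illustration, so the numbers will differ from those in the lesson's workspace.

```r
# Simulated stand-in for the lesson's data (hypothetical values and coding).
set.seed(1)
n <- 300
train <- data.frame(
  impressions = runif(n, 1000, 100000),
  gender      = factor(sample(c("F", "M"), n, replace = TRUE))
)
train$clicks <- round(train$impressions / 5000 + runif(n, 0, 5))
train$total_convert <- 1 + 0.00002 * train$impressions + 0.1 * train$clicks -
  0.3 * (train$gender == "M") + rnorm(n, sd = 0.5)

# Full model and the simpler model without gender.
model  <- lm(total_convert ~ impressions + clicks + gender, data = train)
model2 <- lm(total_convert ~ impressions + clicks, data = train)

# summary() on an lm fit stores R-squared in the r.squared element.
rsq_model  <- summary(model)$r.squared
rsq_model2 <- summary(model2)$r.squared
print(rsq_model)
print(rsq_model2)

# Adjusted R-squared penalizes extra predictors.
adj_model  <- summary(model)$adj.r.squared
adj_model2 <- summary(model2)$adj.r.squared
print(c(adj_model = adj_model, adj_model2 = adj_model2))
```

Because plain R-squared on the training data cannot decrease when a predictor is added, the larger model will always score at least as high on it; the adjusted values are a useful second check when deciding whether gender earns its place.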
Which model best fits our data? Set the variable best_fit equal to the larger R-squared value.
Set the variable gender_diff equal to the difference between rsq_model and rsq_model2. Uncomment the f-string at the bottom of the file to see how we would provide a narrative around the effect of gender on interaction with online advertisements.