Learn
Linear Regression in R
Quantifying Model Fit

Once we understand the kind of relationship our model describes, we want to know how well this modeled relationship actually fits the data. This is typically referred to as goodness-of-fit. For simple linear models, we can measure it quantitatively by assessing two things:

1. Residual standard error (RSE)
2. R squared (R^2)

The RSE is an estimate of the standard deviation of the error of the model (the error term in our mathematical definition of linear regression). Roughly speaking, it is the average amount that the response will deviate from the true regression line. The RSE appears at the bottom of the `summary(model)` output, but we can also get it directly with `sigma()`:

```
sigma(model)

# output
3.2
```

An RSE value of 3.2 means the actual sales in each market will deviate from the true regression line by approximately 3,200 units, on average. Is this too large of a deviation? Well, that’s subjective, but when compared to the average value of sales over all markets the percentage error is 22%:

```
sigma(model) / mean(train$sales)

# output
0.2207373
```

The RSE provides an absolute measure of lack of fit of our model to the data. But since it is measured in the units of Y, it is not always clear what constitutes a good RSE.
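As a check on the definition above, the RSE can also be computed by hand from the model residuals. The sketch below uses a small simulated dataset (the lesson's actual `train` data is not reproduced here, so the numeric values are illustrative only):

```r
set.seed(42)

# Hypothetical stand-in for the lesson's training data
train <- data.frame(podcast = runif(100, 0, 50))
train$sales <- 5 + 0.4 * train$podcast + rnorm(100, sd = 3)

model <- lm(sales ~ podcast, data = train)

# RSE by hand: sqrt(RSS / residual degrees of freedom), where RSS is the
# residual sum of squares and df = n - 2 for a simple linear model
rss <- sum(resid(model)^2)
rse_manual <- sqrt(rss / df.residual(model))

# sigma() returns the same quantity
all.equal(rse_manual, sigma(model))
```

This makes explicit that `sigma()` is just a convenience accessor for the quantity `summary()` reports as the residual standard error.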

The R^2 statistic provides an alternative measure of fit. It represents the proportion of variance explained, so it always takes on a value between 0 and 1 and is independent of the scale of Y, our outcome variable. Similar to the RSE, the R^2 can be found at the bottom of `summary(model)`, but we can also extract it directly by calling `summary(model)$r.squared`. The result below suggests that podcast advertising budget can explain about 64% of the variability in the total `sales` value.

```
summary(model)$r.squared

# output
0.6372581
```
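The R^2 statistic has the same closed form regardless of the data: 1 minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch with simulated data (again, a hypothetical stand-in for the lesson's `train` set):

```r
set.seed(1)

# Hypothetical data standing in for the lesson's training set
train <- data.frame(podcast = runif(100, 0, 50))
train$sales <- 5 + 0.4 * train$podcast + rnorm(100, sd = 3)

model <- lm(sales ~ podcast, data = train)

# R^2 = 1 - RSS/TSS: the proportion of variance in sales
# that the model explains
rss <- sum(resid(model)^2)
tss <- sum((train$sales - mean(train$sales))^2)
r_sq_manual <- 1 - rss / tss

# Matches the value stored in the model summary
all.equal(r_sq_manual, summary(model)$r.squared)
```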

### Instructions

1.

The code used to produce `model`, a simple linear model summarizing the relationship between `clicks` and `total_convert`, is already included in your notebook.

• Assign the result of `sigma(model)/mean(train$total_convert)` to `avg_rse`.
• Uncomment the following f-string, then run the file to see how we would contextualize the average RSE of our model.
2.

Model fit is often quantified in comparison to other models, then used to determine which variation of a modeled relationship best fits the data. Let’s build a second model so that we can contextualize our fit metrics.

Assign the result of building a simple linear model regressing `total_convert` on `impressions`, the total number of times a user views a version of an advertisement, to the variable `model_2`.

3.

Let’s use a combination of R’s variable selection syntax, the `$` character, and `summary()` to investigate the percent of variability explained by both `model` and `model_2`.

• Extract the r-squared measure from `model` and save the result to a variable called `r_sq`.
• Extract the r-squared measure from `model_2` and save the result to a variable called `r_sq_2`.

Print out both r-squared variables. Which model better explains a user’s likelihood of purchasing a product they have been shown an advertisement for?

4.

Uncomment the final f-string, then run the file to see how we would provide a narrative around the R^2 statistic and determine which model better explains user purchase behavior.