In addition to the quantitative measures that characterize our model's accuracy, it is best practice to produce visual summaries to assess our model. First, we should always visualize our model alongside our data. For simple linear regression this is straightforward: we can use geom_point() to plot our observed values and geom_smooth(method = "lm") to plot our model. We can also add a second call to geom_smooth() with the parameters se = FALSE and color = "red". This combination lets us compare our linear model, drawn below as the blue line with its 95% confidence interval shown as the shaded band, against a non-linear LOESS smoother drawn in red.
library(ggplot2)
# Blue: linear fit with 95% CI; red: LOESS smoother without CI
ggplot(train, aes(podcast, sales)) +
  geom_point() +
  geom_smooth(method = "lm") +
  geom_smooth(se = FALSE, color = "red")
A LOESS smoother fits a series of local regressions, each of which gives the most weight to the data points nearest a given x value, and joins them into a smooth curve; the result behaves somewhat like a moving average of the data as our x-axis variable increases. The smoother should not be used to predict new values: it tracks our training data closely, so it can follow noise, and it says nothing reliable outside the range of the observed x values. It is, however, a helpful tool for visualizing where our linear model diverges from our training data.
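To make the difference between the two curves concrete, here is a minimal sketch in R that fits both models to simulated data. The data frame train_sim and its values are hypothetical stand-ins for the lesson's train data. loess(), lm() and predict() come with base R's stats package, and span = 0.75 with degree = 2 are the loess() defaults, which is what geom_smooth() uses by default for fewer than 1,000 points.

```r
# Hypothetical stand-in for the lesson's `train` data (simulated, not the real data)
set.seed(42)
train_sim <- data.frame(podcast = runif(100, min = 0, max = 50))
train_sim$sales <- 10 + 0.3 * train_sim$podcast + rnorm(100, sd = 2)

# Straight line: the model drawn by geom_smooth(method = "lm")
fit_lm <- lm(sales ~ podcast, data = train_sim)

# Local quadratic fits weighted toward nearby points (loess() defaults shown explicitly)
fit_loess <- loess(sales ~ podcast, data = train_sim, span = 0.75, degree = 2)

# Compare both curves at a few podcast values inside the simulated range
grid <- data.frame(podcast = seq(5, 45, by = 10))
grid$lm_pred <- predict(fit_lm, newdata = grid)
grid$loess_pred <- predict(fit_loess, newdata = grid)
print(grid)

# Outside the training range the default LOESS fit returns NA instead of extrapolating
new_budget <- data.frame(podcast = 100)
predict(fit_loess, newdata = new_budget)
predict(fit_lm, newdata = new_budget)
```

Inside the data range the two sets of predictions can be compared side by side. For podcast = 100, which lies outside the simulated range, predict() on the LOESS fit returns NA while the linear model still returns a number, which is one concrete reason the smoother is a diagnostic rather than a predictive model (whether the linear extrapolation is trustworthy is a separate question).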
Because the LOESS smoother stays within the confidence interval of our model, it is reasonable to conclude that a linear trend captures the essential shape of this relationship. However, as the podcast advertising budget approaches 0, sales drop more sharply than the linear trend predicts, so our model may be less accurate when the podcast budget is very low.
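The same visual check can also be done numerically. The sketch below uses simulated data with a deliberate extra drop near podcast = 0 (an assumption chosen to mimic the pattern described above, not the lesson's data). It evaluates the LOESS curve on a grid and flags the points where it falls outside the linear model's 95% confidence band, obtained from predict() with interval = "confidence".

```r
# Hypothetical data: a linear trend plus an extra drop in sales near podcast = 0
set.seed(1)
sim <- data.frame(podcast = runif(200, min = 0, max = 50))
sim$sales <- 10 + 0.3 * sim$podcast - 4 * exp(-sim$podcast / 3) + rnorm(200, sd = 1.5)

fit_lm <- lm(sales ~ podcast, data = sim)
fit_loess <- loess(sales ~ podcast, data = sim)

# Evaluation grid spanning the observed podcast values
grid <- data.frame(podcast = seq(min(sim$podcast), max(sim$podcast), length.out = 50))

# 95% confidence band of the linear fit (the shaded region in the plot)
band <- predict(fit_lm, newdata = grid, interval = "confidence", level = 0.95)

# TRUE where the LOESS curve stays inside that band
grid$loess <- predict(fit_loess, newdata = grid)
grid$inside <- grid$loess >= band[, "lwr"] & grid$loess <= band[, "upr"]

# Grid points where the LOESS curve leaves the band
grid[!grid$inside, c("podcast", "loess")]
```

With this simulated drop, the flagged rows would likely cluster at the low end of podcast, although the exact rows depend on the random draw. An empty result corresponds to the situation in the plot above, where the red curve never leaves the shaded band.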
Instructions
We’ve plotted clicks against total conversions. Let’s compare a linear fit with a LOESS smoother. Add two calls to geom_smooth() to plot. The first should use the parameter method = "lm". The second should use the parameters se = FALSE and color = "red".
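One way to complete the step above is sketched here. Because the lesson's data frame is not shown, conversion_sim is a hypothetical simulated stand-in with clicks and total_conversion columns; with the real data, only the final plot <- plot + ... statement is needed.

```r
library(ggplot2)

# Hypothetical simulated data mimicking the clicks and total_conversion columns
set.seed(7)
conversion_sim <- data.frame(clicks = runif(150, min = 0, max = 100))
conversion_sim$total_conversion <- 2 + 0.1 * conversion_sim$clicks + rnorm(150, sd = 1)

# Scatter plot like the one already stored in `plot` in the exercise
plot <- ggplot(conversion_sim, aes(clicks, total_conversion)) +
  geom_point()

# The two requested layers: linear fit with its 95% CI, then a red LOESS curve
plot <- plot +
  geom_smooth(method = "lm") +
  geom_smooth(se = FALSE, color = "red")
```

Printing plot then draws the points, the blue linear fit with its shaded band, and the red LOESS curve on one set of axes.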
How closely does the relationship between clicks and conversion follow a linear trend? Set the variable linear_relationship equal to either "a", "b", "c", or "d", depending on the statement that best characterizes the relationship:
A. The relationship is less linear when clicks approaches large values.
B. There is a clear divergence from a linear relationship when clicks approaches zero or when clicks approaches infinity.
C. The relationship between clicks and total_conversion is perfectly linear.
D. There is no linear relationship between clicks and total_conversion.
Let’s extend our linearity analysis to model2, which describes the relationship between impressions and total_conversion. Add the same two calls to geom_smooth() to plot_2 to compare the linear model against a LOESS smoother.
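Since the same pair of layers is now added to two plots, a small helper can avoid repeating them. add_linearity_check() below is a hypothetical function, not part of the lesson, and ads_sim is simulated data standing in for the impressions data. Passing formula = y ~ x only silences geom_smooth()'s informational message, and naming method = "loess" explicitly keeps the red curve a LOESS fit even on data sets of 1,000 or more points, where geom_smooth() would otherwise switch to a GAM smoother.

```r
library(ggplot2)

# Hypothetical helper (not part of the lesson): adds a linear fit with its
# 95% CI plus a red LOESS curve to any ggplot scatter plot
add_linearity_check <- function(p) {
  p +
    geom_smooth(method = "lm", formula = y ~ x) +
    geom_smooth(method = "loess", formula = y ~ x, se = FALSE, color = "red")
}

# Hypothetical simulated stand-in for the impressions vs. total_conversion plot
set.seed(11)
ads_sim <- data.frame(impressions = runif(150, min = 0, max = 1e6))
ads_sim$total_conversion <- 1 + 4e-6 * ads_sim$impressions + rnorm(150, sd = 0.8)
plot_2 <- ggplot(ads_sim, aes(impressions, total_conversion)) +
  geom_point()

# Same comparison as for clicks, applied through the helper
plot_2 <- add_linearity_check(plot_2)
```

The helper returns a new ggplot object rather than modifying its argument, so it can be applied to plot and plot_2 alike.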
How closely does the relationship between impressions and conversion follow a linear trend? Set the variable linear_relationship_2 equal to "a", "b", "c", or "d", depending on the statement that best characterizes the relationship:
A. The relationship between impressions and total_conversion is perfectly linear.
B. There is a clear divergence from a linear relationship when impressions approaches zero and when impressions is around 500,000.
C. The relationship is less linear when impressions approaches very large values and when impressions is around 500,000.
D. There is no linear relationship between impressions and total_conversion.