
In addition to the quantitative measures that characterize model accuracy, it is best practice to produce visual summaries to assess our model. First, we should always visualize the model alongside our data. For simple linear regression this is straightforward: we can use geom_point() to plot the observed values and geom_smooth(method = "lm") to plot the fitted model. We can also add a second call to geom_smooth() with the arguments se = FALSE and color = "red"; because no method is given, ggplot2 falls back to its default smoother, which is LOESS for fewer than 1,000 observations. Together, these calls let us compare our linear model, shown below as the blue line with its 95% confidence interval as the shaded region, against a non-linear LOESS smoother shown in red.

library(ggplot2)
ggplot(train, aes(podcast, sales)) + geom_point() + geom_smooth(method = "lm") + geom_smooth(se = FALSE, color = "red")

A linear regression model with a LOESS smoother

LOESS (locally estimated scatterplot smoothing) draws a curve by fitting a small weighted regression around each x value, giving nearby data points more weight than distant ones. The result is similar to taking a moving average of the data as the x-axis variable increases. The smoother should not be used to predict new values, because it is tailored closely to the training data, but it is a helpful tool for spotting where our linear model diverges from that data.
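To make the "local weighted regression" idea concrete, here is a minimal sketch in base R. It assumes a simplified scheme: at each target point it fits a straight line by weighted least squares, using tricube weights over the nearest span fraction of the points. The helper names local_linear_fit() and loess_sketch() are hypothetical, and this is not a reimplementation of stats::loess() (which, among other differences, fits local quadratics by default).

```r
# Hypothetical sketch of the idea behind LOESS: at each target point x0,
# fit a straight line by weighted least squares, weighting nearby points
# with the tricube kernel. Simplified; NOT identical to stats::loess().
local_linear_fit <- function(x, y, x0, span = 0.75) {
  k <- max(3, floor(span * length(x)))   # size of the local neighbourhood
  d <- abs(x - x0)                       # distance of each point from x0
  h <- sort(d)[k]                        # bandwidth: distance to k-th nearest point
  u <- d / h
  w <- ifelse(u < 1, (1 - u^3)^3, 0)     # tricube weights; 0 outside the neighbourhood
  fit <- lm(y ~ x, weights = w)          # local weighted straight-line fit
  unname(predict(fit, newdata = data.frame(x = x0)))
}

# Evaluate the local fit at every grid point to trace out the smooth curve.
loess_sketch <- function(x, y, grid, span = 0.75) {
  vapply(grid, function(g) local_linear_fit(x, y, g, span), numeric(1))
}

# Usage on synthetic (made-up) data with a curved trend.
set.seed(1)
x <- seq(0, 10, length.out = 50)
y <- sqrt(x) + rnorm(50, sd = 0.1)
smooth <- loess_sketch(x, y, x)   # bends to follow the curve
straight <- fitted(lm(y ~ x))     # one global straight line, for comparison
```

Because every local fit only sees a window of the training points, the curve adapts to bends in the data that a single global line cannot, which is also why it is a poor tool for predicting outside the observed range.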

Because the LOESS smoother stays within the confidence interval of our model, we can reasonably conclude that the linear trend captures the essence of this relationship. However, note that as the podcast advertising budget approaches 0, sales drop off more sharply than the linear trend predicts; our model may therefore be less accurate when the podcast budget is very low.
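The visual check can also be done numerically. The sketch below is one possible approach, not the lesson's method: it fits lm() and stats::loess() (default span = 0.75, degree = 2, close to what geom_smooth() uses for small data) and reports, at each observed x value, whether the LOESS curve falls inside the linear model's 95% confidence band. The function loess_within_ci() and the demo data frame are hypothetical; with the lesson's data it could presumably be called as loess_within_ci(train, "podcast", "sales"), assuming train has the columns used in the ggplot() call.

```r
# Hypothetical helper: compare a LOESS smoother with the 95% confidence
# band of a simple linear model, point by point over the observed x values.
loess_within_ci <- function(data, xvar, yvar, level = 0.95) {
  f <- reformulate(xvar, response = yvar)   # e.g. sales ~ podcast
  lin <- lm(f, data = data)                 # the linear model
  lo <- loess(f, data = data)               # stats::loess defaults: span = 0.75, degree = 2
  grid <- data.frame(sort(unique(data[[xvar]])))
  names(grid) <- xvar
  band <- predict(lin, newdata = grid, interval = "confidence", level = level)
  smooth <- predict(lo, newdata = grid)
  data.frame(x = grid[[xvar]],
             loess = smooth,
             lower = band[, "lwr"],
             upper = band[, "upr"],
             inside = smooth >= band[, "lwr"] & smooth <= band[, "upr"])
}

# Usage on synthetic (made-up) data whose trend flattens out.
set.seed(42)
demo <- data.frame(x = runif(200, 0, 10))
demo$y <- 10 * sqrt(demo$x) + rnorm(200)
res <- loess_within_ci(demo, "x", "y")
mean(res$inside)             # share of x values where LOESS stays in the band
head(subset(res, !inside))   # where the smoother leaves the band
```

Rows where inside is FALSE point to the regions, such as very low x values, where the linear model is likely to be least accurate.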

Instructions

1.

We’ve plotted clicks against total conversions. Let’s add a LOESS smoother. Add two calls to geom_smooth() to plot: the first should use the argument method = "lm", and the second should use the arguments se = FALSE and color = "red".

2.

How closely does the relationship between clicks and conversion follow a linear trend? Set the variable linear_relationship equal to "a", "b", "c", or "d", depending on which statement best characterizes the relationship:

A. The relationship is less linear when clicks approaches large values.

B. There is a clear divergence from a linear relationship when clicks approaches zero or when clicks approaches infinity.

C. The relationship between clicks and total_conversion is perfectly linear.

D. There is no linear relationship between clicks and total_conversion.

3.

Let’s extend our linearity analysis to model2, which describes the relationship between impressions and total_conversion. Add the same two calls to geom_smooth() to plot_2 to compare the linear model with a LOESS smoother.

4.

How closely does the relationship between impressions and conversion follow a linear trend? Set the variable linear_relationship_2 equal to "a", "b", "c", or "d", depending on which statement best characterizes the relationship:

A. The relationship between impressions and total_conversion is perfectly linear.

B. There is a clear divergence from a linear relationship when impressions approaches zero and when impressions is around 500,000.

C. The relationship is less linear when impressions approaches very large values and when impressions is around 500,000.

D. There is no linear relationship between impressions and total_conversion.
