
Performing two-stage least squares (2SLS) regression in R is straightforward with the `ivreg()` function from the `AER` package.

The key syntactic difference between `ivreg()` and other regression functions is that its `formula` argument has a second part, separated by `|`, that lists the instrument. To run a 2SLS regression with `outcome` as the outcome, `treatment` as the treatment, and `instrument` as the instrument, the model formula is `outcome ~ treatment | instrument`.

To fit the 2SLS regression using the recycling data, we would use the following code:

``````r
# load the AER package, which provides ivreg()
library(AER)

# fit the 2SLS regression
iv_mod <- ivreg(
  # outcome ~ treatment | instrument
  formula = recycled ~ rebate | distance,
  data = recycle_df
)
``````
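Because `recycle_df` is not defined here, the following sketch fits the same kind of model on simulated data so it can run on its own. The data frame `sim_df`, the true effect of 30, and the unobserved confounder `motivation` are all hypothetical. The point is only that `ivreg()` recovers the true effect, while a plain `lm()` fit is pulled away from it by the confounder.

``````r
# Sketch on SIMULATED data shaped like the recycling example.
# sim_df, the true effect of 30, and all other numbers are hypothetical.
library(AER)

set.seed(42)
n <- 2000
near <- rbinom(n, 1, 0.5)   # instrument: 1 = lives near a recycling center
motivation <- rnorm(n)      # unobserved confounder
rebate <- as.integer(0.8 * near + 0.5 * motivation + rnorm(n) > 0.6)
recycled <- 130 + 30 * rebate + 10 * motivation + rnorm(n, sd = 5)
sim_df <- data.frame(recycled, rebate, distance = near)

# same formula shape as above: outcome ~ treatment | instrument
iv_sim <- ivreg(recycled ~ rebate | distance, data = sim_df)

# naive OLS for comparison, ignoring the instrument
ols_sim <- lm(recycled ~ rebate, data = sim_df)

coef(iv_sim)["rebate"]   # should be close to the true effect of 30
coef(ols_sim)["rebate"]  # biased upward by the unobserved confounder
``````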

To view the coefficients and standard errors, we can use `summary(iv_mod)$coefficients`, which gives the following output:

``````
             Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 129.36463  0.8683141 148.98368 0.000000e+00
rebate       31.25452  1.4629239  21.36442 5.118885e-68
``````

The 2SLS estimate of the effect of the rebate program is 31.25: participating in the rebate program increased recycling by an average of 31.25 kilograms per person. This is a local average treatment effect, so it applies only to compliers: individuals who participated in the rebate program because they lived within 5 miles of a recycling center but who would not have participated otherwise.
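To see where the complier interpretation comes from, here is a sketch, assuming a single binary instrument, of the Wald ratio: the difference in mean outcomes between the two instrument groups divided by the difference in treatment take-up. With one instrument, this ratio equals the 2SLS estimate, and under the usual monotonicity assumption the denominator is the share of compliers. The variables `z`, `d`, and `y` are simulated and hypothetical.

``````r
# Sketch: with one binary instrument, 2SLS equals the Wald ratio.
# z (instrument), d (treatment), y (outcome) are simulated and hypothetical.
set.seed(1)
n <- 5000
z <- rbinom(n, 1, 0.5)                               # binary instrument
u <- rnorm(n)                                        # unobserved confounder
d <- as.integer(0.8 * z + 0.5 * u + rnorm(n) > 0.6) # treatment take-up
y <- 100 + 25 * d + 8 * u + rnorm(n)                 # outcome, true effect 25

# Wald ratio: reduced-form difference divided by first-stage difference
reduced_form <- mean(y[z == 1]) - mean(y[z == 0])
first_stage  <- mean(d[z == 1]) - mean(d[z == 0])    # share of compliers
wald <- reduced_form / first_stage

# covariance form of the single-instrument 2SLS estimator
iv_ratio <- cov(z, y) / cov(z, d)

c(wald = wald, iv_ratio = iv_ratio)  # the two numbers agree
``````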

You may be wondering why we couldn’t just fit the two separate regression models described in the previous exercise using the `lm()` or `glm()` functions. The `ivreg()` function is preferred because it automatically corrects the standard errors to account for the fact that the second-stage model uses predicted values of the treatment: the correct residuals must be computed with the observed treatment, not the predicted one, and a hand-fit second stage gets this wrong.
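The standard-error problem can be checked on simulated data. This sketch, using hypothetical variables `z`, `d`, and `y`, runs both stages by hand with `lm()`. The hand-fit coefficient matches the 2SLS estimate, but the second stage's standard error is based on residuals computed from `d_hat`. Recomputing the residuals with the observed treatment `d` gives the homoskedastic 2SLS standard error, which should match what `ivreg()` reports by default.

``````r
# Sketch: two stages by hand with lm() give the right coefficient
# but the wrong standard error. Simulated, hypothetical data.
set.seed(7)
n <- 3000
z <- rbinom(n, 1, 0.5)
u <- rnorm(n)
d <- as.integer(0.8 * z + 0.5 * u + rnorm(n) > 0.6)
y <- 100 + 25 * d + 8 * u + rnorm(n)

# Stage 1: regress treatment on instrument, keep fitted values
d_hat <- fitted(lm(d ~ z))

# Stage 2: regress outcome on the PREDICTED treatment
stage2 <- lm(y ~ d_hat)
b <- coef(stage2)
naive_se <- summary(stage2)$coefficients["d_hat", "Std. Error"]

# Correct 2SLS residuals use the OBSERVED treatment d, not d_hat
resid_iv <- y - (b[1] + b[2] * d)
sigma_iv <- sqrt(sum(resid_iv^2) / (n - 2))

# Same (X_hat' X_hat)^-1 term, different residual standard deviation
correct_se <- naive_se * sigma_iv / summary(stage2)$sigma

c(estimate = unname(b[2]), naive_se = naive_se, correct_se = correct_se)
``````

In this simulation the hand-computed standard error comes out too large, but in other data the naive error can be off in either direction.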

Standard errors measure how precise a treatment effect estimate is: smaller standard errors mean a more precise estimate and a greater likelihood that the treatment coefficient will be found significantly different from zero. If we use incorrect standard errors, we could draw incorrect conclusions about the treatment effect.
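As a quick illustration of how the standard error drives precision, this sketch takes the `rebate` estimate and standard error from the `ivreg()` output and computes the t value and an approximate 95% confidence interval. It uses a normal approximation (`qnorm(0.975)`, about 1.96) rather than the exact t quantile, and the doubled standard error is purely hypothetical.

``````r
# Values copied from the rebate row of the ivreg() output
estimate  <- 31.25452
std_error <- 1.4629239

# t value = estimate / standard error
t_value <- estimate / std_error

# approximate 95% confidence interval (normal approximation)
ci_95 <- estimate + c(-1, 1) * qnorm(0.975) * std_error

# a hypothetical standard error twice as large doubles the interval width
ci_95_wider <- estimate + c(-1, 1) * qnorm(0.975) * (2 * std_error)

round(t_value, 2)      # 21.36, matching the t value column
round(ci_95, 2)        # 28.39 34.12
round(ci_95_wider, 2)  # 25.52 36.99
``````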

### Instructions

1.

Fit a linear ordinary least squares (OLS) regression model to estimate the effect of using video streaming services on the amount users spend at the online retailer. Save this regression model as `ols_model`.

2.

Uncomment the code to print a summary of the coefficients from the ordinary least squares (OLS) model.

3.

Use the `ivreg()` function from the `AER` package to fit the 2SLS regression model in one step. Make sure to modify the model formula to include the instrument. Save this model as `iv_mod`.

4.

Uncomment the `summary()` call to print the resulting coefficients. How does the estimate differ from the OLS estimate in the previous checkpoint?