Performing 2SLS in R is easy if we use the ivreg() function from the AER package.
The key difference in syntax between ivreg() and other regression functions is that the formula argument of ivreg() must also include the instrument, separated from the regressors by a vertical bar (|). For example, to perform 2SLS regression with outcome as the outcome, treatment as the treatment, and instrument as the instrument, the model formula would be:

outcome ~ treatment | instrument
To fit the 2SLS regression using the recycling data, we would use the following code:
```r
# import library
library(AER)

# run 2SLS regression
iv_mod <- ivreg(
  # outcome ~ treatment | instrument
  formula = recycled ~ rebate | distance,
  data = recycle_df
)
```
To view the coefficients and standard errors, we can use summary(iv_mod)$coefficients, which gives the following output:
```
             Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 129.36463  0.8683141 148.98368 0.000000e+00
rebate       31.25452  1.4629239  21.36442 5.118885e-68
```
The 2SLS results estimate the effect of the rebate program at 31.25: participating in the rebate program increased recycling by an average of 31.25 kilograms per person. This estimate applies only to compliers: individuals who participated in the rebate program because they lived within 5 miles of a recycling center, but who would not have participated otherwise.
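Why only compliers? When the instrument is binary (assumed here: 1 = lives near a recycling center), the 2SLS slope is algebraically identical to the Wald estimator: the difference in mean outcomes between the two instrument groups divided by the difference in treatment rates. Always-takers and never-takers have the same treatment status in both groups, so they drop out of that ratio. The sketch below uses simulated data; the variable names mimic the lesson's, but the complier shares and effect sizes are hypothetical.

```r
# Hypothetical simulated data; not the lesson's recycle_df
set.seed(1)
n <- 5000
distance <- rbinom(n, 1, 0.5)  # binary instrument (assumption)
type <- sample(c("complier", "always", "never"), n,
               replace = TRUE, prob = c(0.5, 0.2, 0.3))
# Compliers take the rebate only when near a center; always-takers always do
rebate <- as.numeric(type == "always" | (type == "complier" & distance == 1))
# Effects differ by type: 30 for compliers, 10 for always-takers
effect <- ifelse(type == "complier", 30, ifelse(type == "always", 10, 0))
recycled <- 130 + effect * rebate + rnorm(n, sd = 5)

# Wald estimator: reduced form divided by first stage
reduced_form <- mean(recycled[distance == 1]) - mean(recycled[distance == 0])
first_stage  <- mean(rebate[distance == 1])   - mean(rebate[distance == 0])
wald <- reduced_form / first_stage

# 2SLS slope computed from the two stages
rebate_hat <- fitted(lm(rebate ~ distance))
tsls <- unname(coef(lm(recycled ~ rebate_hat))[2])

c(wald = wald, tsls = tsls)
```

Both numbers agree and land near the compliers' effect of 30, not the always-takers' effect of 10.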
You may be wondering why we couldn’t just fit the two separate regression models described in the previous exercise using glm(). The ivreg() function is preferred because it automatically corrects the standard errors to account for the fact that the second-stage regression model uses predicted values of the treatment rather than the observed values.
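The sketch below shows the problem on hypothetical simulated data with an unobserved confounder u. It fits the two stages by hand with lm() (for a linear model this gives the same fit as glm() with its default gaussian family). The coefficient is the correct 2SLS estimate, but lm() computes residuals from the predicted treatment. The corrected version recomputes the residuals with the observed treatment. This should reproduce the conventional homoskedastic 2SLS standard errors, which we understand to be what summary() reports for an ivreg() model by default.

```r
# Hypothetical simulated data; u is an unobserved confounder
set.seed(42)
n <- 2000
u <- rnorm(n)
distance <- rbinom(n, 1, 0.5)  # instrument
rebate <- as.numeric(0.4 * distance + 0.3 * u + runif(n) > 0.6)  # endogenous treatment
recycled <- 130 + 30 * rebate + 10 * u + rnorm(n, sd = 5)

# Stage 1: regress the treatment on the instrument
rebate_hat <- fitted(lm(rebate ~ distance))

# Stage 2: regress the outcome on the predicted treatment
stage2 <- lm(recycled ~ rebate_hat)
beta <- coef(stage2)

# Naive SEs: lm() builds its residuals from rebate_hat, which is wrong for 2SLS
naive_se <- coef(summary(stage2))[, "Std. Error"]

# Corrected SEs: residuals use the observed treatment,
# while the (X_hat' X_hat)^-1 term still uses the predicted treatment
X_hat <- model.matrix(stage2)
X_obs <- cbind(1, rebate)
resid_iv <- recycled - drop(X_obs %*% beta)
sigma2 <- sum(resid_iv^2) / (n - ncol(X_obs))
corrected_se <- sqrt(diag(sigma2 * solve(crossprod(X_hat))))

cbind(estimate = beta, naive_se = naive_se, corrected_se = corrected_se)
```

With AER loaded, you can compare corrected_se against the standard errors from summary(ivreg(recycled ~ rebate | distance)).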
If we use incorrect standard errors, we could draw incorrect conclusions about the treatment effect:
- Lower standard errors correspond to more precise treatment effect estimates and a greater likelihood that the treatment coefficient will be found to be significantly different from zero.
- Higher standard errors correspond to less precise treatment effect estimates and a lower likelihood that the treatment coefficient will be found to be significantly different from zero.
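To see how much the standard error matters, the sketch below uses the rebate estimate and standard error from the output table. The t statistic is the estimate divided by its standard error. The second standard error is a made-up value chosen only for contrast, and the p-values use a normal approximation rather than the exact t distribution.

```r
estimate <- 31.25452      # 2SLS estimate of the rebate effect (from the table)
se_reported <- 1.4629239  # standard error reported for the ivreg() model
se_hypothetical <- 20     # hypothetical, much larger standard error

# Two-sided p-value using a normal approximation
p_value <- function(est, se) 2 * pnorm(-abs(est / se))

t_reported <- estimate / se_reported                  # about 21.36, as in the table
p_reported <- p_value(estimate, se_reported)          # effectively zero
p_hypothetical <- p_value(estimate, se_hypothetical)  # about 0.12

c(p_reported = p_reported, p_hypothetical = p_hypothetical)
```

The same point estimate is highly significant with the reported standard error but not significant at the 0.05 level with the inflated one.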