Performing 2SLS in R is easy if we use the ivreg() function from the AER package.

The key difference in syntax between ivreg() and other regression functions is that the formula argument of ivreg() must also name the instrument, which goes after a vertical bar (|) on the right-hand side. For example, with an outcome variable named outcome, a treatment variable named treatment, and an instrument named instrument, the model formula is outcome ~ treatment | instrument.

To fit the 2SLS regression using the recycling data, we would use the following code:

    # import library
    library(AER)

    # run 2SLS regression
    iv_mod <- ivreg(
      # outcome ~ treatment | instrument
      formula = recycled ~ rebate | distance,
      data = recycle_df
    )

To view the coefficients and standard errors, we can use summary(iv_mod)$coefficients, which gives the following output:

                 Estimate Std. Error   t value     Pr(>|t|)
    (Intercept) 129.36463  0.8683141 148.98368 0.000000e+00
    rebate       31.25452  1.4629239  21.36442 5.118885e-68

The 2SLS estimate of the effect of the rebate program is 31.25, meaning participation in the rebate program led to an average increase in recycling of 31.25 kilograms per person. This estimate only applies to compliers: those individuals who participated in the rebate program because they lived within 5 miles of a recycling center, but who would not have participated otherwise.
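One way to see why the estimate is tied to compliers: with a single binary instrument and no other covariates, the 2SLS coefficient equals the instrument's effect on the outcome divided by its effect on treatment uptake (the Wald ratio). The sketch below checks this on simulated data. The data and variable names (near_center, motivation, and so on) are made up for illustration and are not the lesson's recycle_df; only base R is used.

```r
# Simulated data: hypothetical, NOT the lesson's recycle_df
set.seed(42)
n <- 5000
near_center <- rbinom(n, 1, 0.5)   # binary instrument
motivation  <- rnorm(n)            # unobserved confounder

# Treatment uptake depends on both the instrument and the confounder
rebate <- as.integer(0.8 * near_center + 0.5 * motivation + rnorm(n) > 0.5)

# Outcome: the rebate effect is set to 30 by construction
recycled <- 130 + 30 * rebate + 10 * motivation + rnorm(n, sd = 5)

# Wald ratio: (instrument's effect on outcome) / (instrument's effect on uptake)
reduced_form  <- mean(recycled[near_center == 1]) - mean(recycled[near_center == 0])
first_stage   <- mean(rebate[near_center == 1]) - mean(rebate[near_center == 0])
wald_estimate <- reduced_form / first_stage

# The same point estimate from two explicit lm() stages
stage1     <- lm(rebate ~ near_center)
rebate_hat <- fitted(stage1)
stage2     <- lm(recycled ~ rebate_hat)
two_stage_estimate <- unname(coef(stage2)["rebate_hat"])

print(c(wald = wald_estimate, two_stage = two_stage_estimate))
```

The denominator is the share of people whose participation changes with the instrument, so the ratio rescales the outcome difference to that group only.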

You may be wondering why we couldn’t just fit the two separate regression models described in the previous exercise using the lm() or glm() functions. Fitting the two stages by hand with lm() gives the same coefficient estimate, but the standard errors reported for the second stage are wrong, because that model treats the predicted treatment values as if they were observed data. The ivreg() function is preferred because it automatically corrects the standard errors to account for this.

If we use incorrect standard errors, we could make incorrect conclusions about the treatment effect:

  • Standard errors that are too low make the treatment effect estimate look more precise than it is, so the treatment coefficient is more likely to be found significantly different from zero when it should not be.
  • Standard errors that are too high make the treatment effect estimate look less precise than it is, so the treatment coefficient is less likely to be found significantly different from zero when it should be.
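To make the standard error issue concrete, here is a minimal sketch in base R, again on simulated data with hypothetical variable names. It fits the two stages by hand, reads off the naive second-stage standard error, and then applies the standard textbook correction: recompute the residual variance using the actual treatment values instead of the predicted ones. This is the kind of correction the text says ivreg() applies automatically; the sketch does not call ivreg(), so exact agreement with its output is an assumption rather than something shown here.

```r
# Simulated data: hypothetical, for illustration only
set.seed(1)
n <- 2000
instrument <- rbinom(n, 1, 0.5)
confounder <- rnorm(n)                       # unobserved in practice
treatment  <- as.integer(instrument + confounder + rnorm(n) > 0.5)
outcome    <- 100 + 30 * treatment + 10 * confounder + rnorm(n, sd = 5)

# Stage 1: predict the treatment from the instrument
stage1        <- lm(treatment ~ instrument)
treatment_hat <- fitted(stage1)

# Stage 2: regress the outcome on the predicted treatment
stage2 <- lm(outcome ~ treatment_hat)
beta   <- unname(coef(stage2))

# Naive standard error: uses residuals based on the PREDICTED treatment
naive_se     <- summary(stage2)$coefficients["treatment_hat", "Std. Error"]
sigma2_naive <- summary(stage2)$sigma^2

# Corrected standard error: residuals based on the ACTUAL treatment,
# with the same degrees of freedom (n minus 2 coefficients)
resid_2sls   <- outcome - (beta[1] + beta[2] * treatment)
sigma2_2sls  <- sum(resid_2sls^2) / (n - 2)
corrected_se <- naive_se * sqrt(sigma2_2sls / sigma2_naive)

print(c(estimate = beta[2], naive_se = naive_se, corrected_se = corrected_se))
```

The coefficient is identical in both cases; only the reported uncertainty changes, and the naive value can err in either direction depending on the data.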



Fit a linear ordinary least squares (OLS) regression model to estimate the effect of using video streaming services on the amount spent by users of the online retailer. Save this regression model as ols_model.


Uncomment the code to print a summary of the coefficients from the ordinary least squares (OLS) model.


Use the ivreg() function from the AER package to fit the 2SLS regression model in one step. Make sure to modify the model formula to account for the instrument. Save this model as iv_mod.


Uncomment the summary() function to print the resulting coefficients. How does the estimate differ from the OLS estimate in the last checkpoint?
