Performing 2SLS in R is easy if we use the `ivreg()` function from the `AER` package.

The key difference in syntax between `ivreg()` and other regression functions is that the `formula` argument of the `ivreg()` function must include the instrument. If we wanted to perform 2SLS regression with variables `outcome` as the outcome, `treatment` as the treatment, and `instrument` as the instrument, the model formula would be `outcome ~ treatment | instrument`.

To fit the 2SLS regression using the recycling data, we would use the following code:

    # import library
    library(AER)
    # run 2SLS regression
    iv_mod <- ivreg(
      # outcome ~ treatment | instrument
      formula = recycled ~ rebate | distance,
      data = recycle_df
    )

To view the coefficients and standard errors, we can use `summary(iv_mod)$coefficients`, which gives the following output (you may need to make this section of the screen wider to view the full table):

                 Estimate Std. Error   t value     Pr(>|t|)
    (Intercept) 129.36463  0.8683141 148.98368 0.000000e+00
    rebate       31.25452  1.4629239  21.36442 5.118885e-68

The 2SLS results show that the estimated effect of the rebate program is 31.25, meaning that participation in the rebate program led to an average increase in recycling of 31.25 kilograms per person. This estimate applies only to compliers: those individuals who participated in the rebate program because they lived within 5 miles of a recycling center, but who would not have participated otherwise.

You may be wondering why we couldn’t just fit the two separate regression models described in the previous exercise using the `lm()` or `glm()` functions. The `ivreg()` function is preferred because it automatically corrects the standard errors to account for the fact that the second-stage regression model uses predicted values of the treatment rather than the observed values.

If we use incorrect standard errors, we could draw incorrect conclusions about the treatment effect:

- Lower standard errors correspond to more precise treatment effect estimates and a greater likelihood that the treatment coefficient will be found to be significantly different from zero.
- Higher standard errors correspond to less precise treatment effect estimates and a lower likelihood that the treatment coefficient will be found to be significantly different from zero.
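To make the standard-error issue concrete, here is a minimal sketch on simulated data. It is not the recycling dataset: the data frame `sim_df`, its columns, and every parameter value below are hypothetical and chosen only for illustration. The sketch fits the two stages by hand with `lm()`, then rescales the naive second-stage standard error using residuals computed from the observed treatment, which is the usual form of the 2SLS correction.

```r
# Simulated data: all names and parameter values are hypothetical.
set.seed(42)
n <- 1000
instrument <- rbinom(n, 1, 0.5)   # binary instrument
confounder <- rnorm(n)            # unobserved confounder
treatment  <- as.numeric(0.8 * instrument + 0.5 * confounder + rnorm(n) > 0.5)
outcome    <- 100 + 30 * treatment + 10 * confounder + rnorm(n, sd = 5)
sim_df <- data.frame(outcome, treatment, instrument)

# Stage 1: predict the treatment from the instrument
stage1 <- lm(treatment ~ instrument, data = sim_df)
sim_df$treatment_hat <- fitted(stage1)

# Stage 2: regress the outcome on the predicted treatment
stage2 <- lm(outcome ~ treatment_hat, data = sim_df)
b <- coef(stage2)
naive_se <- summary(stage2)$coefficients["treatment_hat", "Std. Error"]

# The naive residual variance uses the predicted treatment; the
# corrected one uses the observed treatment with the same coefficients.
sigma2_naive   <- sum(resid(stage2)^2) / (n - 2)
resid_correct  <- sim_df$outcome - b[1] - b[2] * sim_df$treatment
sigma2_correct <- sum(resid_correct^2) / (n - 2)
corrected_se   <- naive_se * sqrt(sigma2_correct / sigma2_naive)

print(c(estimate = unname(b[2]), naive_se = naive_se, corrected_se = corrected_se))

# Optional comparison, run only if the AER package is installed
if (requireNamespace("AER", quietly = TRUE)) {
  iv_fit <- AER::ivreg(outcome ~ treatment | instrument, data = sim_df)
  print(summary(iv_fit)$coefficients)
}
```

The coefficient from the manual second stage is the 2SLS point estimate, so only the standard error changes. When `AER` is available, the `ivreg()` output should show the same treatment coefficient, with a standard error in line with `corrected_se` rather than `naive_se`.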

### Instructions

**1.**

Fit a linear ordinary least squares (OLS) regression model to estimate the effect of use of video streaming services on the amount spent by users of the online retailer. Save this regression model as `ols_model`.

**2.**

Uncomment the code to print a summary of the coefficients from the ordinary least squares (OLS) model.

**3.**

Use the `ivreg()` function from the `AER` package to fit the 2SLS regression model in one step. Make sure to modify the model formula to account for the instrument. Save this model as `iv_mod`.

**4.**

Uncomment the `summary()` function to print the resulting coefficients. How does the estimate differ from the OLS estimate in the last checkpoint?