While IV estimation can be effective in certain circumstances, it also has limitations. One limitation is that the causal estimand (CACE) is not generalizable. The CACE only describes the effect of those who comply with treatment.

Another limitation is that it is difficult to find a suitable instrument that has a strong relationship with the treatment. If an instrument is only weakly related to the treatment, 2SLS regression will produce inaccurate estimates of the CACE.

To illustrate this, suppose that instead of using distance as an instrument for participation in the rebate program, we used another variable, children. The variable indicates whether or not an individual has children. Having children wouldn’t directly cause a change in recycling, but individuals with children might be less likely to participate in the rebate program. The rebate program requires individuals to take the time to drop off recycling — time that people with children might not have.

Performing the 2SLS regression again with the ivreg() function and the children variable as an instrument highlights the effect of using a weak instrument:

iv_mod_weak <- ivreg( formula = recycled ~ rebate | children, #new weak instrument data = recycle_df )

Using summary(iv_mod_weak)$coefficients we can view just the coefficient table from the results summary.

Estimate Std. Error t value Pr(>|t|) (Intercept) 126.10140 8.930242 14.120715 5.477652e-37 rebate 37.84692 18.018181 2.100485 3.631487e-02

The estimate of 37.85 is neither accurate nor precise, as highlighted by the large standard error. This estimate is similar to the estimate from OLS regression, demonstrating that this is a weak instrument.



Instead of using the email campaign as an instrument, let’s see what happens when we choose a weaker instrument.

Imagine that the online retailer decides to pay a search engine company to display an ad for the retailer’s video streaming services above the results when anyone uses search terms like “video streaming” or “streaming”. Seeing such an ad wouldn’t directly increase the amount of money a user spends on the online retailer’s website. However, the ad could make a user more likely to use the retailer’s video streaming services. Thus, the targeted ads could act as an instrumental variable.

This new variable, ads, can be found in a second dataframe called video_df2 that has been loaded for you in notebook.Rmd.

Fit a second 2SLS regression model that uses the ads variable as an instrument and save the results to iv_mod2.


The original 2SLS model with email as an instrument has been saved for you as iv_mod.

Print a summary of the results of both 2SLS regression models saved in iv_mod and iv_mod2. How do the estimates of the treatment effect differ depending on what instrument is used? What about the standard errors?

Take this course for free

Mini Info Outline Icon
By signing up for Codecademy, you agree to Codecademy's Terms of Service & Privacy Policy.

Or sign up using:

Already have an account?