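While IV estimation can be effective in certain circumstances, it also has limitations. One limitation is that its causal estimand, the complier average causal effect (CACE), is not generalizable. The CACE only describes the treatment effect for compliers, the individuals whose treatment status is changed by the instrument, and not for the population as a whole.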
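To make this concrete, here is a small simulation sketch in base R. With a binary instrument, the CACE can be estimated with the Wald estimator: the instrument's effect on the outcome divided by its effect on treatment uptake. Everything in the data is hypothetical: the shares of compliers, always-takers, and never-takers, and the effect sizes (10 for compliers, 30 for always-takers) are assumptions chosen so that the groups differ.

```r
# Hypothetical simulation: strata shares and effect sizes are assumptions
set.seed(42)
n <- 10000

# Assign each unit to a principal stratum (no defiers, by assumption)
stratum <- sample(c("complier", "always", "never"), n, replace = TRUE,
                  prob = c(0.4, 0.3, 0.3))
z <- rbinom(n, 1, 0.5)                        # randomized binary instrument
d <- ifelse(stratum == "always", 1,
            ifelse(stratum == "never", 0, z)) # compliers follow z

# Treatment effect differs by stratum: 10 for compliers, 30 otherwise
effect <- ifelse(stratum == "complier", 10, 30)
y <- 50 + effect * d + rnorm(n, sd = 5)

# Wald estimator: ITT effect on y divided by ITT effect on treatment uptake
wald_cace <- function(y, d, z) {
  itt_y <- mean(y[z == 1]) - mean(y[z == 0])
  itt_d <- mean(d[z == 1]) - mean(d[z == 0])
  itt_y / itt_d
}

cace_hat <- wald_cace(y, d, z)
att <- mean(effect[d == 1])  # true average effect among all treated units

round(c(cace_hat = cace_hat, att = att), 2)
```

Under these assumptions the Wald estimate should land near 10, while the average effect among everyone who was actually treated is near 22, because that group also contains always-takers. The IV estimate tells us nothing about them.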
Another limitation is that it can be difficult to find a suitable instrument that has a strong relationship with the treatment. If an instrument is only weakly related to the treatment, 2SLS regression will produce unreliable estimates of the CACE: they are imprecise, with large standard errors, and biased toward the OLS estimate.
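A common diagnostic for instrument strength is the first-stage F-statistic from regressing the treatment on the instrument; a widely cited rule of thumb treats values below about 10 as a sign of a weak instrument. The sketch below uses simulated data with two hypothetical instruments, `z_strong` and `z_weak`, and only base R's `lm()`.

```r
set.seed(1)
n <- 1000

u        <- rnorm(n)            # unobserved confounder
z_strong <- rbinom(n, 1, 0.5)   # hypothetical strong instrument
z_weak   <- rbinom(n, 1, 0.5)   # hypothetical weak instrument

# Treatment responds strongly to z_strong and barely to z_weak
d <- 1.0 * z_strong + 0.05 * z_weak + u + rnorm(n)

# First-stage F-statistic for a single instrument
first_stage_f <- function(d, z) {
  fit <- lm(d ~ z)
  unname(summary(fit)$fstatistic["value"])
}

f_strong <- first_stage_f(d, z_strong)
f_weak   <- first_stage_f(d, z_weak)
round(c(strong = f_strong, weak = f_weak), 1)
```

On real data the same check would be a first-stage regression such as `lm(rebate ~ children, data = recycle_df)`. If `ivreg()` comes from the AER package, its `summary()` method also accepts `diagnostics = TRUE`, which reports a weak-instruments test.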
To illustrate this, suppose that instead of using distance as an instrument for participation in the rebate program, we used another variable, `children`, which indicates whether or not an individual has children. Having children wouldn't directly cause a change in recycling, but individuals with children might be less likely to participate in the rebate program, because the program requires individuals to take the time to drop off recycling, time that people with children might not have.
Performing the 2SLS regression again with the `ivreg()` function and the `children` variable as an instrument highlights the effect of using a weak instrument:
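```r
library(AER)  # assumed source of ivreg(); the ivreg package offers the same formula interface

iv_mod_weak <- ivreg(
  formula = recycled ~ rebate | children,  # children is the new, weak instrument
  data = recycle_df
)
```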
Calling `summary(iv_mod_weak)$coefficients` returns just the coefficient table from the results summary:
```
             Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 126.10140   8.930242 14.120715 5.477652e-37
rebate       37.84692  18.018181  2.100485 3.631487e-02
```
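The estimate of 37.85 is neither accurate nor precise. Its standard error of 18.02 is almost half the size of the estimate itself, so an approximate 95% confidence interval stretches from about 2.5 to 73.2. The estimate is also similar to the estimate from OLS regression, which is what we would expect from a weak instrument: when the instrument barely moves the treatment, 2SLS estimates are pulled toward the OLS estimate.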
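The pull toward OLS can be seen in a small Monte Carlo sketch. In this hypothetical setup an unobserved confounder `u` biases OLS upward, the true effect is assumed to be 2, and the only difference between the two scenarios is how strongly the instrument `z` moves the treatment (the made-up parameter `pi_z`). With a single instrument, the 2SLS point estimate equals the ratio `cov(z, y) / cov(z, d)`, so no IV package is needed. Medians and interquartile ranges are reported because weak-instrument estimates have very heavy tails.

```r
set.seed(2024)

# One simulated dataset; the true effect of d on y is 2 (assumed)
simulate_once <- function(n, pi_z) {
  z <- rnorm(n)                    # instrument
  u <- rnorm(n)                    # unobserved confounder
  d <- pi_z * z + u + rnorm(n)     # treatment; pi_z sets instrument strength
  y <- 2 * d + 3 * u + rnorm(n)    # outcome, confounded by u
  c(ols = unname(coef(lm(y ~ d))["d"]),
    iv  = cov(z, y) / cov(z, d))   # just-identified 2SLS estimate
}

reps   <- 2000
strong <- replicate(reps, simulate_once(n = 500, pi_z = 1))
weak   <- replicate(reps, simulate_once(n = 500, pi_z = 0.02))

# Robust summaries: weak-instrument IV estimates have very heavy tails
summarise_draws <- function(est) {
  c(median_ols = median(est["ols", ]),
    median_iv  = median(est["iv", ]),
    iqr_iv     = IQR(est["iv", ]))
}

results <- rbind(strong = summarise_draws(strong),
                 weak   = summarise_draws(weak))
round(results, 2)
```

Under these assumptions the strong-instrument estimates should cluster tightly around 2, while the weak-instrument estimates should be far more spread out, with a median much closer to the OLS estimate (about 3.5 in this setup) than to the true effect.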
Instead of using the email campaign as an instrument, let’s see what happens when we choose a weaker instrument.
Imagine that the online retailer decides to pay a search engine company to display an ad for the retailer’s video streaming services above the results when anyone uses search terms like “video streaming” or “streaming”. Seeing such an ad wouldn’t directly increase the amount of money a user spends on the online retailer’s website. However, the ad could make a user more likely to use the retailer’s video streaming services. Thus, the targeted ads could act as an instrumental variable.
This new variable, `ads`, can be found in a second dataframe called `video_df2` that has been loaded for you in `notebook.Rmd`.
Fit a second 2SLS regression model that uses the `ads` variable as an instrument and save the results to
The original 2SLS model with
Print a summary of the results of both 2SLS regression models saved in
iv_mod2. How do the estimates of the treatment effect differ depending on what instrument is used? What about the standard errors?