As we’ve seen, the advantages of regression discontinuity design are that RDD:

- Is a simple method to understand and implement.
- Avoids using a complicated regression model for the entire dataset — the local regression model is simple.
- Is useful in cases where there is no overlap on a confounding variable, which may prevent us from using stratification or propensity score analysis.

However, there are several drawbacks to RDD inherent to the method:

- Smaller bandwidths make the RDD assumptions more plausible but also reduce the sample size.
- The local average treatment effect (LATE) is not an easily interpretable estimand. We calculated the effect of the contribution matching program only among the companies with close to 300 employees. How confident can we be that this effect would be the same in much smaller or larger companies?
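The bandwidth tradeoff can be sketched on simulated data. The block below is only an illustration under stated assumptions: `sim_data`, its columns, and the helper `local_fit()` are hypothetical, the data are randomly generated rather than taken from the contribution dataset, and the fit uses a plain `lm()` with equal weights inside the window, whereas `RDestimate()` uses a triangular kernel by default. The numbers will therefore not match anything reported in this lesson.

```r
# Simulated data only: these are NOT the lesson's contribution data.
set.seed(42)
n <- 1000
size <- round(runif(n, min = 50, max = 550))   # hypothetical running variable
treated <- as.integer(size >= 300)             # sharp cutoff at 300
contribution <- 100 + 0.2 * size + 50 * treated + rnorm(n, sd = 30)
sim_data <- data.frame(size, treated, contribution)

# Fit a local linear model using only observations within `bw` of the cutpoint.
# Assumption: equal weights inside the window (a rectangular kernel).
local_fit <- function(data, cutpoint, bw) {
  window <- data[abs(data$size - cutpoint) <= bw, ]
  window$centered <- window$size - cutpoint
  fit <- lm(contribution ~ treated * centered, data = window)
  coefs <- summary(fit)$coefficients
  data.frame(
    bw   = bw,
    n    = nrow(window),
    late = coefs["treated", "Estimate"],
    se   = coefs["treated", "Std. Error"]
  )
}

# Compare several bandwidths side by side.
results <- do.call(
  rbind,
  lapply(c(25, 50, 100, 200), function(bw) local_fit(sim_data, 300, bw))
)
print(results)
```

Each row reports how many observations fall inside the window and the standard error of the estimated jump at the cutoff, so the effect of widening the bandwidth on both quantities can be read directly from the table.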

Let’s explore the tradeoffs with an example. Say we run `RDestimate()` on the contribution data again, but set the bandwidth to 100 instead of the IK optimal bandwidth of 13. We save the results to `rdd_100` and print the output:

    Call:
    RDestimate(formula = contribution ~ size, data = cont_data, cutpoint = 300, bw = 100)

    Coefficients:
         LATE    Half-BW  Double-BW
        56.71      59.90      53.54

We can also get information on the number of observations included and the standard error of the LATE for each bandwidth with the code that follows.

    rdd_100$obs  # number of observations
    # Output
    # [1] 113  68 178

    rdd_100$se   # standard errors
    # Output
    # [1] 6.647 9.077 5.269

Let’s consider just the half-bandwidth (50) and the double-bandwidth (200). Using the half-bandwidth, we analyze only companies with 250-350 employees, so we may believe these companies are very similar to one another. But this leaves only 68 companies in our sample and a standard error of 9.077 for the LATE. We may ask:

- Can we trust a LATE with a higher standard error?
- Do these findings apply to companies outside the range of 250-350 employees?
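One rough way to approach the first question is to turn each standard error into an approximate 95% confidence interval. The sketch below re-enters the estimates, sample sizes, and standard errors printed for `rdd_100` by hand. The data frame `rdd_summary` is a hypothetical name, and the intervals assume a normal approximation (estimate plus or minus about 1.96 standard errors), which is an assumption made here rather than something the lesson states.

```r
# Values copied from the printed output of rdd_100 in this lesson.
rdd_summary <- data.frame(
  bandwidth = c(100, 50, 200),
  obs       = c(113, 68, 178),
  late      = c(56.71, 59.90, 53.54),
  se        = c(6.647, 9.077, 5.269),
  row.names = c("LATE", "Half-BW", "Double-BW")
)

# Assumption: normal approximation, so a 95% interval is estimate +/- z * SE.
z <- qnorm(0.975)
rdd_summary$ci_lower <- rdd_summary$late - z * rdd_summary$se
rdd_summary$ci_upper <- rdd_summary$late + z * rdd_summary$se
rdd_summary$ci_width <- rdd_summary$ci_upper - rdd_summary$ci_lower

print(round(rdd_summary, 2))
```

Under this approximation, the half-bandwidth interval is the widest of the three, but all three intervals overlap and none includes zero.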

At the double-bandwidth, we analyze companies with 100-500 employees, so our sample size is much larger at 178 companies and our standard error is reduced to 5.269. But how confident are we that companies with 100 employees are similar enough to compare to companies with 500 employees?
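The employee ranges quoted for each bandwidth come from simple arithmetic: the cutpoint minus and plus the bandwidth. A minimal sketch, assuming a symmetric window around the cutpoint of 300 (the names `bandwidths` and `windows` are illustrative):

```r
cutpoint <- 300
bw <- 100

# RDestimate() reports the chosen bandwidth plus half and double that value.
bandwidths <- c("LATE" = bw, "Half-BW" = bw / 2, "Double-BW" = 2 * bw)

# Lower and upper edges of the analysis window for each bandwidth.
windows <- data.frame(
  bandwidth = unname(bandwidths),
  lower     = unname(cutpoint - bandwidths),
  upper     = unname(cutpoint + bandwidths),
  row.names = names(bandwidths)
)
print(windows)
```

The same calculation applies to any other cutpoint and bandwidth, which makes it a quick check on which units actually enter each estimate.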

Regression discontinuity design is a useful method to keep in our causal inference toolbox, but we must be aware of the tradeoffs throughout the process.

### Instructions

**1.**

A new RDD model with a bandwidth of 50 called `air_rdd_50` has been created for you in **notebook.Rmd**. Print `air_rdd_50` and inspect the results. Are there big differences in the estimated LATE across bandwidths of 50, 25, and 100?

**2.**

Check the sample size under each reported bandwidth of `air_rdd_50`. Is there a big difference in sample size when you look at power plants with outputs of 575-625 megawatts (`Half-BW`) and those with outputs of 500-700 megawatts (`Double-BW`)?

**3.**

Check the standard errors of the LATE under each reported bandwidth of `air_rdd_50`. Is there a big difference in standard errors when you look at power plants with outputs of 575-625 megawatts (`Half-BW`) and those with outputs of 500-700 megawatts (`Double-BW`)? Does the standard error get larger or smaller with bigger samples?