In RDD, we know we need to look at points near the cutoff to find treatment and control groups that are similar. But how do we know how close to look?
The bandwidth describes the distance on either side of the cutoff we should use to reduce our dataset. Any points that are more than one bandwidth above or below the cutoff are discarded. Choosing the bandwidth can have a serious impact on the results of an RDD analysis:
- A wider bandwidth keeps more of the original dataset, so we have more information to estimate the treatment effect with. However, the treatment and control groups may differ more on confounding variables, which could bias the estimate.
- A narrower bandwidth retains less of the original dataset, so the treatment and control groups will be more alike. However, the smaller sample size means less information, so the estimate of the treatment effect will be less precise.
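The tradeoff can be seen in a small simulation. This is only a sketch on made-up data: `sim_data`, the `salary` confounder, the cutoff of 300, and the three bandwidths are all assumptions chosen for illustration, not part of the lesson's dataset.

```r
# Hypothetical data: 1,000 companies, with salary rising alongside company size
set.seed(42)
sim_data <- data.frame(size = round(runif(1000, min = 100, max = 500)))
sim_data$salary <- 30000 + 50 * sim_data$size + rnorm(1000, sd = 2000)
cutoff <- 300

# For a given bandwidth, report how many rows are kept and how far apart
# the two groups are on the confounder (mean salary above minus below)
summarize_bandwidth <- function(bandwidth) {
  kept <- sim_data[abs(sim_data$size - cutoff) <= bandwidth, ]
  above <- kept$size >= cutoff
  c(
    n_kept = nrow(kept),
    salary_gap = mean(kept$salary[above]) - mean(kept$salary[!above])
  )
}

# Compare a narrow, a medium, and a wide bandwidth side by side
bw_summary <- sapply(c(narrow = 10, medium = 50, wide = 150), summarize_bandwidth)
bw_summary
```

Wider bandwidths keep more rows but also let the two groups drift apart on salary, which is the tension the bullet points describe.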
We could select the bandwidth based on what we believe is best. However, an algorithm that optimizes the bandwidth mathematically may be a better choice. A popular choice, and the one we will use, is the Imbens-Kalyanaraman (IK) algorithm.
The R package rdd contains all of the tools needed to calculate the optimal bandwidth and carry out an RDD analysis. To calculate the IK bandwidth using rdd, we will use the IKbandwidth() function, which requires three arguments:
- X: the forcing variable
- Y: the outcome variable
- cutpoint: the cutoff value to use
To calculate the IK bandwidth for the contribution matching dataset, we would use the following code:
library(rdd)

# calculate IK bandwidth
cont_ik_bw <- IKbandwidth(
  X = cont_data$size,          # forcing variable
  Y = cont_data$contribution,  # outcome variable
  cutpoint = cont_cutpoint     # cutpoint
)

# print the IK bandwidth to the console
cont_ik_bw

13.26322
The reduced dataset used in our RDD analysis will include only the companies whose size falls within 300 ± 13.26, that is, between about 286.7 and 313.3 employees. Companies in this range are likely to be similar on other variables that may impact employee contributions, such as average salary or insurance costs.
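As a rough sketch of what reducing the dataset looks like, the snippet below applies the bandwidth to a tiny hand-made data frame. The names `cont_data_demo`, `cutpoint_demo`, and `bw_demo` and the rows themselves are hypothetical stand-ins; only the bandwidth value and the cutoff of 300 come from the text above.

```r
# Hypothetical stand-in for the contribution data
cont_data_demo <- data.frame(
  size = c(250, 286, 287, 300, 313, 314, 350),
  contribution = c(4.1, 4.3, 4.2, 5.0, 5.1, 5.2, 5.4)
)
cutpoint_demo <- 300
bw_demo <- 13.26322  # IK bandwidth reported above

# Bounds of the window: cutoff minus and plus one bandwidth
lower <- cutpoint_demo - bw_demo  # about 286.74
upper <- cutpoint_demo + bw_demo  # about 313.26

# Keep only the rows whose forcing variable lies inside the window
reduced <- subset(cont_data_demo, size >= lower & size <= upper)
reduced$size
```

Because employee counts are whole numbers, the rows with 286 and 314 employees fall just outside the window, while those with 287 and 313 fall just inside.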
To illustrate the bandwidth visually, we can add bandwidth lines to the scatter plot. We can use geom_vline() to add reference lines at the cutpoint ± the bandwidth to our scatter plot rdd_scatter from earlier:
rdd_scatter + geom_vline(xintercept = 300 + c(-cont_ik_bw, cont_ik_bw)) # add lines to indicate the bandwidth
This plot shows us just how narrow the optimal bandwidth is for the contribution program dataset.
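For readers without the earlier rdd_scatter object at hand, here is a self-contained sketch of the same idea using ggplot2. The data are simulated and the names `plot_data`, `cutpoint_demo`, `bw_demo`, and `demo_scatter` are assumptions made for this illustration; the actual plot from earlier in the lesson may be built differently.

```r
library(ggplot2)

# Hypothetical data with a jump in contribution at 300 employees
set.seed(1)
plot_data <- data.frame(size = round(runif(200, min = 200, max = 400)))
plot_data$contribution <- 4 + 0.005 * plot_data$size +
  0.8 * (plot_data$size >= 300) + rnorm(200, sd = 0.3)

cutpoint_demo <- 300
bw_demo <- 13.26322  # IK bandwidth reported above

# Scatter plot with a dashed line at the cutoff and
# solid lines one bandwidth below and above it
demo_scatter <- ggplot(plot_data, aes(x = size, y = contribution)) +
  geom_point() +
  geom_vline(xintercept = cutpoint_demo, linetype = "dashed") +
  geom_vline(xintercept = cutpoint_demo + c(-bw_demo, bw_demo))

demo_scatter
```

Passing a vector of two values to `xintercept` draws both bandwidth lines in a single `geom_vline()` call.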
air_data has been loaded for you in notebook.Rmd. Calculate the Imbens-Kalyanaraman (IK) optimal bandwidth using the IKbandwidth() function. Save the results to air_ik_bw. Think about whether the bandwidth seems large or small in relation to the scale of the forcing variable.
A scatter plot of AQI (aqi) against power plant output (watts) with a dashed line at 600 megawatts has been created for you and saved as air_scatter. Add solid vertical lines at the bandwidth cutoffs to air_scatter, and save the result to air_scatter2. Do you think the two groups within this bandwidth will be similar enough to compare? What might be the tradeoff if the bandwidth is too narrow?