The use of a bandwidth impacts the type of causal estimand we can calculate in a regression discontinuity design analysis. Because the RDD approach uses a subset of the full dataset, we can only estimate the local average treatment effect (LATE). The LATE is the average treatment effect among the subset of data that falls within the range of the bandwidth.
To estimate the LATE in RDD, a regression model that allows for different slopes on each side of the cutpoint is fit. The regression model is then used to get a predicted value of the outcome variable for each treatment group at the cutpoint. The difference between the predicted outcome values of the treatment and control groups is an estimate of the LATE.
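The steps above can be sketched by hand in base R. This is only an illustration under stated assumptions: the data are simulated, the names sim_data, treated, bandwidth, and late_estimate are hypothetical, and every observation inside the bandwidth gets equal weight. RDestimate() weights observations with a kernel (triangular by default), so its estimates would not match this sketch exactly.

```r
set.seed(42)

# Simulated data (hypothetical, not the lesson's dataset)
n <- 500
cutpoint <- 300
bandwidth <- 100  # assumed value, chosen only for illustration
size <- runif(n, min = 100, max = 500)
treated <- as.numeric(size >= cutpoint)

# Built-in jump of 90 at the cutpoint, with a different slope on each side
contribution <- 200 + 0.5 * (size - cutpoint) + 90 * treated +
  0.3 * treated * (size - cutpoint) + rnorm(n, sd = 20)
sim_data <- data.frame(size, treated, contribution)

# Keep only observations that fall within the bandwidth around the cutpoint
local_data <- subset(sim_data, abs(size - cutpoint) <= bandwidth)

# The interaction term allows a different slope on each side of the cutpoint
fit <- lm(contribution ~ treated * size, data = local_data)

# Predicted outcome at the cutpoint for the control and treatment groups
at_cutpoint <- data.frame(treated = c(0, 1), size = cutpoint)
preds <- predict(fit, newdata = at_cutpoint)

# The difference between the two predictions is the LATE estimate
late_estimate <- unname(preds[2] - preds[1])
late_estimate
```

Because the simulated jump at the cutpoint is 90, the printed estimate should land close to that value, with some sampling noise.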
We can use the RDestimate() function from the rdd package as follows to fit the local linear regression model for the contribution matching data:
cont_rdd <- RDestimate(
  formula = contribution ~ size, # outcome regression model
  data = cont_data,              # dataset
  cutpoint = 300,                # cutpoint
  bw = cont_ik_bw                # bandwidth
)
The RDestimate() function fits the local linear regression model at the provided bandwidth, and also at half and twice that bandwidth. If the estimate of the LATE is roughly the same across all three bandwidths, we can be more confident that it is not sensitive to the bandwidth we chose. All three estimates appear when we print the results.
Call:
RDestimate(formula = contribution ~ size, data = cont_data, cutpoint = 300, bw = cont_ik_bw)

Coefficients:
     LATE    Half-BW  Double-BW
    90.60     110.67      71.62
The model output shows that the LATE is 90.60. In this dataset, for observations within the bandwidth around the cutpoint, employer-sponsored retirement contribution matching programs are estimated to increase average monthly contributions by $90.60. However, the estimate changes with the bandwidth, ranging from $110.67 at half of the bandwidth to $71.62 at twice the bandwidth.
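One way to put a number on that sensitivity is to express each estimate as a percent difference from the main estimate. The short sketch below retypes the three values from the printed output by hand; the names estimates and pct_diff are hypothetical and are not part of the rdd package.

```r
# Estimates copied by hand from the printed RDestimate() output above
estimates <- c("LATE" = 90.60, "Half-BW" = 110.67, "Double-BW" = 71.62)

# Percent difference of each estimate relative to the estimate at the chosen bandwidth
pct_diff <- 100 * (estimates - estimates["LATE"]) / estimates["LATE"]
round(pct_diff, 1)
```

Here the half-bandwidth and double-bandwidth estimates each differ from the main estimate by roughly a fifth, which suggests the result is fairly sensitive to the choice of bandwidth.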
Instructions
The air_data dataset has been loaded for you in notebook.Rmd. The IK optimal bandwidth has been saved as air_ik_bw. Use the RDestimate() function to fit the local linear regression model using air_ik_bw as the bandwidth. Save the results to air_rdd.
Print the results of the local linear regression. Take note of the estimates of the LATE at the different bandwidths. Are the estimates different or fairly similar? What do you think this says about the reliability of our estimate?