Learn

In RDD, we know we need to look at points near the cutoff to find treatment and control groups that are similar. But how do we know how close to look?

The bandwidth describes the distance on either side of the cutoff we should use to reduce our dataset. Any points that are more than one bandwidth above or below the cutoff are discarded. Choosing the bandwidth can have a serious impact on the results of an RDD analysis:

  • A wider bandwidth keeps more of the original dataset, so we have more information with which to estimate the treatment effect. However, the treatment and control groups may differ more on confounding variables, which could bias the estimate.
  • A narrower bandwidth retains less of the original dataset, so the treatment and control groups will be more alike. However, the smaller sample size means less information with which to estimate the treatment effect, so the estimate is less precise.
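To make the tradeoff concrete, here is a minimal sketch in base R. The data are simulated and the helper within_bandwidth() is a hypothetical name introduced only for illustration; neither comes from the lesson's datasets.

```r
# Hypothetical data: company sizes spread around a cutoff of 300 employees
set.seed(42)
sizes <- round(runif(500, min = 200, max = 400))
cutoff <- 300

# Keep only the points that lie within one bandwidth of the cutoff
within_bandwidth <- function(x, cutoff, bandwidth) {
  x[abs(x - cutoff) <= bandwidth]
}

narrow <- within_bandwidth(sizes, cutoff, bandwidth = 10)
wide   <- within_bandwidth(sizes, cutoff, bandwidth = 50)

# Compare how many points each window keeps and how far they spread
length(narrow)
length(wide)
range(narrow)
range(wide)
```

The wider window keeps at least as many points as the narrower one, but those points sit farther from the cutoff, which is where differences on confounding variables can creep in.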

We could select the bandwidth based on what we believe is best. However, an algorithm that optimizes the bandwidth mathematically may be a better choice. A popular choice, and the one we will use, is the Imbens-Kalyanaraman (IK) algorithm.

The R package rdd contains all of the tools needed to calculate the optimal bandwidth and carry out an RDD analysis. To calculate the IK bandwidth using rdd, we will use the IKbandwidth() function, which we will call with three arguments:

  • X: the forcing variable
  • Y: the outcome variable
  • cutpoint: the cutoff value to use.

To calculate the IK bandwidth for the contribution matching dataset, we would use the following code:

library(rdd)

# calculate IK bandwidth
cont_ik_bw <- IKbandwidth(
  X = cont_data$size,          # forcing variable
  Y = cont_data$contribution,  # outcome variable
  cutpoint = cont_cutpoint     # cutpoint
)

# print the IK bandwidth to the console
cont_ik_bw

[1] 13.26322

The reduced dataset used in our RDD analysis will include only the companies within 13.26 employees of the 300-employee cutoff (300 ± 13.26), which means companies with between 287 and 313 employees. These companies are likely to be similar on other variables that may impact employee contributions, such as average salary or insurance costs.
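As a rough sketch of how that reduction could be done by hand in base R: the data frame cont_data_example below is a simulated, hypothetical stand-in (the lesson's actual cont_data is not reproduced here), while the cutoff of 300 and the bandwidth of 13.26322 are the values shown in the lesson.

```r
# Hypothetical stand-in for cont_data; values are simulated for illustration
set.seed(1)
cont_data_example <- data.frame(
  size = sample(250:350, size = 200, replace = TRUE),
  contribution = runif(200, min = 0, max = 1000)
)

cont_cutpoint <- 300    # cutoff used in the lesson
cont_ik_bw <- 13.26322  # IK bandwidth printed in the lesson

# Keep rows whose forcing variable is within one bandwidth of the cutoff
in_window <- abs(cont_data_example$size - cont_cutpoint) <= cont_ik_bw
cont_reduced <- cont_data_example[in_window, ]

# Smallest and largest company sizes that survive the filter
range(cont_reduced$size)
```

Every retained row has a company size within 13.26 employees of the cutoff; all other rows are dropped before the treatment effect is estimated.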

To illustrate the bandwidth visually, we can use geom_vline() to add reference lines at the cutpoint ± the bandwidth to rdd_scatter, the scatter plot from earlier:

# add lines to indicate the bandwidth
rdd_scatter +
  geom_vline(xintercept = 300 + c(-cont_ik_bw, cont_ik_bw))

A scatter plot of contributions against company size with added vertical lines around 287 and 313 to show the slice of the data that is relevant to our study.

This plot shows us just how narrow the optimal bandwidth is for the contribution program dataset.

Instructions

1. The dataset air_data has been loaded for you in notebook.Rmd. Calculate the Imbens-Kalyanaraman (IK) optimal bandwidth using the IKbandwidth() function. Save the results to air_ik_bw.

2. Print air_ik_bw. Think about whether the bandwidth seems large or small in relation to the scale of the forcing variable.

3. A scatter plot of AQI (aqi) against power plant output (watts) with a dashed line at 600 megawatts has been created for you and saved as air_scatter. Modify air_scatter to add solid vertical lines at the cutoff ± the bandwidth. Save the result to air_scatter2.

4. Print air_scatter2. Do you think the two groups on either side of the cutoff within this bandwidth will be similar enough to compare? What might be the tradeoff if the bandwidth is too narrow?
