A scatter plot of our data allows us to check certain RDD conditions visually. We can see whether we have a sharp or fuzzy cutoff. We can also use the plot to check for a discontinuity — a sudden change in the outcome variable — at the cutoff.
To create this scatter plot with the contribution matching dataset, we can use the `ggplot2` package in R:
```r
library(ggplot2) # load the ggplot2 package
```
Our scatter plot should have the number of employees on the x-axis and the contribution amount on the y-axis. The points for treatment and control groups should be different colors and shapes, so we can easily tell the two groups apart. Finally, we add code for a dashed vertical line at the cutoff of 300 employees.
```r
# create a scatter plot with treatment groups
ggplot(
  data = cont_data,
  aes(
    x = size,         # forcing variable
    y = contribution, # outcome variable
    color = group,    # sets point color by treatment group
    shape = group     # sets point shape by treatment group
  )
) +
  geom_point() +
  geom_vline(xintercept = 300, linetype = "dashed") # add line at 300
```
This plot clearly shows that we have a sharp RDD, not a fuzzy one. The dashed line at 300 employees cleanly separates the two groups: companies with at least 300 employees offer a matching program, and companies with fewer than 300 employees do NOT.
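The scatter plot is a visual check. As a hedged complement, the sketch below checks the same idea numerically by comparing the share of treated units below the cutoff with the share at or above it. In a sharp design those shares are exactly 0 and 1. In a fuzzy design the jump in treatment share is smaller than 1. The helper `treated_share_by_side()` and the toy vectors are hypothetical and not part of the lesson. To apply the check to `cont_data`, you would first need a logical indicator of whether each company offers matching.

```r
# Hypothetical helper (not from the lesson): share of treated units on each
# side of the cutoff. A sharp design gives 0 below and 1 at or above.
treated_share_by_side <- function(running, treated, cutoff) {
  side <- ifelse(running >= cutoff, "at_or_above", "below")
  tapply(treated, side, mean) # mean of a logical = proportion TRUE
}

# Hypothetical toy data: company sizes around a 300-employee cutoff
size <- c(120, 210, 280, 299, 300, 320, 410, 480)

# Sharp: treatment is determined entirely by the cutoff
sharp_treated <- size >= 300
# Fuzzy: some companies on each side do not follow the cutoff rule
fuzzy_treated <- c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE)

sharp_shares <- treated_share_by_side(size, sharp_treated, cutoff = 300)
fuzzy_shares <- treated_share_by_side(size, fuzzy_treated, cutoff = 300)
print(sharp_shares)
print(fuzzy_shares)
```

In the sharp case the shares are 0 below and 1 at or above the cutoff. In the fuzzy case they are 0.25 and 0.75.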
We can also check whether there actually is a discontinuity in average contributions at the 300-employee cutoff. To do this, we can add a separate best-fit line for each treatment group using the `geom_smooth()` function. If we save our first plot as `rdd_scatter`, we can add the lines as follows:
```r
# add best-fit lines for each group to the scatter plot
rdd_scatter +
  geom_smooth(
    aes(group = group), # plot a separate line for each group
    method = "lm"       # use linear regression
  )
```
There is an obvious jump in average contributions at the cutoff, which indicates a discontinuity. If there were no discontinuity, we might see something like this:
Note that there is no jump in the outcome variable here. The lines connect smoothly.
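`geom_smooth(method = "lm")` draws an ordinary least-squares line separately for each group. The rough sketch below turns that picture into a number. It fits the same two regressions with base R's `lm()` and measures the gap between their predictions at the cutoff. The data are simulated and are not the lesson's `cont_data`. One dataset has a hypothetical true jump of 100 and the other has none. The helper `estimate_jump()` is only an illustration, not a function from the lesson or from any RDD package.

```r
# Hypothetical helper: fit one straight line on each side of the cutoff
# (as geom_smooth(method = "lm") does per group) and return the gap
# between the two fitted lines evaluated at the cutoff itself.
estimate_jump <- function(data, cutoff) {
  above <- data$size >= cutoff
  fit_below <- lm(contribution ~ size, data = data[!above, ])
  fit_above <- lm(contribution ~ size, data = data[above, ])
  at_cutoff <- data.frame(size = cutoff)
  unname(predict(fit_above, at_cutoff) - predict(fit_below, at_cutoff))
}

# Simulated data: a common upward trend in size plus random noise
set.seed(42)
n <- 400
size <- runif(n, min = 100, max = 500)
trend <- 50 + 0.1 * size
noise <- rnorm(n, sd = 20)

# One dataset with a true jump of 100 at 300 employees, one with no jump
with_jump <- data.frame(size = size, contribution = trend + 100 * (size >= 300) + noise)
no_jump <- data.frame(size = size, contribution = trend + noise)

jump_with <- estimate_jump(with_jump, cutoff = 300)
jump_without <- estimate_jump(no_jump, cutoff = 300)
print(c(with_jump = jump_with, without_jump = jump_without))
```

The estimated jump should be close to 100 for the first dataset and close to 0 for the second. The gap is measured at the cutoff rather than by comparing raw group means. Raw means would mix the jump with the upward trend in `size`.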
A new policy requires power plants with an output of 600 megawatts or more to install an emissions control device. The device removes harmful chemicals from exhaust before it is released into the air. An environmental group decides to measure ambient air quality one mile from each power plant to assess whether the emissions control device improves air quality.
The `air_data` dataset has been loaded for you in `notebook.Rmd`. It contains the following variables:
- `id`: power plant ID
- `watts`: power plant output (megawatts)
- `group`: treatment group formed by the cutpoint at 600 megawatts (control = "No Device", treatment = "Emissions Device")
- `aqi`: air quality index from 0 (good air quality) to 500 (poor air quality)
Create a scatter plot named `air_scatter` with `watts` as the forcing variable and `aqi` as the outcome variable.
- Use different colors and shapes for each treatment group.
- Add a dashed vertical line at 600 megawatts.
Print `air_scatter` to view the plot. Consider whether it shows a sharp or fuzzy cutoff at 600 megawatts.
Add a line of code to `air_scatter` that adds a best-fit line for each treatment group, and save the updated plot as `air_scatter2`. Do the regression lines appear to show a discontinuity in the outcome variable at the cutoff?