
A scatter plot of our data allows us to check certain RDD conditions visually. We can see whether we have a sharp or fuzzy cutoff. We can also use the plot to check for a discontinuity — a sudden change in the outcome variable — at the cutoff.

To create this scatter plot with the contribution matching dataset, we can use the ggplot2 package in R:

library(ggplot2) #load ggplot2 package

Our scatter plot should have the number of employees on the x-axis and the contribution amount on the y-axis. The points for treatment and control groups should be different colors and shapes, so we can easily tell the two groups apart. Finally, we add code for a dashed vertical line at the cutoff of 300 employees.

# create a scatterplot with treatment groups
ggplot(
  data = cont_data,
  aes(
    x = size,            # forcing variable
    y = contribution,    # outcome variable
    color = group,       # sets point color by treatment group
    shape = group        # sets point shape by treatment group
  )) +
  geom_point() +
  geom_vline(xintercept = 300, linetype = "dashed") # add line at 300

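This plot clearly shows that we have a sharp RDD, not a fuzzy one. The dashed line at 300 employees cleanly separates the two groups: every company at or to the right of the line offers a matching program (at least 300 employees), and every company to the left does NOT offer a matching program (fewer than 300 employees). No points from either group appear on the wrong side of the cutoff.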

Scatter plot of contributions against company size with cutoff of 300 separating "no program" group from "program" group.
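The visual check for a sharp cutoff can also be done with a quick cross-tabulation. The sketch below is only an illustration: sim_data is a simulated stand-in for cont_data, and the group labels "Program" and "No Program" are assumed rather than taken from the real dataset.

```r
# Hypothetical stand-in for cont_data: values are simulated, not the course data.
set.seed(1)
size <- round(runif(200, min = 100, max = 500))         # forcing variable
group <- ifelse(size >= 300, "Program", "No Program")   # sharp assignment rule (assumed labels)
sim_data <- data.frame(size = size, group = group)

# Cross-tabulate treatment group against the side of the cutoff.
side <- ifelse(sim_data$size >= 300, "at or above 300", "below 300")
sharp_check <- table(group = sim_data$group, side = side)
print(sharp_check)

# Count companies whose group does not match their side of the cutoff.
misassigned <- sum(sim_data$group == "Program" & sim_data$size < 300) +
  sum(sim_data$group == "No Program" & sim_data$size >= 300)

# A sharp design has no misassigned units; a fuzzy design has some.
is_sharp <- misassigned == 0
print(is_sharp)
```

If the same table built from real data had non-zero counts in the mismatched cells, that would point to a fuzzy cutoff instead.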

We can also check that there is actually a discontinuity in average contributions based on whether or not companies have at least 300 employees. To do this, we can add a separate best fit line for each treatment group using the geom_smooth() function. If we saved our first plot as rdd_scatter, we can add the code for the lines as follows:

# add best fit lines for each group to scatter plot
rdd_scatter +
  geom_smooth(
    aes(group = group),  # plot separate lines for each group
    method = "lm"        # use linear regression
  )

The same scatter plot of contributions against company size with added regression lines for each program group that shows a jump in contribution amount at the cutoff line of 300.

There is an obvious jump in the average contributions at the cutoff point, which means there is a discontinuity present. If there were no discontinuity present, we might see something like this:

A scatter plot of contributions against company size with added regression lines for each program group that connect like a single line rather than jumping at the cutoff line.

Note that there is no jump in the outcome variable here. The lines connect smoothly.
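To put a rough number on the jump that the two best fit lines show, one option is to fit the same per-group linear regressions directly and compare their predictions at the cutoff. This is a minimal sketch on simulated data: sim_data, the group labels, and the built-in jump of 50 are all assumptions made for illustration, not values from the contribution matching dataset.

```r
# Hypothetical data: a linear trend in size plus an assumed jump of 50 at the cutoff.
set.seed(42)
size <- round(runif(300, min = 100, max = 500))
group <- ifelse(size >= 300, "Program", "No Program")
contribution <- 100 + 0.2 * size + 50 * (size >= 300) + rnorm(300, sd = 10)
sim_data <- data.frame(size = size, group = group, contribution = contribution)

# Fit one linear regression per treatment group, mirroring what
# geom_smooth(aes(group = group), method = "lm") draws on the plot.
fit_control <- lm(contribution ~ size, data = subset(sim_data, group == "No Program"))
fit_treated <- lm(contribution ~ size, data = subset(sim_data, group == "Program"))

# Compare the height of the two lines at the cutoff of 300 employees.
at_cutoff <- data.frame(size = 300)
jump <- unname(predict(fit_treated, at_cutoff) - predict(fit_control, at_cutoff))
print(jump)
```

A value of jump close to zero would correspond to lines that connect smoothly, while a value well away from zero corresponds to a visible gap at the cutoff.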



A new policy requires that power plants that have an output of 600 megawatts or more have to install an emissions control device that removes harmful chemicals before releasing exhaust into the air. An environmental group decides to measure ambient air quality one mile away from each power plant to assess whether the emissions control device leads to better air quality.

The air_data dataset has been loaded for you in notebook.Rmd with the following variables:

  • id: power plant ID
  • watts: power plant output (megawatts)
  • group: treatment group formed by cutpoint at 600 megawatts (control = “No Device”, treatment = “Emissions Device”)
  • aqi: air quality index from 0 (good air quality) to 500 (poor air quality)

Create a scatter plot named air_scatter:

  • Use watts as the forcing variable and aqi as the outcome variable.
  • Use different colors and shapes for each treatment group.
  • Add a dashed vertical line at 600 megawatts.

Print air_scatter to view the plot. Consider whether this plot shows a sharp or fuzzy cutoff at 600 megawatts.


Add a line of code to air_scatter to add best-fit lines for each treatment group to the plot. Save the updated plot to air_scatter2.


Print air_scatter2. Do the regression lines appear to show a discontinuity in the outcome variable at the cutoff?
