Imagine that California passed a state law in 2016 that raised the minimum wage beginning in 2017. We want to know how this law has affected student wages, since student jobs often pay low wages.
We have data on average student wages at all California public universities from 2007 to 2017, displayed in the plot in the learning environment. Wages rise and fall from year to year, but there is a particularly large rise from 2016 to 2017.
It’s easy to assume the large increase was entirely due to the new minimum wage law. But what would average student wages in 2017 have looked like had the law NOT been passed?
- Would wages still have increased a lot, meaning the law had little impact?
- Would wages have decreased, meaning the law had an even larger impact than just the difference between 2016 and 2017 wages?
The fundamental problem of causal inference is that we can never observe both situations: California either passed the law or it did not. We therefore need other observed data to stand in for what student wages in California would have looked like in 2017 had the law not passed.
Difference in differences (DID) is a causal inference technique that estimates a treatment effect by comparing how an outcome changes over time in a treatment group with how it changes in a control group. In the following exercises, we will walk through how to apply DID to our student wage example, review the assumptions and limitations of the method, and learn how to carry it out in R.
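As a rough preview, the simplest version of DID (two groups, two time periods) can be written as a single subtraction. The notation here is an assumption made for illustration and does not come from the lesson: $\bar{Y}$ is an average student wage, the subscript CA marks California (the treatment group), and the subscript C marks a hypothetical comparison group that did not change its minimum wage.

```latex
\hat{\delta}_{\text{DID}}
  = \underbrace{\left(\bar{Y}_{\text{CA},\,2017} - \bar{Y}_{\text{CA},\,2016}\right)}_{\text{change in the treatment group}}
  - \underbrace{\left(\bar{Y}_{\text{C},\,2017} - \bar{Y}_{\text{C},\,2016}\right)}_{\text{change in the control group}}
```

The second term stands in for the change California would have seen without the law. The estimate is only as good as that substitution.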