While we were able to estimate the ATT through mean differences alone, we can also use linear regression for DID. A simple DID regression model predicts the outcome from the variables for treatment group and time, along with the interaction of treatment with time.
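Written out, the model described above takes roughly the following form. The symbols $\beta_0$ through $\beta_3$ and $\varepsilon$ are notation assumed here for illustration; they do not appear in the lesson itself.

$$
\text{outcome} = \beta_0 + \beta_1\,\text{treatment} + \beta_2\,\text{time} + \beta_3\,(\text{treatment} \times \text{time}) + \varepsilon
$$

In this notation, $\beta_3$, the coefficient on the interaction, is the DID estimate of the ATT.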
To simplify our output, we will first transform the `state` variable to a treatment indicator called `treat` (1 for California and 0 for Washington). Then we transform `year` to a time indicator called `time` (1 for 2017 and 0 for 2016).
```r
# transform state to treat
wages2$treat <- ifelse(wages2$state == "California", 1, 0)
# transform year to time
wages2$time <- ifelse(wages2$year == 2017, 1, 0)
```
We can print and inspect our dataset to make sure our transformations were done correctly.
```
       state treat year time  avg_wage
1 California     1 2016    0 13.311279
2 California     1 2017    1 16.000000
3 Washington     0 2016    0  9.728146
4 Washington     0 2017    1 10.000000
```
To create a DID regression for our student wage data, we run a model that predicts average student wages from the treatment, time, and the interaction of treatment and time. Note that in R, `treat*time` is equivalent to `treat + time + treat:time`.
```r
did_mod <- lm(
  # include interaction
  avg_wage ~ treat*time,
  # use subsetted data
  data = wages2
)
```
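As a quick check of the formula shorthand, the sketch below fits the model both ways and compares the coefficients. It rebuilds `wages2` from the four rows printed earlier so that it runs on its own; the names `mod_short` and `mod_long` are made up for this illustration.

```r
# Rebuild the four-row dataset from the printed output
wages2 <- data.frame(
  state = c("California", "California", "Washington", "Washington"),
  year = c(2016, 2017, 2016, 2017),
  avg_wage = c(13.311279, 16.000000, 9.728146, 10.000000)
)
wages2$treat <- ifelse(wages2$state == "California", 1, 0)
wages2$time <- ifelse(wages2$year == 2017, 1, 0)

# Shorthand: main effects plus their interaction
mod_short <- lm(avg_wage ~ treat * time, data = wages2)
# Expanded: the same three terms written out
mod_long <- lm(avg_wage ~ treat + time + treat:time, data = wages2)

# TRUE when both formulas produce the same coefficients
same_fit <- isTRUE(all.equal(coef(mod_short), coef(mod_long)))
print(same_fit)
```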
When we print `did_mod`, we get the following output. Note that the coefficient on the interaction term `treat:time` is exactly what we computed for the ATT by taking the difference of means. We estimate the impact of the minimum wage law on California student wages to be an increase of about $2.42.
```
Call:
lm(formula = avg_wage ~ treat * time, data = wages2)

Coefficients:
(Intercept)        treat         time   treat:time  
     9.7281       3.5831       0.2719       2.4169  
```
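To see the match between the interaction coefficient and the difference of means, the sketch below computes the ATT by hand and compares it with the model. The data frame is retyped from the printed dataset, and the helper names (`ca_change`, `wa_change`, `att_by_hand`, `att_from_model`) are illustrative only.

```r
# Four group means from the printed dataset, with the two indicators
wages2 <- data.frame(
  treat = c(1, 1, 0, 0),
  time = c(0, 1, 0, 1),
  avg_wage = c(13.311279, 16.000000, 9.728146, 10.000000)
)

# Change from 2016 to 2017 within each state
ca_change <- with(wages2, avg_wage[treat == 1 & time == 1] - avg_wage[treat == 1 & time == 0])
wa_change <- with(wages2, avg_wage[treat == 0 & time == 1] - avg_wage[treat == 0 & time == 0])

# Difference in differences: California's change minus Washington's change
att_by_hand <- ca_change - wa_change

# The same quantity read off the regression
did_mod <- lm(avg_wage ~ treat * time, data = wages2)
att_from_model <- unname(coef(did_mod)["treat:time"])

print(round(c(by_hand = att_by_hand, from_model = att_from_model), 4))
```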
What do all the other coefficients represent?
- `Intercept` indicates the expected value for pre-treatment average student wages for the control group (Washington 2016).
- `treat` is the difference between the control group and the treatment group at the pre-treatment time (California 2016 - Washington 2016).
- `time` is the difference between the pre-treatment and post-treatment times for the control group (Washington 2017 - Washington 2016).
Combinations of these coefficients give us back all four means from our dataset.
Mean | Equivalent Coefficient(s) |
---|---|
Washington 2016 | Intercept |
California 2016 | Intercept + treat |
Washington 2017 | Intercept + time |
California 2017 | Intercept + treat + time + treat:time |
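The table can be checked with a few lines of R. This is a minimal sketch that retypes the four printed means into a data frame; `b` and `means_from_coefs` are names chosen here for illustration.

```r
# Four group means from the printed dataset, with the two indicators
wages2 <- data.frame(
  treat = c(1, 1, 0, 0),
  time = c(0, 1, 0, 1),
  avg_wage = c(13.311279, 16.000000, 9.728146, 10.000000)
)

did_mod <- lm(avg_wage ~ treat * time, data = wages2)
b <- coef(did_mod)

# Rebuild each group mean from the coefficient sums in the table
means_from_coefs <- c(
  "Washington 2016" = b[["(Intercept)"]],
  "California 2016" = b[["(Intercept)"]] + b[["treat"]],
  "Washington 2017" = b[["(Intercept)"]] + b[["time"]],
  "California 2017" = b[["(Intercept)"]] + b[["treat"]] + b[["time"]] + b[["treat:time"]]
)

print(round(means_from_coefs, 6))
```

Because the model has four coefficients and the data has four means, the fit is exact, so each sum should agree with the corresponding `avg_wage` value up to floating-point error.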
Instructions
The dataset `tickets2` with only 2018 and 2019 data has been saved for you in notebook.Rmd. Use the `ifelse()` function to add to `tickets2` the indicator variable `treat` that is 1 for Sydney and 0 for Toronto.
Use the `ifelse()` function to add to `tickets2` the indicator variable `time` that is 1 for 2019 and 0 for 2018.
Create a linear regression model `did_reg` that predicts average ticket sales from `treat`, `time`, and the interaction of `treat` with `time`.
Now print `did_reg` and inspect the coefficient on the interaction term. Was the direction of the impact what you expected based on your earlier plots?