While we were able to estimate the ATT through mean differences alone, we can also use linear regression for DID. A simple DID regression model predicts the outcome from the variables for treatment group and time, along with the interaction of treatment with time.
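In equation form, such a model can be sketched as below. The symbols $Y$, $\beta_0$ through $\beta_3$, and $\varepsilon$ are notation assumed here for illustration and are not used elsewhere in the lesson; $\beta_3$, the coefficient on the interaction, is the term that corresponds to the ATT.

$$
Y = \beta_0 + \beta_1\,\text{treatment} + \beta_2\,\text{time} + \beta_3\,(\text{treatment} \times \text{time}) + \varepsilon
$$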

To simplify our output, we will first transform the `state` variable to a treatment indicator called `treat` (1 for California and 0 for Washington). Then we transform `year` to a time indicator called `time` (1 for 2017 and 0 for 2016).

    # transform state to treat
    wages2$treat <- ifelse(wages2$state == "California", 1, 0)
    # transform year to time
    wages2$time <- ifelse(wages2$year == 2017, 1, 0)

We can print and inspect our dataset to make sure our transformations were done correctly.

           state treat year time  avg_wage
    1 California     1 2016    0 13.311279
    2 California     1 2017    1 16.000000
    3 Washington     0 2016    0  9.728146
    4 Washington     0 2017    1 10.000000

To create a DID regression for our student wage data, we run a model that predicts average student wages from the treatment, time, and the interaction of treatment and time. Note that in R, `treat*time` is equivalent to `treat + time + treat:time`.
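The formula shorthand can be checked directly. The sketch below rebuilds the four-row `wages2` table from the printed values above (an assumption made so the example is self-contained; the lesson's real `wages2` is loaded for you), fits the model both ways, and compares the coefficients. The names `short_form`, `long_form`, and `same_coefs` are illustrative only.

```r
# Rebuild the four-row table from the printed values (assumed stand-in for wages2)
wages2 <- data.frame(
  state = c("California", "California", "Washington", "Washington"),
  year = c(2016, 2017, 2016, 2017),
  avg_wage = c(13.311279, 16.000000, 9.728146, 10.000000)
)
wages2$treat <- ifelse(wages2$state == "California", 1, 0)
wages2$time <- ifelse(wages2$year == 2017, 1, 0)

# Fit the same model with the shorthand and the expanded formula
short_form <- lm(avg_wage ~ treat * time, data = wages2)
long_form <- lm(avg_wage ~ treat + time + treat:time, data = wages2)

# Both formulas expand to the same terms, so the coefficients should match
same_coefs <- all.equal(coef(short_form), coef(long_form))
print(same_coefs)
```

If the two formulas are equivalent, `same_coefs` is `TRUE`.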

    did_mod <- lm(
      # include interaction
      avg_wage ~ treat * time,
      # use subsetted data
      data = wages2
    )

When we print `did_mod`, we get the following output. Note that the coefficient on the interaction term `treat:time` is exactly what we computed for the ATT by taking the difference of means. We estimate the impact of the minimum wage law on California student wages to be an increase of about $2.42.

    Call:
    lm(formula = avg_wage ~ treat * time, data = wages2)

    Coefficients:
    (Intercept)        treat         time   treat:time
         9.7281       3.5831       0.2719       2.4169
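To see that the interaction coefficient and the difference of means agree, the following sketch computes the ATT both ways. It assumes a four-row stand-in for `wages2` built from the values printed earlier, and `cell_mean()` is a hypothetical helper written for this example, not part of the lesson.

```r
# Assumed stand-in for wages2, using the values printed earlier
wages2 <- data.frame(
  treat = c(1, 1, 0, 0),
  time = c(0, 1, 0, 1),
  avg_wage = c(13.311279, 16.000000, 9.728146, 10.000000)
)

# Hypothetical helper: mean outcome for one group (g) at one time (t)
cell_mean <- function(g, t) {
  mean(wages2$avg_wage[wages2$treat == g & wages2$time == t])
}

# Difference in differences from the four means:
# (California 2017 - California 2016) - (Washington 2017 - Washington 2016)
att_means <- (cell_mean(1, 1) - cell_mean(1, 0)) -
  (cell_mean(0, 1) - cell_mean(0, 0))

# The same quantity from the regression's interaction coefficient
did_mod <- lm(avg_wage ~ treat * time, data = wages2)
att_reg <- unname(coef(did_mod)["treat:time"])

print(round(c(means = att_means, regression = att_reg), 4))
```

Both values round to 2.4169, the interaction coefficient shown in the output.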

What do all the other coefficients represent?

`Intercept` indicates the expected value for pre-treatment average student wages for the control group (Washington 2016).

`treat` is the difference between the control group and the treatment group at the pre-treatment time (California 2016 - Washington 2016).

`time` is the difference between the pre-treatment and post-treatment times for the control group (Washington 2017 - Washington 2016).

Combinations of these coefficients give us back all four means from our dataset.

Mean | Equivalent Coefficient(s) |
---|---|
Washington 2016 | Intercept |
California 2016 | Intercept + treat |
Washington 2017 | Intercept + time |
California 2017 | Intercept + treat + time + treat:time |
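The table can be verified numerically. This sketch again assumes a four-row stand-in for `wages2` built from the printed values, and adds up the fitted coefficients to rebuild each mean. The names `b` and `rebuilt` are illustrative.

```r
# Assumed stand-in for wages2, using the values printed earlier
wages2 <- data.frame(
  treat = c(1, 1, 0, 0),
  time = c(0, 1, 0, 1),
  avg_wage = c(13.311279, 16.000000, 9.728146, 10.000000)
)

did_mod <- lm(avg_wage ~ treat * time, data = wages2)
b <- coef(did_mod)

# Rebuild each group-by-year mean from the coefficients, following the table
rebuilt <- c(
  "Washington 2016" = b[["(Intercept)"]],
  "California 2016" = b[["(Intercept)"]] + b[["treat"]],
  "Washington 2017" = b[["(Intercept)"]] + b[["time"]],
  "California 2017" = b[["(Intercept)"]] + b[["treat"]] + b[["time"]] + b[["treat:time"]]
)

print(round(rebuilt, 4))
```

Each rebuilt value should match the corresponding `avg_wage` in the dataset.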

### Instructions

**1.**

The dataset `tickets2` with only 2018 and 2019 data has been saved for you in **notebook.Rmd**. Use the `ifelse()` function to add to `tickets2` the indicator variable `treat` that is 1 for Sydney and 0 for Toronto.

**2.**

Use the `ifelse()` function to add to `tickets2` the indicator variable `time` that is 1 for 2019 and 0 for 2018.

**3.**

Create a linear regression model `did_reg` that predicts average ticket sales from `treat`, `time`, and the interaction of `treat` with `time`.

**4.**

Now print `did_reg` and inspect the coefficient on the interaction term. Was the direction of the impact what you expected based on your earlier plots?