While we were able to estimate the ATT through mean differences alone, we can also use linear regression for DID. A simple DID regression model predicts the outcome from the variables for treatment group and time, along with the interaction of treatment with time.
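One common way to write this model is shown below. The symbols are notation assumed here for illustration, not taken from the lesson: Y is the outcome, treat and time are 0/1 indicators for treatment group and post-treatment period, and epsilon is the error term.

$$
Y = \beta_0 + \beta_1\,\text{treat} + \beta_2\,\text{time} + \beta_3\,(\text{treat} \times \text{time}) + \varepsilon
$$

Under this notation, the coefficient on the interaction, $\beta_3$, is the DID estimate of the ATT.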

To simplify our output, we will first transform the state variable to a treatment indicator called treat (1 for California and 0 for Washington). Then we transform year to a time indicator called time (1 for 2017 and 0 for 2016).

    # transform state to treat
    wages2$treat <- ifelse(wages2$state == "California", 1, 0)
    # transform year to time
    wages2$time <- ifelse(wages2$year == 2017, 1, 0)

We can print and inspect our dataset to make sure our transformations were done correctly.

           state treat year time  avg_wage
    1 California     1 2016    0 13.311279
    2 California     1 2017    1 16.000000
    3 Washington     0 2016    0  9.728146
    4 Washington     0 2017    1 10.000000

To create a DID regression for our student wage data, we run a model that predicts average student wages from the treatment, time, and the interaction of treatment and time. Note that in R, treat*time is equivalent to treat + time + treat:time.
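The equivalence of the two formula spellings can be checked directly. The sketch below rebuilds the four-row dataset from the values printed earlier (an assumption about what wages2 contains) and fits both versions; the names short_form and long_form are illustrative only.

```r
# Rebuild the four-row dataset from the printed table (assumed contents of wages2)
wages2 <- data.frame(
  state = c("California", "California", "Washington", "Washington"),
  year = c(2016, 2017, 2016, 2017),
  avg_wage = c(13.311279, 16.000000, 9.728146, 10.000000)
)
wages2$treat <- ifelse(wages2$state == "California", 1, 0)
wages2$time <- ifelse(wages2$year == 2017, 1, 0)

# Shorthand: treat * time expands to main effects plus the interaction
short_form <- lm(avg_wage ~ treat * time, data = wages2)

# Longhand: main effects and interaction written out explicitly
long_form <- lm(avg_wage ~ treat + time + treat:time, data = wages2)

# Both fits should have the same coefficient names and values
all.equal(coef(short_form), coef(long_form))
```

Writing the formula out in full can make it easier to see which term carries the DID estimate.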

    did_mod <- lm(
      # include interaction
      avg_wage ~ treat * time,
      # use subsetted data
      data = wages2
    )

When we print did_mod, we get the following output. Note that the coefficient on the interaction term treat:time is exactly what we computed for the ATT by taking the difference of means. We estimate the impact of the minimum wage law on California student wages to be an increase of about $2.42.

    Call:
    lm(formula = avg_wage ~ treat * time, data = wages2)

    Coefficients:
    (Intercept)        treat         time   treat:time
         9.7281       3.5831       0.2719       2.4169
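As a rough check, the interaction coefficient can be compared with the difference of mean differences computed by hand. This sketch assumes wages2 holds exactly the four rows printed earlier; the helper mean_for and the names att_means and att_reg are hypothetical.

```r
# Rebuild the four-row dataset from the printed table (assumed contents of wages2)
wages2 <- data.frame(
  state = c("California", "California", "Washington", "Washington"),
  year = c(2016, 2017, 2016, 2017),
  avg_wage = c(13.311279, 16.000000, 9.728146, 10.000000)
)
wages2$treat <- ifelse(wages2$state == "California", 1, 0)
wages2$time <- ifelse(wages2$year == 2017, 1, 0)

# Hypothetical helper: look up the average wage for one state and year
mean_for <- function(s, y) {
  wages2$avg_wage[wages2$state == s & wages2$year == y]
}

# Change over time within each state, then the difference between those changes
ca_change <- mean_for("California", 2017) - mean_for("California", 2016)
wa_change <- mean_for("Washington", 2017) - mean_for("Washington", 2016)
att_means <- ca_change - wa_change

# Same quantity taken from the regression's interaction term
did_mod <- lm(avg_wage ~ treat * time, data = wages2)
att_reg <- unname(coef(did_mod)["treat:time"])

all.equal(att_means, att_reg)
```

With only four rows the model fits the four means exactly, so the two numbers agree up to floating-point error.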

What do all the other coefficients represent?

  • Intercept indicates the expected value for pre-treatment average student wages for the control group (Washington 2016).
  • treat is the difference between the control group and the treatment group at the pre-treatment time (California 2016 - Washington 2016).
  • time is the difference between the pre-treatment and post-treatment times for the control group (Washington 2017 - Washington 2016).

Combinations of these coefficients give us back all four means from our dataset.

    Mean               Equivalent Coefficient(s)
    Washington 2016    Intercept
    California 2016    Intercept + treat
    Washington 2017    Intercept + time
    California 2017    Intercept + treat + time + treat:time
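The combinations in the table can be reproduced from the fitted coefficients. This is a minimal sketch, assuming wages2 contains the four rows printed earlier; the names b, rebuilt, and observed are illustrative.

```r
# Rebuild the four-row dataset from the printed table (assumed contents of wages2)
wages2 <- data.frame(
  state = c("California", "California", "Washington", "Washington"),
  year = c(2016, 2017, 2016, 2017),
  avg_wage = c(13.311279, 16.000000, 9.728146, 10.000000)
)
wages2$treat <- ifelse(wages2$state == "California", 1, 0)
wages2$time <- ifelse(wages2$year == 2017, 1, 0)

did_mod <- lm(avg_wage ~ treat * time, data = wages2)
b <- coef(did_mod)

# Add up the coefficients that are switched on for each group and period
rebuilt <- c(
  "Washington 2016" = b[["(Intercept)"]],
  "California 2016" = b[["(Intercept)"]] + b[["treat"]],
  "Washington 2017" = b[["(Intercept)"]] + b[["time"]],
  "California 2017" = b[["(Intercept)"]] + b[["treat"]] + b[["time"]] + b[["treat:time"]]
)

# Compare against the observed means, matched by label
observed <- setNames(wages2$avg_wage, paste(wages2$state, wages2$year))
all.equal(rebuilt[names(observed)], observed)
```

Each sum includes only the coefficients whose indicator equals 1 for that row, which is why California 2017 uses all four.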



The dataset tickets2 with only 2018 and 2019 data has been saved for you in notebook.Rmd. Use the ifelse() function to add to tickets2 the indicator variable treat that is 1 for Sydney and 0 for Toronto.


Use the ifelse() function to add to tickets2 the indicator variable time that is 1 for 2019 and 0 for 2018.


Create a linear regression model did_reg that predicts average ticket sales from treat, time, and the interaction of treat with time.


Now print did_reg and inspect the coefficient on the interaction term. Was the direction of the impact what you expected based on your earlier plots?
