Let’s say we have some data on student wages at the state level in a dataset called wages. It has the following variables:

  • state: state where the universities are located
  • year: year the data is from
  • avg_wage: average student wage for all public universities in the state

Start by using the package ggplot2 in R to re-create the student wage line plot from the last exercise. Since we only want to plot California schools, filter by the state variable and then add code for our plot.

    # import libraries
    library(dplyr)
    library(ggplot2)

    # plot wages versus years
    ca_wages <- wages %>%
      # only California schools
      filter(state == "California") %>%
      # wages over time
      ggplot(aes(x = year, y = avg_wage)) +
      # line plot
      geom_line()

The situation created by the law is a natural experiment. Rather than having a researcher randomly assign treatment and control groups to study minimum wage effects, treatment assignment is decided by some outside force. In this case, that outside force was the minimum wage law that went into effect in 2017.

Let’s add a dashed vertical line to our plot to separate the time before and after the law went into effect. We’ll also label the x-axis scale to see the years more clearly.

    ca_wages +
      geom_vline(xintercept = 2016, linetype = "dashed") +
      scale_x_continuous(breaks = c(2007:2017))

CA wage plot

We can see that wages change once the law goes into effect in 2017, but we don’t know whether the change is due to the law or to other conditions that arose at the same time. We need data to serve as a counterfactual: what student wages in California would have looked like if the law had never happened.
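
The plotting steps in this section can be tried end to end without access to wages. The following is a minimal sketch that assumes a made-up data frame, toy_wages, with the same three columns. Its values and the second state are invented purely for illustration and say nothing about real student wages.

```r
# import libraries
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the wages dataset (all values are invented)
toy_wages <- data.frame(
  state = rep(c("California", "Oregon"), each = 11),
  year = rep(2007:2017, times = 2),
  avg_wage = c(seq(9, 11, length.out = 11), seq(9.5, 10.5, length.out = 11))
)

# keep only the California rows
ca_toy <- toy_wages %>%
  filter(state == "California")

# line plot of wages over time, with a dashed line at 2016
# and one x-axis label per year
toy_plot <- ggplot(ca_toy, aes(x = year, y = avg_wage)) +
  geom_line() +
  geom_vline(xintercept = 2016, linetype = "dashed") +
  scale_x_continuous(breaks = 2007:2017)

toy_plot
```

Filtering into ca_toy before plotting is only a convenience for inspecting the filtered rows; piping straight into ggplot(), as in the lesson code, gives the same plot.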



Let’s say there was a new entertainment tax in Sydney starting in 2019. You want to find out if the tax affected movie theater ticket sales. You have data about average annual movie theater ticket sales in Sydney from 2012 through 2019 with the following variables:

  • city: city where the movie theaters are located
  • year: year the data is from
  • sales: average ticket sales for theaters in the city

This data is contained in the dataset tickets which has been loaded for you in notebook.Rmd with the first few rows printed for you in the workspace.

Make a line plot that shows average movie ticket sales for Sydney by year. Remember to filter city to look at just Sydney. Save the plot as syd_sales.


Add to syd_sales a dashed vertical line at x=2018 and x-axis scale labels for the years 2012 to 2019. What happened to ticket sales in the year the tax was implemented?
