One of the main assumptions in causal inference is known as conditional exchangeability. This assumption states that, once we account for confounders (variables that influence both the treatment and the outcome), the treatment and non-treatment groups are comparable: if the groups were swapped, we would expect to observe the same outcomes. Randomization achieves exchangeability because it balances both observed AND unobserved variables between treatment groups. In non-randomized settings, however, conditional exchangeability can be difficult to achieve.
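Formally, conditional exchangeability is often written as shown below. The notation is an assumption here rather than something this lesson defines: A is the treatment, L is the set of measured confounders, and Y^a is the potential outcome a unit would have under treatment level a.

```latex
Y^{a} \perp\!\!\!\perp A \mid L \qquad \text{for every treatment level } a
```

Read it as: within any stratum of L, which units receive treatment is independent of the outcomes they would have under each treatment level.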

Propensity score methods are widely used in causal inference because they can help achieve conditional exchangeability with respect to the observed variables, even when randomization is not possible. So what are propensity scores, and how can we apply propensity score methods to our own questions?

A propensity score is the probability of being in a particular treatment group given a set of observed variables. Typically, we think of the propensity score as the probability of being in the treatment group rather than the control group. In a sense, a propensity score summarizes all of an observation's observed traits into a single number, which is an advantage when there are many observed variables.
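With a single categorical variable, the definition becomes concrete: the propensity score at each value x, P(treated = 1 | x), can be estimated as the share of treated units among the observations with that value. This is a minimal sketch assuming hypothetical toy data in which smoking status is the only observed variable; `empirical_propensity` is a hypothetical helper name, not a library function.

```python
from collections import defaultdict


def empirical_propensity(records):
    """Estimate P(treated = 1 | x) as the share of treated units at each value of x."""
    counts = defaultdict(lambda: [0, 0])  # x -> [number treated, total count]
    for x, treated in records:
        counts[x][0] += treated
        counts[x][1] += 1
    return {x: treated / total for x, (treated, total) in counts.items()}


# Hypothetical toy data: (smoking status, received treatment as 0/1)
records = [
    ("smoker", 1), ("smoker", 1), ("smoker", 1), ("smoker", 0),
    ("non-smoker", 1), ("non-smoker", 0), ("non-smoker", 0), ("non-smoker", 0),
]
scores = empirical_propensity(records)
print(scores)  # {'smoker': 0.75, 'non-smoker': 0.25}
```

With many variables, most combinations of values contain few or no observations, so this counting approach breaks down. That is one reason propensity scores are usually estimated with a model such as logistic regression instead.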

Propensity score analyses can be broken down into five ordered steps:

  1. Check initial overlap and balance.
  2. Model propensity scores.
  3. Use propensity scores to weight the dataset.
  4. Re-check overlap and balance.
  5. Estimate the treatment effect, or, if balance is still inadequate, return to step two to improve the propensity score model.
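The five steps can be sketched end to end in Python. This is a minimal sketch under stated assumptions, not the lesson's own code: the data are simulated with a single confounder `x`, the propensity model is a plain logistic regression fit by Newton-Raphson in NumPy, balance is measured with the standardized mean difference (SMD), and the weights are inverse probability weights targeting the average treatment effect. The helpers `smd` and `fit_propensity` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical simulated data: confounder x raises both the chance of
# treatment and the outcome. The true treatment effect is 2.0.
n = 5000
x = rng.normal(size=n)
t = rng.binomial(1, 1 / (1 + np.exp(-0.8 * x)))  # treatment indicator (0/1)
y = 2.0 * t + 1.5 * x + rng.normal(size=n)


def smd(x, t, w=None):
    """Standardized mean difference of x between treated and control (optionally weighted)."""
    w = np.ones_like(x) if w is None else w
    m1 = np.average(x[t == 1], weights=w[t == 1])
    m0 = np.average(x[t == 0], weights=w[t == 0])
    pooled_sd = np.sqrt((x[t == 1].var() + x[t == 0].var()) / 2)
    return (m1 - m0) / pooled_sd


def fit_propensity(x, t, iters=25):
    """Logistic regression of t on x via Newton-Raphson; returns fitted P(t = 1 | x)."""
    X = np.column_stack([np.ones(len(x)), x])  # add an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (t - p)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return 1 / (1 + np.exp(-X @ beta))


# Step 1: check initial overlap (covariate ranges) and balance (|SMD| near 0 is good)
print(f"x range, treated: {x[t == 1].min():.2f} to {x[t == 1].max():.2f}")
print(f"x range, control: {x[t == 0].min():.2f} to {x[t == 0].max():.2f}")
smd_before = smd(x, t)

# Step 2: model propensity scores
ps = fit_propensity(x, t)

# Step 3: inverse probability weights for the average treatment effect
w = np.where(t == 1, 1 / ps, 1 / (1 - ps))

# Step 4: re-check overlap (score ranges per group) and balance after weighting
smd_after = smd(x, t, w)
print(f"PS range, treated: {ps[t == 1].min():.2f} to {ps[t == 1].max():.2f}")
print(f"PS range, control: {ps[t == 0].min():.2f} to {ps[t == 0].max():.2f}")
print(f"SMD before: {smd_before:.3f}, after: {smd_after:.3f}")

# Step 5: estimate the effect as a weighted difference in mean outcomes
naive = y[t == 1].mean() - y[t == 0].mean()
ate = np.average(y[t == 1], weights=w[t == 1]) - np.average(y[t == 0], weights=w[t == 0])
print(f"Naive difference: {naive:.2f}, IPW estimate: {ate:.2f}")
```

Because `x` pushes up both treatment and outcome, the naive difference overstates the effect; after weighting, the SMD should shrink toward zero and the weighted estimate should land near the simulated value of 2.0. In practice a library such as statsmodels or scikit-learn would usually fit the propensity model, and if balance were still poor at step 4, you would revise the model (for example, by adding interactions or transformed variables) and repeat, as step 5 describes.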


Take a look at the flowchart in the learning environment. This flowchart describes the five general steps that we can use as a template for applying propensity score methods in analysis. Keep this process in mind as we move through the rest of this lesson.
