The first step in a propensity score analysis is to check how similar the treatment and control groups are at baseline, before any propensity score method is applied. Two properties are commonly used to describe the degree of similarity between treatment groups: overlap and balance.
- Overlap is the range of values of a variable that the treatment and control groups have in common.
- Overlap can also be thought of as the range of values of a variable where the probability of being in the treatment group is greater than 0 but less than 1.
- Recall that overlap (also called positivity) is a key assumption of causal inference.
- Balance describes how similar the treatment and control groups are with respect to the entire distribution of each covariate (each variable other than the treatment).
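As a rough sketch of the first definition of overlap, the Python snippet below finds the interval of values shared by both groups for a single variable and reports what share of each group falls inside it. It assumes overlap can be approximated by intersecting the two groups' observed minimum-to-maximum ranges, which is one simple way to define a common-support region; the function names and the age data are hypothetical and made up for illustration.

```python
# Hypothetical sketch: approximate the overlap (common support) of one variable
# by intersecting the observed min-to-max ranges of the two groups.

def overlap_range(treated, control):
    """Return the (low, high) interval shared by both groups, or None if disjoint."""
    low = max(min(treated), min(control))
    high = min(max(treated), max(control))
    if low > high:
        return None  # no values in common: overlap fails for this variable
    return (low, high)


def fraction_in_overlap(values, region):
    """Share of observations that fall inside the overlap interval."""
    if region is None:
        return 0.0
    low, high = region
    return sum(low <= v <= high for v in values) / len(values)


# Made-up ages for illustration only.
treated_age = [34, 41, 45, 52, 58, 63, 70]
control_age = [22, 28, 31, 37, 44, 49, 55]

region = overlap_range(treated_age, control_age)
print(region)  # (34, 55)
print(fraction_in_overlap(treated_age, region))
print(fraction_in_overlap(control_age, region))
```

Treated observations above 55 and control observations below 34 have no counterparts in the other group, so comparisons there rely on extrapolation rather than on observed data.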
In practice, balance is summarized with statistics that each capture one feature of a variable's distribution. Two statistics are commonly used to measure balance.
- Standardized mean difference (SMD). The SMD of a variable in a sample is the difference between the group means of the variable divided by a pooled standard deviation that combines the variable's standard deviations in both groups.
- Variance ratio. The variance ratio of a variable in a sample is the variance of the variable in one treatment group divided by the variance of the variable in the other treatment group.
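The two statistics above can be sketched in a few lines of Python using the standard `statistics` module. This is a minimal illustration, not a reference implementation: it assumes the pooled standard deviation is the square root of the average of the two sample variances (a common choice, but other pooling rules exist) and puts the treated group in the numerator of both statistics. The function names and data are hypothetical.

```python
import math
import statistics


def smd(treated, control):
    """Standardized mean difference, assuming pooled SD = sqrt((s_t^2 + s_c^2) / 2)."""
    diff = statistics.mean(treated) - statistics.mean(control)
    pooled_sd = math.sqrt(
        (statistics.variance(treated) + statistics.variance(control)) / 2
    )
    return diff / pooled_sd


def variance_ratio(treated, control):
    """Sample variance in the treated group divided by that in the control group."""
    return statistics.variance(treated) / statistics.variance(control)


# Made-up values for illustration only.
treated = [3, 5, 7, 9]
control = [1, 3, 5, 7, 9]

print(round(smd(treated, control), 3))             # 0.346
print(round(variance_ratio(treated, control), 3))  # 0.667
```

Because the SMD divides by a standard deviation, it has no units, so it can be compared across variables measured on different scales (for example, age in years and income in dollars).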
So how is “good” or “bad” balance defined?
- An SMD close to zero indicates good balance. This means the average value (and thus the center of the distribution) of the variable is similar between the treatment and control groups.
- A variance ratio close to one is another indicator of good balance. This means that the variability, or spread, of the variable is similar in both groups.
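To turn "close to zero" and "close to one" into a concrete check, the sketch below flags a variable using cutoffs that are often used as rules of thumb in the applied literature: an absolute SMD below 0.1 and a variance ratio between 0.5 and 2. These thresholds are assumptions and are not defined in this lesson; they are exposed as parameters so they can be changed. The function name is hypothetical.

```python
def assess_balance(smd_value, vr_value, smd_cutoff=0.1, vr_bounds=(0.5, 2.0)):
    """Flag mean and spread balance using assumed rule-of-thumb cutoffs."""
    mean_ok = abs(smd_value) < smd_cutoff  # centers of the distributions are close
    low, high = vr_bounds
    spread_ok = low <= vr_value <= high    # spreads of the distributions are similar
    return {"mean_balanced": mean_ok, "spread_balanced": spread_ok}


print(assess_balance(0.03, 1.1))   # {'mean_balanced': True, 'spread_balanced': True}
print(assess_balance(0.35, 0.67))  # {'mean_balanced': False, 'spread_balanced': True}
```

The default variance-ratio bounds are reciprocals of each other (0.5 = 1/2), so the check gives the same answer no matter which group's variance is placed in the numerator.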
The interactive visualization to the right illustrates how the overlap and balance of a variable are related to the characteristics of its distribution. Adjust the means and standard deviations of the distributions to see how their shapes change. Then notice how these adjustments make the distributions overlap more or less.
- Try making the mean 40 for both distributions. How do you think having the same mean affects the SMD value?
- Keeping both means at 40, try making one standard deviation 2 and the other 10. The standard deviation is the square root of the variance. How do you think changes in the standard deviation will impact the variance ratio?
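To check your answers to the two questions above, the sketch below computes both statistics directly from the distribution parameters used in the exercise (means of 40, standard deviations of 2 and 10). It assumes the same pooled standard deviation as the SMD definition, sqrt((sd_t^2 + sd_c^2) / 2); the function names are hypothetical.

```python
import math


def smd_from_params(mean_t, sd_t, mean_c, sd_c):
    """SMD from group means and SDs, assuming pooled SD = sqrt((sd_t^2 + sd_c^2) / 2)."""
    pooled_sd = math.sqrt((sd_t**2 + sd_c**2) / 2)
    return (mean_t - mean_c) / pooled_sd


def variance_ratio_from_params(sd_t, sd_c):
    """Variance ratio from SDs: each variance is the square of its SD."""
    return sd_t**2 / sd_c**2


print(smd_from_params(40, 2, 40, 10))     # 0.0  (equal means give an SMD of zero)
print(variance_ratio_from_params(2, 10))  # 0.04 (4 / 100)
print(variance_ratio_from_params(10, 2))  # 25.0 (100 / 4)
```

Equal means give an SMD of exactly zero regardless of the spreads, yet the variance ratio is far from one in either direction. A single statistic can therefore indicate good balance while another reveals a clear imbalance, which is why both are usually checked.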