Chances are, you or someone you know is superstitious to some extent. Whether it’s wearing a lucky t-shirt to a sporting event or using a favorite pencil and eraser on exams, we believe in superstitions because we think our actions will lead to some desired—but usually unrelated—result. Superstitions are, in fact, extreme examples of assuming an associational relationship is actually causal in nature.
One of the most important concepts in causal inference is the distinction between association and causation. Let’s formally define these two terms:
- Association is a general term to describe a relationship between variables. Association can describe the strength or pattern of a relationship, but it does not explain the mechanism behind the relationship.
One frequently used statistical measure of association is correlation. Correlation typically describes the strength and direction of a linear relationship between two variables. The animation below shows what variables with different degrees of correlation look like.
- Causation describes not only the strength or pattern of a relationship but also the MECHANISM of a relationship. In a causal relationship, variable X CAUSES a change in variable Y; we know that X must happen before Y.
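To make "degrees of correlation" concrete, here is a minimal sketch that computes the Pearson correlation coefficient by hand on simulated data. The data, the helper names `pearson_r` and `noisy_copy`, and the noise levels are all assumptions chosen for illustration; none of them come from the lesson's own dataset.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical variable x: 1000 draws from a standard normal distribution.
x = [rng.gauss(0, 1) for _ in range(1000)]

def noisy_copy(values, noise_sd):
    """Return values plus independent Gaussian noise (hypothetical helper)."""
    return [v + rng.gauss(0, noise_sd) for v in values]

# Less noise around the line y = x means a correlation closer to 1.
r_strong = pearson_r(x, noisy_copy(x, 0.2))
r_weak = pearson_r(x, noisy_copy(x, 2.0))
# A variable generated independently of x should be nearly uncorrelated.
r_none = pearson_r(x, [rng.gauss(0, 1) for _ in x])

print(f"strong: {r_strong:.2f}, weak: {r_weak:.2f}, none: {r_none:.2f}")
```

Note that a value of r close to 1 only says the points fall near a straight line. It says nothing about whether one variable causes the other.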
Take a look at the plot in the learning environment. This plot shows the relationship between monthly swimming pool sales and monthly forest fires. At first glance, the plot might seem to suggest that swimming pool sales cause more forest fires to occur! Our intuition tells us that, no, swimming pools do not CAUSE forest fires. A more sensible explanation is that swimming pool sales and forest fires both peak in the hot, dry summer months, so the season drives both variables.
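The swimming pool example can be imitated with a small simulation. In the sketch below, a hypothetical monthly temperature drives both pool sales and forest fires, and neither variable has any effect on the other. All numbers (baselines, slopes, noise levels) are made up for illustration and are not the data behind the lesson's plot. The helper `residuals` removes the part of a variable that a straight-line fit on temperature can explain.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def residuals(ys, xs):
    """Residuals from a simple least-squares regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

rng = random.Random(42)  # fixed seed so the sketch is reproducible
n_months = 600           # 50 simulated years of monthly data

# Assumed seasonal temperature: a yearly sine cycle plus random noise.
temperature = [15 + 10 * math.sin(2 * math.pi * (m % 12) / 12) + rng.gauss(0, 2)
               for m in range(n_months)]

# Both outcomes depend on temperature only, never on each other.
pool_sales = [50 + 4 * t + rng.gauss(0, 10) for t in temperature]
forest_fires = [5 + 0.8 * t + rng.gauss(0, 3) for t in temperature]

# Raw association between the two outcomes.
r_raw = pearson_r(pool_sales, forest_fires)

# Association that remains once temperature is accounted for.
r_adjusted = pearson_r(residuals(pool_sales, temperature),
                       residuals(forest_fires, temperature))

print(f"raw correlation: {r_raw:.2f}")
print(f"after adjusting for temperature: {r_adjusted:.2f}")
```

Under these assumptions, the raw correlation is strong even though neither variable influences the other, and it largely disappears once the shared driver is removed. Real data is rarely this tidy, but the pattern is the same one the plot illustrates.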
This is a silly but effective illustration of the adage “correlation is not causation.” Keep this example in mind as you move through this lesson to help you think more critically about what factors might be at play in causal relationships.