Causation: The Most Important Idea You're Probably Getting Wrong
There’s a famous chart that shows a near-perfect correlation between the per capita consumption of mozzarella cheese in the United States and the number of civil engineering doctorates awarded each year. The line traces are almost identical. If you didn’t know better, you might think that eating more cheese somehow causes more people to get engineering PhDs, or vice versa.
Of course, this is absurd. Both happened to rise together during a period of economic growth, driven by completely independent forces. But that’s exactly the point: correlation is easy to find, and our brains are wired to interpret it as causation.
Understanding causality — what actually causes what — is arguably the second most important concept in scientific thinking, right after the scientific method itself.
Why Causation Is So Confusing
Confounders: The Hidden Third Variable
Most spurious correlations share a common structure: two things are correlated not because one causes the other, but because a third variable — a confounder — drives both.
Ice cream sales and drowning deaths are positively correlated. Does ice cream cause drowning? No. Hot summer weather causes both: people buy more ice cream and swim more, and some swimmers drown. Control for the season by comparing only days with similar weather, and the correlation largely vanishes.
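To make the ice cream example concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the function names, the effect sizes, and the data are made up, not real sales or drowning figures. Heat drives both series, and neither series affects the other.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate_days(n_days=5000, seed=0):
    """Hypothetical daily data: heat drives both series; neither affects the other."""
    rng = random.Random(seed)
    days = []
    for _ in range(n_days):
        hot = rng.random() < 0.5            # the confounder: is it a hot day?
        base = 10.0 if hot else 0.0         # made-up size of the heat effect
        ice_cream = base + rng.gauss(0, 3)  # sales index = heat effect + noise
        drownings = base + rng.gauss(0, 3)  # drowning index = heat effect + noise
        days.append((hot, ice_cream, drownings))
    return days

days = simulate_days()
overall = pearson([d[1] for d in days], [d[2] for d in days])

# Hold the confounder fixed by looking only at hot days.
hot_days = [d for d in days if d[0]]
within_hot = pearson([d[1] for d in hot_days], [d[2] for d in hot_days])

print(f"all days:      r = {overall:.2f}")
print(f"hot days only: r = {within_hot:.2f}")
```

Across all days the two series are strongly correlated; within hot days alone the correlation should be close to zero, because the only thing linking them has been held fixed.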
In medicine, this is dangerous. For decades, observational studies showed that people who took vitamins had better health outcomes. This led to a billion-dollar supplement industry. But the confounder was lifestyle: people who take vitamins also tend to eat better, exercise more, and see doctors regularly. When randomized trials were finally done, most vitamins showed no benefit — and some, like high-dose beta-carotene for smokers, actually increased cancer risk.
Reverse Causation
Sometimes the causal arrow points the other way from what you’d expect.
Hospitals are full of sick people. Does being in a hospital make you sick? Of course not — being sick is what sends you to the hospital. But naive data analysis might miss this.
Consider: companies that hire consultants often perform poorly afterwards. Does hiring consultants destroy value? Maybe sometimes — but mostly, companies hire consultants because they’re already struggling. The poor performance was already in motion.
Or: people who carry umbrellas are more likely to get rained on. Does carrying an umbrella attract rain? No — people bring umbrellas when they expect rain.
Reverse causation is particularly insidious in social science. Does poverty cause crime, or does crime cause poverty? The honest answer is probably both, which brings us to the next problem.
Bidirectional Causation and Feedback Loops
Many real-world systems have causal relationships that run in both directions simultaneously. Stress causes poor sleep. Poor sleep increases stress. You cannot cleanly separate cause and effect because the system is a loop.
In economics: does growth cause investment, or does investment cause growth? Both. In health: does depression cause inactivity, or does inactivity cause depression? Both.
Feedback loops are real and important, but they make simple causal claims much harder to make.
Selection Bias
Abraham Wald, a statistician working for the US military during World War II, was asked to figure out where to add extra armor to bomber planes by examining the damage patterns on planes returning from missions. The obvious approach: armor the parts that showed the most bullet holes.
Wald pointed out the error. The planes they were examining were the ones that came back. The holes they saw represented damage that didn’t cause crashes. The missing data — planes that were shot down — were precisely the ones that could tell you where armor was most needed. They should reinforce the spots that were not hit on returning planes.
This is selection bias: your sample doesn’t represent the full picture, so your conclusions are skewed. When we only observe survivors, successes, or returning planes, we systematically miss the most important cases.
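A small simulation sketch shows how this distortion arises. The sections and survival probabilities below are invented for illustration and are not Wald's data: every plane takes one hit in a random section, and engine hits are assumed to be far more lethal.

```python
import random

def simulate_missions(n_planes=20000, seed=1):
    """Each plane takes one hit in a random section; some sections are deadlier.

    The sections and survival probabilities are made-up numbers.
    """
    survival = {"engine": 0.2, "fuselage": 0.9, "wings": 0.9}
    rng = random.Random(seed)
    all_hits = {section: 0 for section in survival}
    returned_hits = {section: 0 for section in survival}
    for _ in range(n_planes):
        section = rng.choice(list(survival))
        all_hits[section] += 1
        if rng.random() < survival[section]:   # plane makes it home
            returned_hits[section] += 1
    return all_hits, returned_hits

all_hits, returned_hits = simulate_missions()
total_hits = sum(all_hits.values())
total_returned = sum(returned_hits.values())

for section in all_hits:
    share_all = all_hits[section] / total_hits
    share_seen = returned_hits[section] / total_returned
    print(f"{section:9s} share of all hits: {share_all:.2f}   "
          f"share seen on returning planes: {share_seen:.2f}")
```

Engines take about a third of all hits in this setup, but they account for a much smaller share of the damage visible on returning planes. An analyst who sees only the survivors would conclude engines are rarely hit, which is the opposite of where the armor is needed.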
How to Actually Determine Causation
Randomized Controlled Trials (RCTs)
The gold standard. You take a group of people (or systems, or components), randomly assign them to treatment or control, apply the intervention to the treatment group, and measure outcomes.
Random assignment is the key. Because treatment is determined by a coin flip rather than by any property of the participant, the groups differ systematically only in the treatment itself. Confounders, both observed and unobserved, are balanced across the groups on average, so with a large enough sample a difference in outcomes is most plausibly caused by the treatment.
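The following sketch contrasts self-selected treatment with coin-flip assignment. It is a toy model under stated assumptions: the variable names, the hidden "health" confounder, and all effect sizes are hypothetical. The true treatment effect is set to 1.0 so the two estimates can be compared against it.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def estimate_effect(randomize, n=20000, true_effect=1.0, seed=2):
    """Difference in mean outcomes, treated minus untreated.

    'health' is an unobserved confounder: it raises the outcome and, in the
    observational setting, also makes people more likely to take the treatment.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        health = rng.gauss(0, 1)
        if randomize:
            takes = rng.random() < 0.5             # coin flip, unrelated to health
        else:
            takes = health + rng.gauss(0, 1) > 0   # healthier people self-select
        outcome = 2.0 * health + true_effect * takes + rng.gauss(0, 1)
        (treated if takes else control).append(outcome)
    return mean(treated) - mean(control)

observational = estimate_effect(randomize=False)
randomized = estimate_effect(randomize=True)

print(f"true effect:            1.00")
print(f"observational estimate: {observational:.2f}")
print(f"randomized estimate:    {randomized:.2f}")
```

In the observational run, the naive comparison mixes the treatment effect with the health advantage of the people who chose treatment, so it overstates the effect several times over. In the randomized run, health is balanced between the groups and the estimate lands near the true value.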
This is why RCTs transformed medicine. Before them, medicine was largely a collection of plausible-sounding ideas and confident practitioners. With them, we discovered that many confident treatments (bloodletting, routine hormone replacement therapy, certain cardiac drugs) were useless or harmful, and that some counterintuitive interventions (like certain vaccines and antibiotics) worked dramatically.
The limitation: you can’t always randomize. You can’t randomly assign people to smoke for 20 years, or to be born into poverty, or to experience a war.
Natural Experiments
When randomization is impossible, reality sometimes provides it anyway.
Draft lotteries during the Vietnam War randomly assigned military service. This allowed economists to study the long-term effects of military service on earnings — not confounded by the fact that volunteers might differ systematically from civilians.
Geographic boundaries create natural experiments. Two otherwise-similar towns might have different laws, water fluoridation levels, or access to resources, not because of any systematic difference in the population but because of an arbitrary border. Comparing outcomes across the boundary gives you a rough approximation of an experiment.
A closely related approach, the regression discontinuity design, applies when treatment switches on at a sharp threshold: say, a policy that applies only to people born after a certain date. The people just above and just below the cutoff are otherwise similar, so the threshold acts as an accidental randomizer.
This body of work earned a Nobel Prize. In 2021, the Sveriges Riksbank Prize in Economic Sciences was awarded to David Card (UC Berkeley) for his empirical contributions to labor economics, and to Joshua Angrist (MIT) and Guido Imbens (Stanford GSB) for their methodological contributions to the analysis of causal relationships. The Nobel committee described the work as sparking an “empirical revolution” in economics: a wholesale shift from theorizing about causation to rigorously measuring it.
Imbens and Angrist’s core contribution was formalizing what exactly a natural experiment can tell you. The problem is subtle: an instrument like a draft lottery doesn’t affect everyone the same way. Some people would have enlisted regardless; others would have avoided service regardless. The lottery only shifted behavior for a third group — those who served because they were drafted and wouldn’t have otherwise. Imbens and Angrist showed that instrumental variable methods, when valid, recover the causal effect specifically for this group: people whose treatment status was actually changed by the instrument. They called this the Local Average Treatment Effect (LATE) — “local” because it applies to the compliers, not the full population. This was a precise, honest answer to the question of what you can and cannot claim from observational data.
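In symbols, the result can be sketched as follows. The notation is assumed here rather than taken from the text: Z is the instrument (drafted by the lottery or not), D is the treatment actually received (served or not), Y is the outcome (later earnings), and Y(1), Y(0) are a person's potential outcomes with and without treatment. The equality holds under the usual conditions: the instrument is as good as randomly assigned, it affects the outcome only through D, and nobody does the opposite of what the instrument pushes them toward.

```latex
\tau_{\text{LATE}}
  = \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}
         {E[D \mid Z = 1] - E[D \mid Z = 0]}
  = E\big[\, Y(1) - Y(0) \mid \text{complier} \,\big]
```

The numerator is the effect of the lottery on earnings; the denominator is the share of people whose service status the lottery actually changed. Dividing one by the other rescales the lottery's effect into the effect of service itself, but only for that complier group.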
Imbens also demonstrated the method through his own empirical work. To study whether unearned income changes people’s willingness to work, he and colleagues surveyed lottery players in Massachusetts — where prizes were paid out in annual installments over 20 years rather than as a lump sum. The random variation in prize size meant the amount of unearned income was essentially randomly assigned among winners. The finding: modest windfalls ($15,000/year) didn't significantly reduce labor supply, but large prizes ($80,000/year) did — and recipients saved about 16% of their winnings. A clean causal estimate of how income affects behavior, extracted entirely from a real-world lottery rather than a lab.
What Imbens, Angrist, and Card collectively demonstrated is that you don’t need a controlled experiment to do causal science. You need cleverness about where randomness already exists in the world, and rigor about what that randomness actually identifies.
Instrumental Variables
Sometimes you can find a variable that affects your supposed cause but has no direct path to the outcome — only an indirect path through the cause you care about.
Economists wanted to study whether more schooling increases earnings. But smart, hardworking people get more schooling and earn more — both driven by the same underlying traits (ability, ambition). Schooling and earnings are correlated, but is schooling itself doing the work?
One clever instrument: proximity to a college. Growing up near a college makes you more likely to attend one (it lowers the cost), but presumably doesn’t directly affect your earnings later in life except through the education it enables. By exploiting this variation, researchers could isolate the causal effect of education.
Finding valid instruments is hard, and the debate over whether any given instrument is truly “excluded” (i.e., has no direct effect) is often fierce. But it’s one of the most powerful tools available when experiments aren’t possible.
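As a rough illustration of the college-proximity idea, the sketch below simulates data in which unobserved ability raises both schooling and earnings, then compares a naive regression slope with the simple instrumental variable ratio. All names, coefficients, and the 10% "true return" are assumptions chosen for the demo, not estimates from any study, and the instrument is valid here only because the simulation builds it that way.

```python
import random

def cov(xs, ys):
    """Sample covariance of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def simulate(n=50000, true_return=0.10, seed=3):
    """Hypothetical data: ability confounds schooling and earnings."""
    rng = random.Random(seed)
    near, school, earn = [], [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)                 # unobserved confounder
        z = 1.0 if rng.random() < 0.5 else 0.0    # instrument: grew up near a college
        s = 12 + 1.0 * z + 1.0 * ability + rng.gauss(0, 1)        # years of schooling
        y = true_return * s + 0.5 * ability + rng.gauss(0, 0.5)   # log earnings
        near.append(z)
        school.append(s)
        earn.append(y)
    return near, school, earn

near, school, earn = simulate()

# Naive regression slope of earnings on schooling: biased upward by ability.
naive = cov(school, earn) / cov(school, school)

# IV ratio: effect of proximity on earnings divided by its effect on schooling.
iv = cov(near, earn) / cov(near, school)

print(f"true return to schooling: 0.10")
print(f"naive estimate:           {naive:.2f}")
print(f"IV estimate:              {iv:.2f}")
```

The naive slope is inflated because ability pushes schooling and earnings in the same direction. The IV ratio uses only the variation in schooling that comes from proximity, which by construction is unrelated to ability, so it should land close to the true value.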
Difference-in-Differences
Suppose you want to know whether a new minimum wage law reduced employment. You compare states that passed the law to states that didn’t, before and after the change.
The trick: rather than comparing post-law outcomes directly (which might differ for unrelated reasons), you compare the change in outcomes. If the employment rate fell by 3 percentage points in states that raised the minimum wage, and by 1 percentage point in states that didn’t, the estimated causal effect is (-3) - (-1) = -2 percentage points.
This works if you believe the two groups were on similar trajectories before the policy — the “parallel trends” assumption. It doesn’t require the groups to be identical, just that they would have moved together in the absence of the intervention.
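The arithmetic in the minimum wage example can be written as a few lines of code. The function name and the before/after levels below are hypothetical, chosen only so that the changes match the example: a fall of 3 in the states that raised the minimum wage and a fall of 1 in the states that did not.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the treated group minus change in the control group."""
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    return treated_change - control_change

# Hypothetical employment rates, in percent.
effect = diff_in_diff(
    treated_before=60.0, treated_after=57.0,   # fell by 3
    control_before=62.0, control_after=61.0,   # fell by 1
)
print(effect)  # -2.0
```

Note that the two groups start at different levels (60 versus 62). That is fine: the method subtracts out fixed differences between groups and relies only on the assumption that their trends would have matched without the law.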
Causal Graphs (Directed Acyclic Graphs)
Judea Pearl, the computer scientist who formalized much of modern causal inference, introduced a visual language for reasoning about causation: directed acyclic graphs (DAGs).
In a DAG, you draw arrows representing causal relationships: an arrow from A to B means A causes B. You can use these graphs to figure out which variables you need to control for (to block confounding paths) and which you should not control for (controlling for some variables can actually introduce bias by opening “collider” paths).
The key insight: causation has structure, and that structure can be reasoned about formally. You don’t always need an experiment — sometimes careful reasoning about the causal graph, combined with the right observational data and controls, can identify causal effects.
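The warning about colliders is easy to check with a simulation. In this sketch, which uses made-up variables and numbers, A and B are independent causes of a common effect C (the graph is A -> C <- B). Restricting attention to cases with high C, which is one way of "controlling for" it, manufactures a correlation between A and B that does not exist in the full population.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate(n=20000, seed=4):
    """Hypothetical data for the graph A -> C <- B."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = rng.gauss(0, 1)              # cause A
        b = rng.gauss(0, 1)              # cause B, independent of A
        c = a + b + rng.gauss(0, 0.5)    # collider: depends on both
        rows.append((a, b, c))
    return rows

rows = simulate()
r_all = pearson([r[0] for r in rows], [r[1] for r in rows])

# Condition on the collider by keeping only rows with a high value of C.
selected = [r for r in rows if r[2] > 1.0]
r_selected = pearson([r[0] for r in selected], [r[1] for r in selected])

print(f"A vs B, everyone:       r = {r_all:.2f}")
print(f"A vs B, high C only:    r = {r_selected:.2f}")
```

Among the selected rows, a low A must have been offset by a high B for C to clear the cutoff, so the two appear negatively related even though neither influences the other. This is why a causal graph, and not a rule like "control for everything you can measure", should decide which variables enter the analysis.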
Applications
Medicine
Medicine is probably where causation matters most viscerally. Get it wrong and you prescribe treatments that harm patients.
The history of medicine is littered with interventions that were adopted based on plausible mechanisms and correlational evidence, then discredited by RCTs. Hormone replacement therapy for postmenopausal women was widely prescribed for decades based on observational evidence showing cardiovascular benefits. When the randomized Women’s Health Initiative trial finally ran, it found the opposite: increased risk of heart disease, stroke, and breast cancer.
Doctors performing autopsies on premature infants observed that almost all of them had a patent ductus arteriosus, an open vessel that normally closes shortly after birth. So doctors began treating it aggressively. Decades of RCTs later, it turns out routine treatment makes outcomes worse, not better. The association was real; the causal interpretation was wrong.
Today, evidence-based medicine insists on a hierarchy of evidence: systematic reviews of RCTs at the top, expert opinion at the bottom. Not because RCTs are perfect, but because they’re far more reliable than a smart clinician’s intuition shaped by uncontrolled observations.
Engineering
Engineers deal with causation constantly, often under the name “root cause analysis.” When a system fails, the instinct is to ask why — and then to ask why again, and again, until you reach a cause you can actually address.
The “5 Whys” technique, developed at Toyota, formalizes this. Why did the machine stop? The fuse blew. Why did the fuse blow? The bearing overloaded. Why did the bearing overload? There was no lubrication. Why was there no lubrication? The oil pump failed. Why did the oil pump fail? The shaft was worn.
You don’t fix the fuse — that’s just the symptom. You fix the shaft. This is causal thinking in engineering form.
But engineers also know that causal chains can be complex. Bridge collapses rarely have one cause; they result from multiple factors combining: a design flaw, unusual weather, deferred maintenance, and increased load, all at once. The challenge is building systems robust to multiple simultaneous causes, not just the one most recently observed to fail.
Everyday Decision-Making
The practical implication is not that you should never act without an RCT — life doesn’t wait for controlled experiments. It’s that you should be appropriately humble about causal claims based on observation alone.
When you start a new exercise routine and feel better, ask whether it’s the exercise causing the improvement, or whether you also changed your diet, sleep schedule, or have more motivation generally. When a business strategy seems to work, ask whether the strategy caused growth, or whether the market was already moving your way.
The habit to cultivate is asking: what else could explain this? What confounders might exist? Could the causation run the other way? Is there selection bias in what I’m observing?
Sometimes the answer is: no, this really is the most plausible explanation, and the effect is large enough that alternative explanations seem unlikely. That’s fine — certainty isn’t required. But the question itself is the discipline.
Causation is hard because the world is complex, feedback loops are everywhere, and our pattern-matching brains are far better at finding correlations than at tracing true causal pathways. The methods described here (RCTs, natural experiments, instrumental variables, difference-in-differences, and causal graphs) are humanity’s hard-won tools for cutting through that complexity. Learning to ask “but does it actually cause that?” is, alongside logic and statistics, one of the most clarifying habits of thought you can develop.