You've probably heard the phrase "correlation is not causation". You should always be wary of correlations that lack statistical significance, but you still need to be able to infer causes from correlations. If you couldn't, you wouldn't be able to learn scientific principles or make predictions of any kind.



There are ways to infer causation from correlation. Scientific experiments and randomized controlled trials are among the most widely used approaches, but they need to be carefully designed to support causal conclusions.



Causal inference is the process of determining whether one variable causes a change in another variable. Causal inference algorithms have emerged from several different disciplines: epidemiology, public health, econometrics and data science. Causal inference is widely used in healthcare, the social sciences and decision-making of all kinds.



The goal of causal inference is to establish whether a cause-and-effect relationship exists. The fundamental challenge is that you can't directly observe causal effects. In a side-by-side preference test, you can show both A and B to the same user and have them choose. However, when you want to know whether a medication, policy or intervention causes an outcome, you need to know what would have happened in the other case. The difficulty is that, for each individual, you can observe only what actually happened, never what would have happened otherwise.



The core idea of causal inference is reasoning about counterfactuals: specific alternatives to reality. An alternative can be a different treatment or no treatment at all, a different decision, or a slightly different variable somewhere in a causal chain. For example, if you have a headache, take an aspirin, and your headache goes away, you want to know: would your headache have gone away without the aspirin? You need information about something that didn't happen. You already took the aspirin, so you have no way of knowing what would have happened if you hadn't.
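
The aspirin example can be made concrete with a small simulation. The sketch below is only an illustration: the recovery probabilities (0.4 without aspirin, 0.7 with it) and every name in it are invented assumptions, not real data. Inside a simulation both potential outcomes exist for every person, so the true effect is known, but the "observed" data keeps only one outcome per person, which is all you ever get in reality.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def simulate_person():
    """Return (outcome without aspirin, outcome with aspirin) for one person.

    An outcome of 1 means the headache is gone after an hour.
    The thresholds 0.4 and 0.7 are made-up recovery probabilities.
    """
    u = random.random()  # this person's hidden tendency to recover
    y_without = 1 if u < 0.4 else 0
    y_with = 1 if u < 0.7 else 0
    return y_without, y_with

people = [simulate_person() for _ in range(10_000)]

# Only a simulation lets us see both outcomes, so the true average effect is known.
true_effect = sum(y_with - y_without for y_without, y_with in people) / len(people)

# In reality each person either takes the aspirin or does not,
# so exactly one of the two outcomes is ever observed.
took_aspirin = [random.random() < 0.5 for _ in people]
observed = [
    y_with if took else y_without
    for (y_without, y_with), took in zip(people, took_aspirin)
]

print(f"true average effect (unobservable in practice): {true_effect:.3f}")
print(f"outcomes actually observed per person: {len(observed) // len(people)}")
```

The quantity `true_effect` is what causal inference tries to estimate, using only data shaped like `took_aspirin` and `observed`.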



Randomized controlled trials (RCTs) are the gold standard of causal inference because random assignment makes the treatment and control groups comparable on average. As a result, any systematic difference in outcomes between the two groups can be attributed to the treatment itself rather than to other factors.
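
To see why random assignment matters, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: a single confounder called `health`, a true treatment effect of 2.0, and the rule that healthier people choose treatment more often in the non-randomized case. It compares a naive difference in group means under self-selection with the same comparison under a coin flip.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

TRUE_EFFECT = 2.0  # assumed effect of the treatment on the outcome

def outcome(health, treated):
    # The outcome depends on baseline health (a confounder),
    # on the treatment, and on random noise.
    effect = TRUE_EFFECT if treated else 0.0
    return 5.0 * health + effect + random.gauss(0, 1)

def difference_in_means(rows):
    treated = [y for t, y in rows if t]
    control = [y for t, y in rows if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)

def run_study(assign, n=20_000):
    """Simulate n people; `assign` decides who gets treated."""
    rows = []
    for _ in range(n):
        health = random.random()
        treated = assign(health)
        rows.append((treated, outcome(health, treated)))
    return difference_in_means(rows)

# Self-selection: healthier people are more likely to take the treatment,
# so the groups differ in health as well as in treatment.
observational = run_study(lambda health: random.random() < health)

# Randomization: a coin flip that ignores health entirely.
randomized = run_study(lambda health: random.random() < 0.5)

print(f"true effect:            {TRUE_EFFECT:.2f}")
print(f"self-selected estimate: {observational:.2f}")
print(f"randomized estimate:    {randomized:.2f}")
```

Under these assumptions the self-selected comparison overstates the effect because it mixes the treatment with the health difference between groups, while the randomized comparison lands close to the assumed true value.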



There are other ways of establishing causal relations without an RCT. These methods are typically called observational studies because they involve observed data rather than experimental data.



One example is a natural experiment, in which a real-world event creates random or quasi-random variation in treatment assignment. Natural disasters, lotteries and military drafts are all examples of events that introduce a treatment independently of other factors, similar to the way that randomization allows comparisons in an experiment. For instance, you might study how wealth affects life expectancy by comparing two people with very similar educational and socio-economic backgrounds, one of whom wins the lottery while the other does not. Because lottery wealth doesn't come from family background, educational attainment or ambition, we can isolate the effect of an infusion of money. These study designs are called quasi-experimental because they resemble an experiment but the assignment of treatment is not controlled by the researchers. There is a growing number of statistical methods for analyzing natural experiments. These methods are vital for thinking clearly about causation, and they provide tools for drawing credible causal conclusions when experiments aren't possible or ethical.



Another causal inference method for cases where a randomized controlled trial isn't possible or ethical is to find similar units in the treated and untreated groups and compare them. This can be done with data collected only after the treatment, using a study design like propensity score matching. It can also be done with data collected both before and after the treatment, using a design like difference in differences (DiD). For example, two similar US states with slightly different tax laws can show how those laws affect employment or growth. You might measure employment only after one state passes the tax law, or you might measure employment in both states before and after the law to see how it affects the state that passed it.
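
The two-state example can be worked through with simple arithmetic. The sketch below uses invented employment rates (they are not real figures) and hypothetical names throughout. It contrasts a comparison made only after the law with a difference-in-differences estimate, which subtracts the comparison state's change over time from the treated state's change. That estimate is only credible under the usual DiD assumption that both states would have followed the same trend had the law not passed.

```python
# Hypothetical average employment rates in percent; the numbers are invented.
employment = {
    ("treated", "before"): 60.0,     # state that passes the law, before it
    ("treated", "after"): 63.0,      # same state, after the law
    ("comparison", "before"): 58.0,  # similar state without the law
    ("comparison", "after"): 59.5,
}

def difference_in_differences(data):
    """Change in the treated state minus change in the comparison state."""
    treated_change = data[("treated", "after")] - data[("treated", "before")]
    comparison_change = data[("comparison", "after")] - data[("comparison", "before")]
    return treated_change - comparison_change

# Comparing the states only after the law ignores that they already differed.
post_only_gap = employment[("treated", "after")] - employment[("comparison", "after")]

# DiD removes the pre-existing gap and the trend shared by both states.
did_estimate = difference_in_differences(employment)

print(f"post-only gap: {post_only_gap:.1f} percentage points")
print(f"DiD estimate:  {did_estimate:.1f} percentage points")
```

With these made-up numbers the post-only gap is 3.5 points, but 2.0 of those points were already there before the law, so the DiD estimate is 1.5 points.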

