What is Causal Inference?

Joshua Noble

Data Scientist

What is causal inference?
 

You've probably heard the phrase "correlation is not causation". You should always be wary of spurious correlations, but you do have to be able to infer causes from correlations. If you couldn't, you wouldn't be able to learn scientific principles or make predictions of any kind.

There are ways that we can infer causation from correlation. Scientific experiments and randomized controlled trials are among the most widely used approaches, but they need to be carefully designed to support causal conclusions.

Causal inference is the process of determining whether one variable causes a change in another variable. Causal inference algorithms have emerged from several different disciplines: epidemiology, public health, econometrics and data science. Causal inference is used widely in healthcare, the social sciences and decision-making of all kinds.

The goal of causal inference is to establish that a cause-and-effect relationship exists. The fundamental challenge in causal inference is that you can't directly observe causal effects. In an A/B test, you can show both the A and B options to a user and have them select one. However, when you want to know whether a medication, policy or intervention causes an outcome, you must know what would have happened in the other case. The challenge of doing causal analysis is that you can only observe what actually happened to each individual.

The core idea of causal inference is reasoning about counterfactuals: specific alternatives to reality. An alternative can be a different treatment or no treatment at all, a different decision or a slightly different variable somewhere in a causal chain. For example, if you have a headache, take an aspirin and then your headache goes away, you want to know: would your headache have gone away without the aspirin? You need information about something that didn't happen. You already took the aspirin, so you have no way of knowing what would have happened if you hadn't.

Randomized controlled trials (RCTs) are the gold standard of causal inference because random assignment can ensure that treatment and control groups are comparable on average. Random assignment helps to ensure that the difference between the treatment and control groups is only affected by the treatment variable itself.

There are other ways of establishing causal relations without an RCT. These methods are typically called observational studies because they involve observed data rather than experimental data. 

One example is a natural experiment, when a real-world factor creates almost random or quasi-random variation in treatment assignment. Natural disasters, lotteries or military drafts are all examples of events that introduce a treatment independent of other factors, similar to the way that randomization allows comparisons in an experiment. For instance, you might compare how wealth affects life expectancy by looking at two people with very similar educational and socio-economic backgrounds where one of them wins the lottery and the other does not. Because lottery wealth doesn't come from family background, educational attainment or ambition, we can see the effects of an infusion of money. These kinds of study designs are quasi-experimental because they create an experiment-like setting that is not strictly controlled by the experimenters. There is a growing number of statistical methods for analyzing natural experiments. These methods are vital for thinking clearly about causation and provide tools for credible causal estimates when experiments aren't possible or ethical.

Another causal inference method when a randomized controlled trial isn't possible or ethical is to locate units in treated and untreated groups that are similar and compare them. This comparison can be done with data collected only after the treatment by using a study design such as propensity score matching. It can also be done with time series data by using a design such as difference-in-differences (DiD). For example, two similar US states with slightly different tax laws can show how those laws affect employment or growth. You might measure employment only after one state passes the tax law, or you might measure employment in both states before and after the law to see how the law affects the state that passed it.
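
To make the DiD idea concrete, here is a minimal sketch on simulated data. The two states, the employment numbers and the size of the effect are all hypothetical; the point is only to show how the before/after differences are compared.

# A minimal difference-in-differences sketch on simulated data.
# Assumes two "states": one passes a tax law (treated) and one does not.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Simulated employment rates before and after the law (hypothetical numbers).
df = pd.DataFrame({
    "state":  ["A"] * 200 + ["B"] * 200,
    "period": (["before"] * 100 + ["after"] * 100) * 2,
})
treated = (df["state"] == "A").astype(int)       # state A passes the law
post = (df["period"] == "after").astype(int)
true_effect = 2.0                                # assumed effect of the law
df["employment"] = (
    60 + 3 * treated + 1.5 * post                # baseline differences and a common trend
    + true_effect * treated * post               # the causal effect of the law
    + rng.normal(0, 1, len(df))                  # noise
)

# DiD estimate: (treated after - treated before) - (control after - control before)
means = df.groupby(["state", "period"])["employment"].mean()
did = (means["A", "after"] - means["A", "before"]) - (
    means["B", "after"] - means["B", "before"]
)
print(f"DiD estimate of the law's effect: {did:.2f}")  # should be close to 2.0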

Potential outcomes
 

Potential outcomes (also referred to as the Rubin causal model) is a fundamental framework in causal inference. It provides a rigorous definition of what a causal effect means, both for individuals and for treatment and control groups.

For individuals, potential outcomes represent all the possible outcomes that can occur under all possible treatment conditions. If you're looking at a binary treatment, for instance people with a headache who take aspirin versus people with a headache who don't, then each person has two potential outcomes:

  • Y₁ᵢ: the outcome for person i if they take aspirin
  • Y₀ᵢ: the outcome for person i if they don't take aspirin.

The individual treatment effect for person i would be Y₁ᵢ - Y₀ᵢ. It's important to remember that one of these outcomes is always the counterfactual: you can never observe what happens when someone both does and doesn't take aspirin. You can only see the "realized outcome" that corresponds to the treatment they actually received.

Because you can't measure individual effects directly, you instead focus on the average treatment effect (ATE):

E[Y₁ᵢ - Y₀ᵢ] = E[Y₁ᵢ] - E[Y₀ᵢ]

This formula represents the average difference in outcomes if everyone was treated versus if no one was treated.
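
Under random assignment, the ATE can be estimated as a simple difference in group means. The sketch below illustrates this on simulated data where, unlike in reality, both potential outcomes are known; the aspirin effect and all numbers are hypothetical.

# A minimal sketch of estimating the average treatment effect (ATE) under
# randomization: with random assignment, the difference in group means is an
# unbiased estimate of E[Y1] - E[Y0]. All numbers here are simulated.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical potential outcomes: headache duration (hours) with and without aspirin.
y0 = rng.normal(4.0, 1.0, n)            # outcome if untreated
y1 = y0 - 1.5 + rng.normal(0, 0.5, n)   # outcome if treated (aspirin helps by ~1.5 hours)

treated = rng.integers(0, 2, n)              # random assignment
observed = np.where(treated == 1, y1, y0)    # we only ever see one potential outcome

true_ate = np.mean(y1 - y0)                  # knowable only in a simulation
est_ate = observed[treated == 1].mean() - observed[treated == 0].mean()
print(f"true ATE: {true_ate:.2f}, estimated ATE: {est_ate:.2f}")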

The potential outcomes framework relies on several assumptions. The first of these is the stable unit treatment value assumption (SUTVA), which requires that one person's treatment doesn't affect another person's outcomes and that treatment is consistently defined across units. One person's headache or aspirin shouldn't affect anyone else's, and everyone in the treatment group should get the same dosage of aspirin. You also assume ignorability: that treatment assignment is independent of the potential outcomes, conditional on observed covariates. That would mean that people aren't given aspirin based on, for instance, the severity of their headache.

This framework helps clarify what we're trying to estimate and what assumptions are needed. For example, when comparing outcomes between treated and untreated groups, we're implicitly assuming the untreated group's outcomes represent what might have happened to the treated group without treatment.

Confounders

A confounder is a variable that influences both the treatment you're studying and the outcome you're measuring. This variable creates what is called a "back door path" between treatment and outcome that doesn't represent the true causal effect. Without accounting for confounders, you might incorrectly attribute changes in the outcome to the treatment when they're due to these shared causes.

Confounding is when there are common causes of both the treatment and the outcome that create spurious associations between them. For instance, if you're studying whether a new drug reduces heart attacks, age would be a confounding variable. Older patients might be more likely to receive the drug and more likely to have heart attacks regardless of treatment. Without accounting for the age of participants, your study might conclude incorrectly that the drug is ineffective or perhaps even harmful. Accounting for a confounder is often referred to as "controlling" for it: you collect data on the confounder and include it in your analysis so that its influence can be separated from the effect of the treatment.
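
The sketch below illustrates this with simulated data in which age drives both who receives a hypothetical drug and the risk of a heart attack. A naive comparison makes the drug look harmful, while a regression that controls for age recovers its assumed true effect. The numbers are invented for illustration.

# A minimal sketch of confounding and adjustment on simulated data: age drives
# both who receives the drug and the risk of a heart attack.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 20_000

age = rng.uniform(30, 80, n)
# Older patients are more likely to receive the drug (confounding).
drug = rng.binomial(1, 1 / (1 + np.exp(-(age - 60) / 5)))
# The drug truly lowers heart attack risk; age raises it.
risk = 0.002 * age - 0.03 * drug
heart_attack = rng.binomial(1, np.clip(risk, 0, 1))

# Naive comparison (ignores age) vs. regression that controls for age.
naive = sm.OLS(heart_attack, sm.add_constant(drug)).fit()
adjusted = sm.OLS(heart_attack, sm.add_constant(np.column_stack([drug, age]))).fit()
print("naive drug coefficient:   ", round(naive.params[1], 4))     # biased upward, looks harmful
print("adjusted drug coefficient:", round(adjusted.params[1], 4))  # close to the true -0.03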

A confounding factor for which you have data is called a "measured confounder," and these are often straightforward to deal with. A confounder for which you don't have data is called an "unmeasured confounder." Unmeasured confounders are a significant challenge to experimental design and analysis because they mean that you're not observing important elements of the causal effect you're trying to establish.

With unmeasured confounders, randomization can often break the association between confounders and treatment. Imagine that you're testing a headache medication to which a small number of people are allergic and your data doesn't account for that allergy. If you have enough people in both treatment and control, you might slightly underestimate how well your medication works, but you should still be able to see an effect if one exists. Instrumental variables (IV) are another approach to estimating causal effects in the presence of unmeasured confounders. An IV affects treatment assignment but does not affect the outcome except through its influence on treatment. Because the IV is uncorrelated with the unmeasured confounders, you can compare outcomes across different values of the instrument to get a clean estimate of the causal effect.
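
Here is a minimal two-stage least squares (2SLS) sketch on simulated data, assuming a single instrument and a linear model. The naive regression is biased by the unmeasured confounder, while the IV estimate recovers the assumed true effect.

# A minimal 2SLS sketch: the instrument Z shifts treatment but affects the
# outcome only through treatment, while an unmeasured confounder U biases the
# naive estimate. All values are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

u = rng.normal(0, 1, n)                      # unmeasured confounder
z = rng.binomial(1, 0.5, n)                  # instrument (e.g., random encouragement)
treatment = 0.8 * z + 0.6 * u + rng.normal(0, 1, n)
outcome = 2.0 * treatment + 1.5 * u + rng.normal(0, 1, n)   # true effect = 2.0

def ols_fit(x, y):
    """Intercept and slope from a one-variable OLS regression."""
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]   # [intercept, slope]

naive_slope = ols_fit(treatment, outcome)[1]      # biased by the confounder U

# Stage 1: predict treatment from the instrument.
a, b = ols_fit(z, treatment)
fitted_treatment = a + b * z
# Stage 2: regress the outcome on the fitted treatment values.
iv_slope = ols_fit(fitted_treatment, outcome)[1]
print(f"naive OLS: {naive_slope:.2f}, 2SLS/IV: {iv_slope:.2f}, truth: 2.00")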

Understanding and addressing confounding is essential for making valid causal inferences from observational data.

Colliders

A collider is another common challenge that you might encounter when inferring causation and measuring effects. A collider is a variable that is causally influenced by two or more other variables, essentially creating a "collision point" where the influences of multiple variables meet. In a causal graph, a collider has arrows pointing into it from multiple sources. A collider looks like this: X → Z ← Y, where Z is the collider, influenced by both X and Y. Unlike a confounder, which influences multiple variables, a collider receives influence from multiple variables.

Colliders are important because controlling for them can induce spurious associations between their causes, even when those causes are independent. This scenario is called "collider bias". A related term in causal inference is "conditioning", which means restricting or adjusting an analysis based on the value of a variable, for example by only studying people who were selected into a particular group.

A well-known example is that SAT scores don't predict grade point average for students once they're enrolled in college. Think about how students are selected for admission: they can have very high SAT scores, very high school grades or an excellent admissions essay. The study has been conditioned on the collider of being admitted to college, which hides the effect of SAT scores.
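
A small simulation makes collider bias visible. In the sketch below, SAT scores and high school grades are generated independently, and admission (the collider) depends on both; among admitted students, the two become negatively correlated. The admission rule and thresholds are hypothetical.

# A minimal simulation of collider bias using the admissions example.
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

sat = rng.normal(0, 1, n)      # standardized SAT score
grades = rng.normal(0, 1, n)   # standardized high school GPA, independent of SAT

# Admission (the collider) depends on both: strong SAT scores or strong grades.
admitted = (sat + grades + rng.normal(0, 0.5, n)) > 1.5

print("correlation in the full population:  ",
      round(np.corrcoef(sat, grades)[0, 1], 3))                       # ~0: truly independent
print("correlation among admitted students: ",
      round(np.corrcoef(sat[admitted], grades[admitted])[0, 1], 3))   # clearly negative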

Another well-known example comes from the COVID-19 pandemic. Smoking seemed to reduce the likelihood of being admitted to the hospital for serious COVID-19: in China, 8% of COVID hospital admissions were smokers, while 26% of the general population smoked. The collider was hospital admission itself. Smoking made otherwise healthy younger people more likely to be admitted to the hospital with serious complications. Compared to older people or people with underlying health conditions, these smokers were less likely to be hospitalized, but compared to their actual peers, they were far more likely to be hospitalized. Conditioning on hospital admission made smoking appear protective when it was not.

Unlike confounders that you should control for, you should not control for colliders. Controlling for colliders opens previously blocked paths in the causal graph and can create misleading associations. This is why understanding the structure of a causal model is crucial—the same statistical technique (controlling for a variable) can either help (with confounders) or harm (with colliders) your understanding of causation.

Directed acyclic graph

A directed acyclic graph (DAG) in causal inference is a visual representation of the causal relationships between variables in a system. These graphs are also sometimes called Bayesian networks, and they can be thought of as causal diagrams. A DAG consists of nodes that represent variables, connected by directed arrows that represent causal relationships. If we believe that both grams of ice cream eaten per week and hours spent exercising per week affect our body weight, then our DAG might look like this:

 

A DAG showing that changes in weight are caused by ice cream and exercise.

DAGs show your assumptions about how variables causally relate to each other. Each arrow represents a direct causal effect from one variable to another. The absence of an arrow indicates that you assume there's no direct causal relationship. DAGs also guide which variables you need to control for in your analysis. The back door criterion helps determine the minimal set of variables needed to block all confounding paths between treatment and outcome.
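
As a minimal illustration, the ice cream and exercise DAG from the figure above can be encoded with the networkx library. This only records the assumed causal structure; it doesn't estimate any effects.

# A minimal sketch of encoding the ice cream / exercise / weight DAG with networkx.
import networkx as nx

dag = nx.DiGraph()
dag.add_edges_from([
    ("ice cream", "weight"),   # eating ice cream affects weight
    ("exercise", "weight"),    # exercising affects weight
])

assert nx.is_directed_acyclic_graph(dag)   # a DAG allows no feedback loops
print("direct causes of weight:", list(dag.predecessors("weight")))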

DAGs can also help to reveal collider variables in a series of causal relationships. As we've mentioned, conditioning on colliders can introduce bias, so identifying them is crucial. In the example of smoking and hospitalization for COVID, researchers thought they were identifying the relationship on the left, but were in fact observing the relationships on the right:

The incorrect DAG on the left and the correct DAG on the right, which shows the collider.

Here, hospitalization is a collider because the researchers selected their population based on hospitalization, meaning that they couldn't see how both overall health and smoking affected the outcome.

Causal inference and machine learning

While causal inference has traditionally relied on statistical techniques to find and measure causal relationships, new areas of research leverage the power of machine learning and artificial intelligence.

Machine learning is now being applied to estimating treatment effects for causal relationships and is especially valuable when multiple treated groups might respond in slightly different ways. One approach that has shown promise is Bayesian additive regression trees (BART) for causal inference. These Bayesian models are flexible and nonparametric, using an ensemble of regression trees to model complex relationships between predictors and outcomes.

Meta-learning is another area of active research that uses representation learning to balance treatment and control groups. When treatment and control groups differ systematically in their characteristics, it can be difficult to determine whether outcomes result from the treatment itself or from differences between the groups. This problem becomes acute when you need to understand how treatment effects differ across subgroups. For instance, a new drug might help older patients recover from an illness but have no effect at all on younger patients. Younger patients who take the drug and recover aren't recovering because of the drug but because they are young.
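
One simple member of this family of approaches is the T-learner: fit separate outcome models for the treated and control groups, then take the difference of their predictions to estimate how the effect varies across subgroups. The sketch below uses scikit-learn on simulated data where a hypothetical drug only helps patients over 50.

# A minimal T-learner sketch on simulated data with a heterogeneous effect.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(4)
n = 20_000

age = rng.uniform(20, 80, n)
treated = rng.binomial(1, 0.5, n)
# The drug only helps patients older than 50 (a heterogeneous effect).
effect = np.where(age > 50, 5.0, 0.0)
recovery = 50 + 0.1 * age + effect * treated + rng.normal(0, 2, n)

X = age.reshape(-1, 1)
model_treated = GradientBoostingRegressor().fit(X[treated == 1], recovery[treated == 1])
model_control = GradientBoostingRegressor().fit(X[treated == 0], recovery[treated == 0])

# Estimated conditional treatment effect at two ages.
for a in (30, 70):
    cate = model_treated.predict([[a]]) - model_control.predict([[a]])
    print(f"estimated effect of the drug at age {a}: {cate[0]:.1f}")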

Causal discovery is the process of learning causal relationships and structures from observational data gathered in nonexperimental contexts. The goal is to uncover and understand the underlying causal mechanisms that generate the data. This field is also being revolutionized by a variety of machine learning approaches. Researchers have begun using deep learning and neural networks for automated causal discovery because they can handle nonlinear causal relationships and high-dimensional data. Most approaches fit the relationships in data to a DAG by testing whether variables are conditionally dependent on one another. This route potentially allows for causal inference without either experimentation or specific domain knowledge.
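
Many causal discovery algorithms are built on conditional independence tests. The sketch below shows the simplest linear version on simulated data: X and Y are correlated only because Z causes both, and the correlation disappears after adjusting for Z. The data-generating process is made up for illustration.

# A minimal conditional independence check via partial correlation.
import numpy as np

rng = np.random.default_rng(5)
n = 50_000

z = rng.normal(0, 1, n)
x = 2.0 * z + rng.normal(0, 1, n)    # Z -> X
y = -1.0 * z + rng.normal(0, 1, n)   # Z -> Y  (no direct X -> Y edge)

def residualize(v, z):
    """Remove the linear influence of z from v."""
    Z = np.column_stack([np.ones(len(z)), z])
    beta = np.linalg.lstsq(Z, v, rcond=None)[0]
    return v - Z @ beta

print("corr(X, Y):             ", round(np.corrcoef(x, y)[0, 1], 3))   # strongly nonzero
print("corr(X, Y | Z), approx.:", round(
    np.corrcoef(residualize(x, z), residualize(y, z))[0, 1], 3))       # near zero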
