Learning and Characterizing Causal Graphs with Latent Variables


Understanding the causal relationships in a system from data is challenging. One way to make this easier is to use a causal graph, which summarizes how the variables representing the factors of variation causally relate to one another.

For example, in a complex network of genes, the answer to the question “What happens if I knock out this gene?” depends on how the genes causally interact – that is, whether they causally inhibit or express other genes. Likewise, evaluating fairness requires a causal graph of the relationships learned from data [2].

Generally, unobserved factors of variation, represented by latent variables, induce spurious correlations, making it harder to learn the underlying causal graph. Many existing methods build a causal graph from observational and interventional data [3], but only a few deal with data that has latent variables [4].

The question, then, is how to represent the class of causal graphs that are indistinguishable from one another given both observational and interventional data. Such a class is known as an equivalence class of causal graphs, and the aim is typically to extract it from data by learning a concise representation of the graphs it contains.

Researchers from the MIT-IBM Watson AI Lab and IBM Research, with collaborators from Columbia University and Purdue University, have done just that. In a new paper, “Characterization and Learning of Causal Graphs with Latent Variables from Soft Interventions,” accepted to the 2019 Conference on Neural Information Processing Systems (NeurIPS), we developed a new approach to characterize the set of plausible causal graphs from observational and interventional data with latent variables.

The method deals specifically with data collected under soft interventions – a type of mechanism change in which the causal mechanisms generating the intervened variables are not completely destroyed by the intervention, but only perturbed. The perturbation induces a mechanism different from the observed one, so a soft intervention preserves the set of causal parents. Soft interventions are prevalent in biology and medical applications, where drugs merely perturb the underlying causal mechanisms rather than forcing the measured quantities of interest to take specific values.
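The distinction can be illustrated with a small simulation (a minimal sketch in Python/NumPy; the two-variable model Z → X and all coefficients are hypothetical, not taken from the paper). A hard intervention severs X from its parent Z entirely, while a soft intervention only changes the mechanism, so X still responds to Z:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n, mechanism):
    """Sample from a toy two-variable SCM Z -> X, where `mechanism`
    generates X from its parent Z."""
    z = rng.normal(size=n)
    x = mechanism(z)
    return z, x

# Observational mechanism: X depends on its causal parent Z.
observational = lambda z: 2.0 * z + rng.normal(size=z.size)

# Hard intervention do(X = 1): the mechanism is destroyed;
# X no longer depends on Z at all.
hard = lambda z: np.ones(z.size)

# Soft intervention: the mechanism is perturbed (the coefficient
# changes), but X still listens to its causal parent Z.
soft = lambda z: 0.5 * z + rng.normal(size=z.size)

for name, mech in [("observational", observational),
                   ("hard", hard), ("soft", soft)]:
    z, x = sample(10_000, mech)
    corr = np.corrcoef(z, x)[0, 1] if x.std() > 0 else 0.0
    print(f"{name}: corr(Z, X) = {corr:.2f}")
```

Under the hard intervention the correlation between Z and X vanishes (X is constant), while under the soft intervention it merely changes magnitude – the parent set of X is preserved.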

The paper’s key finding is that the famous “do-calculus” rules, which Pearl developed in his landmark paper [1], can be used to reverse-engineer the class of possible causal graphs that describe the underlying system. When interventions are performed on the same underlying causal graph, the do-calculus rules impose a set of constraints, called do-constraints, that relate the different interventional distributions through the graph. Traditionally, do-calculus rules are deployed in causal inference when a (plausible) causal graph depicting the system that generates the observed data is given as side information, and interventional effects (e.g., the average effect of a drug) are estimated from observational data using the causal graph [5].
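This traditional use of the do-calculus can be illustrated numerically (a minimal sketch; the discrete model with an observed confounder Z of X and Y, and all its probabilities, are made up for illustration). Given the graph, one consequence of the do-calculus is the backdoor adjustment, which recovers the interventional distribution p(y | do(x)) from purely observational data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical SCM with an observed confounder: Z -> X, Z -> Y, X -> Y.
z = rng.binomial(1, 0.5, n)
x = rng.binomial(1, np.where(z == 1, 0.8, 0.2))
y = rng.binomial(1, 0.2 + 0.5 * x + 0.2 * z)

# Naive conditional p(Y=1 | X=1): biased by the confounder Z.
naive = y[x == 1].mean()

# Backdoor adjustment (a consequence of the do-calculus):
# p(Y=1 | do(X=1)) = sum_z p(Y=1 | X=1, Z=z) p(Z=z)
adjusted = sum(y[(x == 1) & (z == v)].mean() * (z == v).mean()
               for v in (0, 1))

# Ground truth by actually intervening: force X = 1, resample Y.
y_do = rng.binomial(1, 0.2 + 0.5 * np.ones(n) + 0.2 * z)
truth = y_do.mean()

print(f"naive {naive:.3f}  adjusted {adjusted:.3f}  truth {truth:.3f}")
```

The adjusted estimate matches the interventional ground truth (here 0.8), while the naive conditional overestimates the effect because Z confounds X and Y.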

Given both observational and interventional data, the team used all observable do-constraints that can be tested from the data, together with a graphical representation called the augmented graph, to capture information about the underlying unknown causal graph. We characterized all possible augmented graphs that can give rise to the observable do-constraints, performing the augmentation by adding an “artificial node” for every pair of interventions. The do-constraints from the data imply “connectivity patterns” (called d-connection statements) among the observable variables and these artificial nodes.

To be more concrete, suppose we are given the interventional distribution in which X is intervened on, denoted p_x, and the observational distribution, denoted p (Fig. 1). We then add an artificial node F_x to the causal diagram pointing to X, representing that the intervention changes the mechanism generating X, as shown on the left. The dashed arrows represent a latent confounder between their endpoints. On the right are the observed distributional implications of the causal diagram – the do-constraints. These artificial nodes are a crucial tool, since graphical separation statements that include them can be used to represent the do-constraints, as shown.


Figure 1: Left: Augmented graph for the distributions {p, p_x}. Right: Distributional invariances due to do-constraints and the corresponding graphical separation statements. A _||_ B | C means that A and B are d-separated given C; a crossed _||_ means that they are d-connected. D-connection is a type of connectivity pattern in DAGs that yields testable implications in the causal learning literature.
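Separation statements of this kind can be checked mechanically. Below is a minimal sketch in plain Python of a standard d-separation test via the ancestral moral graph, applied to a hypothetical augmented graph F_x → X → Y, with and without a latent confounder between X and Y (made explicit as a node U; the graphs are illustrative, not the exact ones in the figure):

```python
def d_separated(parents, xs, ys, zs):
    """Check d-separation of node sets xs and ys given zs in a DAG
    (given as a dict node -> set of parents), via the ancestral moral
    graph: restrict to ancestors of the query nodes, moralize (marry
    co-parents, drop directions), delete zs, and test connectivity."""
    # 1. Ancestors of xs | ys | zs (including the nodes themselves).
    relevant, stack = set(), list(xs | ys | zs)
    while stack:
        v = stack.pop()
        if v not in relevant:
            relevant.add(v)
            stack.extend(parents.get(v, ()))
    # 2. Moralize: undirected parent-child edges plus co-parent pairs.
    adj = {v: set() for v in relevant}
    for v in relevant:
        ps = [p for p in parents.get(v, ()) if p in relevant]
        for p in ps:
            adj[v].add(p); adj[p].add(v)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                adj[p].add(q); adj[q].add(p)
    # 3. Remove the conditioning set and test reachability xs -> ys.
    seen, stack = set(), [v for v in xs if v not in zs]
    while stack:
        v = stack.pop()
        if v in seen or v in zs:
            continue
        seen.add(v)
        stack.extend(adj[v])
    return not (seen & ys)

# Hypothetical augmented graph: F_x -> X -> Y.
no_confounder = {"X": {"F_x"}, "Y": {"X"}}
# Same graph with a latent confounder U between X and Y.
confounded = {"X": {"F_x", "U"}, "Y": {"X", "U"}, "U": set()}

# Without a confounder, F_x _||_ Y | X: the do-constraint
# p_x(y | x) = p(y | x) holds.
assert d_separated(no_confounder, {"F_x"}, {"Y"}, {"X"})
# The latent confounder opens the collider at X when we condition
# on it, so F_x becomes d-connected to Y given X: the do-constraint
# no longer holds.
assert not d_separated(confounded, {"F_x"}, {"Y"}, {"X"})
print("d-separation checks passed")
```

Reading the artificial node F_x this way turns each do-constraint into a purely graphical statement that can be tested against candidate graphs.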

Using these artificial nodes to represent interventional targets is not a new idea (for example, it has been used in [3]). However, the team extended the applicability of these artificial nodes by showing that arbitrary graph separation statements on the augmented graph correspond to some combination of testable do-constraints for any given set of interventional distributions when latent variables are involved.

Two causal diagrams are called equivalent if the set of plausible interventional distributions that can be obtained from them result in identical sets of observable do-constraints. Due to the connection between the do-constraints and the separation statements on the augmented graph, if the augmented graphs induce the same graphical separation statements, then they will produce the same set of observed do-constraints.

We show the other direction as well, which establishes a complete characterization of this interventional equivalence class: if the augmented graphs of two causal diagrams do not induce the same graphical separation statements, then there is some pair of interventional distributions consistent with the pair of graphs that induces different do-constraints. The team further used the existing theory of maximal ancestral graphs (MAGs) to characterize, via their topological properties, when a pair of augmented graphs lies in the same equivalence class.

Under an extended faithfulness assumption, the team proposed a sound modification of the classical FCI algorithm that outputs a graphical representation of the equivalence class. It turns out that when the targets of the interventions are known, the algorithm requires additional orientation rules beyond those previously known. Whether the algorithm, possibly with further modifications, is also complete is left open.


[1] Judea Pearl, “Causal diagrams for empirical research,” Biometrika, 1995.

[2] Razieh Nabi and Ilya Shpitser, “Fair inference on outcomes,” Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

[3] Karren Yang, Abigail Katoff, and Caroline Uhler, “Characterizing and learning equivalence classes of causal DAGs under interventions,” International Conference on Machine Learning, 2018.

[4] Joris M. Mooij, Sara Magliacane, and Tom Claassen, “Joint causal inference from multiple contexts,” arXiv preprint arXiv:1611.10351, 2016.


Murat Kocaoglu

Research Staff Member
