Learning and Characterizing Causal Graphs with Latent Variables


Understanding the causal relationships in a system from data is challenging. One way to make it easier is to use a causal graph, which summarizes how the variables representing the factors of variation causally relate to each other.

For example, in a complex network of genes, the answer to the question “What happens if I knock out this gene?” depends on how the genes causally interact – that is, whether they inhibit or promote the expression of other genes. Similarly, to evaluate fairness, it is important to have a causal graph of the relationships learned from data [2].

Generally, unobserved factors of variation, which are represented by latent variables, induce spurious correlations that make it harder to learn the underlying causal graph. There are many existing methods for building a causal graph from observational and interventional data [3], but only a few handle data with latent variables [4].

The question is, how is it possible to represent the class of causal graphs that are indistinguishable from one another, given both observational and interventional data? This is known as an equivalence class of causal graphs, and the aim is, typically, to extract it from data by learning a concise representation of the set of causal graphs in the equivalence class.

Researchers from the MIT-IBM Watson AI Lab and IBM Research, with collaborators from Columbia University and Purdue University, have done just that. In a new paper, “Characterization and Learning of Causal Graphs with Latent Variables from Soft Interventions,” accepted to the 2019 Conference on Neural Information Processing Systems (NeurIPS), we have developed a new approach to characterize the set of plausible causal graphs from observational and interventional data that has latent variables.

The method deals specifically with data collected under soft interventions – a type of mechanism change where the causal mechanisms generating the intervened variables are not completely destroyed by the intervention, but only perturbed. The perturbation induces a mechanism different from the one observed, while preserving the set of causal parents. Soft interventions are prevalent in biology and medical applications, where drugs only perturb the underlying causal mechanisms rather than forcing the measured quantities of interest to take specific values.
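To make the distinction concrete, here is a minimal simulation sketch. The two-variable model (Z causes X), the coefficients, and the function names `sample_zx` and `pearson` are illustrative assumptions, not taken from the paper. Under the soft intervention, X still depends on its parent Z through a perturbed mechanism; under a hard intervention, X is drawn without any reference to Z.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

def sample_zx(regime, n=20000, seed=0):
    """Draw (Z, X) samples from a hypothetical model Z -> X under a regime."""
    rng = random.Random(seed)
    zs, xs = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        if regime == "observational":
            x = 1.0 * z + noise            # original mechanism
        elif regime == "soft":
            x = 0.3 * z + 2.0 + noise      # perturbed mechanism, Z is still a parent
        elif regime == "hard":
            x = 2.0 + noise                # X no longer depends on Z at all
        else:
            raise ValueError(regime)
        zs.append(z)
        xs.append(x)
    return zs, xs

for regime in ("observational", "soft", "hard"):
    zs, xs = sample_zx(regime)
    print(regime, round(pearson(zs, xs), 3))
```

In this sketch the correlation between Z and X weakens under the soft intervention but does not vanish, whereas it is close to zero under the hard one.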

The paper’s key finding is that the famous “do-calculus” rules, which Pearl developed in his landmark paper [1], can be used to reverse-engineer the class of possible causal graphs that describe the underlying system. If all interventions act on the same underlying causal graph, the do-calculus rules impose a set of graph-dependent constraints that relate the different interventional distributions; these are called do-constraints. Traditionally, do-calculus rules are deployed in causal inference when a (plausible) causal graph depicting the system that generates the observed data is given as side information, and interventional effects (e.g. the average effect of a drug) are estimated from observational data using the causal graph [5].
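For reference, the three do-calculus rules for hard interventions, as stated by Pearl [1], can be written as follows. The paper itself works with a generalization of these rules to soft interventions, which is not reproduced here. In the notation below, G with an overline on X is the graph with all edges into X removed, G with an underline on Z is the graph with all edges out of Z removed, and Z(W) is the set of Z-nodes that are not ancestors of any W-node in the graph with edges into X removed.

```latex
\begin{align*}
\text{Rule 1:}\quad & P(y \mid do(x), z, w) = P(y \mid do(x), w)
  && \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}} \\
\text{Rule 2:}\quad & P(y \mid do(x), do(z), w) = P(y \mid do(x), z, w)
  && \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}\,\underline{Z}} \\
\text{Rule 3:}\quad & P(y \mid do(x), do(z), w) = P(y \mid do(x), w)
  && \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}\,\overline{Z(W)}}
\end{align*}
```

Each rule ties an equality between distributions to a graphical separation condition, which is why observing which equalities hold in the data constrains the graph.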

Given both observational and interventional data, the team used all observable do-constraints that can be tested from this data and used a graphical representation called the augmented graph to capture the information about the underlying unknown causal graph. We characterized all possible augmented graphs that can give rise to observable do-constraints, doing the augmentation by adding an “artificial node” for every pair of interventions. The do-constraints from the data imply “connectivity patterns” (called d-connection statements) amongst the observable variables and these artificial nodes.

To explain in more specific terms, suppose we are given the interventional distribution (Fig. 1) where X is intervened on, shown by p_x, and the observational distribution shown by p. Then we add an artificial node F_x to the causal diagram pointing to X, representing that the intervention changes the mechanism that generates X as shown on the left. The dashed arrows represent the existence of a latent confounder between its endpoints. On the right are the observed distributional implications of the causal diagram – the do-constraints. These artificial nodes are a crucial tool, since graphical separation statements that include artificial nodes can be used to represent the do-constraints, as shown.


Figure 1: Left: Augmented graph for the distributions {p, p_x}. Right: Distributional invariances due to do-constraints and the corresponding graphical separation statements. A _||_ B | C means that A and B are d-separated given C. Crossed _||_ means that they are d-connected. D-connection is a type of connectivity pattern in DAGs that has testable implications for the data and is widely used in the causal learning literature.
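The idea of reading do-constraints off an augmented graph can be sketched in a few lines of code. The graphs below are hypothetical examples, not the one in Figure 1: the latent confounder is modeled as an explicit node U, the artificial node is named `F_X`, and the helper names `augment` and `d_separated` are assumptions made for illustration. The d-separation test uses the standard moralized ancestral graph criterion.

```python
from itertools import combinations

def augment(edges, target):
    """Add an artificial node F_<target> pointing at the intervened variable."""
    return edges + [("F_" + target, target)]

def d_separated(edges, a, b, given=()):
    """True if a and b are d-separated given `given` in the DAG `edges`."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    # Keep only a, b, the conditioning set, and all of their ancestors.
    keep, stack = set(), [a, b, *given]
    while stack:
        v = stack.pop()
        if v not in keep:
            keep.add(v)
            stack.extend(parents.get(v, ()))
    # Moralize: link every node to its parents and marry co-parents.
    nbrs = {v: set() for v in keep}
    for v in keep:
        ps = parents.get(v, set())
        for p in ps:
            nbrs[v].add(p)
            nbrs[p].add(v)
        for p, q in combinations(ps, 2):
            nbrs[p].add(q)
            nbrs[q].add(p)
    # a and b are d-separated iff no path avoids the conditioning set.
    seen, stack = set(given), [a]
    while stack:
        v = stack.pop()
        if v == b:
            return False
        if v not in seen:
            seen.add(v)
            stack.extend(nbrs[v])
    return True

# Hypothetical diagrams: Z -> X -> Y, with and without a latent U -> X, U -> Y.
plain = augment([("Z", "X"), ("X", "Y")], "X")
confounded = augment([("Z", "X"), ("X", "Y"), ("U", "X"), ("U", "Y")], "X")

print(d_separated(plain, "F_X", "Y", ["X"]))       # separation holds without U
print(d_separated(confounded, "F_X", "Y", ["X"]))  # conditioning on X opens F_X -> X <- U -> Y
print(d_separated(confounded, "F_X", "Z"))         # X is an unconditioned collider
```

Read as do-constraints, the first statement corresponds to the invariance p_x(y | x) = p(y | x), which need not hold once X and Y share a latent confounder, and the last corresponds to p_x(z) = p(z).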

Using these artificial nodes to represent interventional targets is not a new idea (for example, it has been used in [3]). However, the team extended the applicability of these artificial nodes by showing that arbitrary graph separation statements on the augmented graph correspond to some combination of testable do-constraints for any given set of interventional distributions when latent variables are involved.

Two causal diagrams are called equivalent if the sets of plausible interventional distributions that can be obtained from them result in identical sets of observable do-constraints. Because of the connection between do-constraints and separation statements on the augmented graph, two augmented graphs that induce the same graphical separation statements will produce the same set of observed do-constraints.

We show the other direction as well, which establishes a complete characterization for this interventional equivalence class. If the augmented graphs of two causal diagrams do not induce the same graphical separation statements, then there is some pair of interventional distributions consistent with the pair of graphs that induce different do-constraints. The team further used an existing theory of MAGs (Maximal Ancestral Graphs) to characterize when a pair of augmented graphs lie in the same equivalence class using their topological properties.
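A brute-force version of this equivalence test can be sketched as follows. It is a simplified illustration rather than the paper's characterization: it assumes a single known intervention target, no latent variables, and conditioning sets drawn from the observed variables only, and the names `d_separated` and `separation_statements` are hypothetical. The example compares X -> Y with Y -> X, which cannot be told apart from observational data alone.

```python
from itertools import combinations

def d_separated(edges, a, b, given=()):
    """d-separation via the moralized ancestral graph criterion."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    keep, stack = set(), [a, b, *given]
    while stack:
        v = stack.pop()
        if v not in keep:
            keep.add(v)
            stack.extend(parents.get(v, ()))
    nbrs = {v: set() for v in keep}
    for v in keep:
        ps = parents.get(v, set())
        for p in ps:
            nbrs[v].add(p)
            nbrs[p].add(v)
        for p, q in combinations(ps, 2):
            nbrs[p].add(q)
            nbrs[q].add(p)
    seen, stack = set(given), [a]
    while stack:
        v = stack.pop()
        if v == b:
            return False
        if v not in seen:
            seen.add(v)
            stack.extend(nbrs[v])
    return True

def separation_statements(edges, observed, f_nodes=()):
    """All (a, b, conditioning set) triples that are d-separated in the graph."""
    found = set()
    for a, b in combinations(sorted([*observed, *f_nodes]), 2):
        rest = sorted(n for n in observed if n not in (a, b))
        for k in range(len(rest) + 1):
            for cond in combinations(rest, k):
                if d_separated(edges, a, b, cond):
                    found.add((a, b, cond))
    return found

observed = ["X", "Y"]
graph_a = [("X", "Y")]
graph_b = [("Y", "X")]
f_edge = [("F_X", "X")]  # artificial node marking a soft intervention on X

same_observational = (separation_statements(graph_a, observed)
                      == separation_statements(graph_b, observed))
same_interventional = (separation_statements(graph_a + f_edge, observed, ["F_X"])
                       == separation_statements(graph_b + f_edge, observed, ["F_X"]))
print(same_observational, same_interventional)
```

In this toy case the two diagrams share the same separation statements without the artificial node but differ once it is added, so a soft intervention on X is enough to tell them apart.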

Under an extended faithfulness assumption, the team proposed a sound modification of the classical FCI algorithm, which outputs a graphical representation of the equivalence class. It turns out that when the targets of the interventions are known, the algorithm needs more orientation rules than were previously known. Whether the algorithm is complete, possibly with further modifications, is left open.


[1] J. Pearl, “Causal diagrams for empirical research,” Biometrika, 1995.

[2] Razieh Nabi and Ilya Shpitser, “Fair inference on outcomes,” Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

[3] Karren Yang, Abigail Katcoff, and Caroline Uhler, “Characterizing and Learning Equivalence Classes of Causal DAGs under Interventions,” International Conference on Machine Learning, 2018.

[4] Joris M. Mooij, Sara Magliacane, and Tom Claassen, “Joint causal inference from multiple contexts,” arXiv preprint arXiv:1611.10351, 2016.

Research Staff Member

Murat Kocaoglu

Research Staff Member
