Dark Matter Matters: AI Makes DNA Dark Matter Useful


What is the minimal description that captures a space? Asking a mathematician’s basic question of a biological dataset reveals interesting answers about biology itself, and it summarizes our underlying approach to subtyping hematological cancers. Disease subtyping, a central tenet of precision medicine, is the challenging task of identifying and classifying patients with similar presentations of a complex and intricate disease. Done well, it can help guide better-informed treatment decisions for a given individual.

Today, a patient’s data can be collected from a multitude of perspectives (modes): genomic/DNA, transcriptomic/RNA, proteomic, histopathologic images, radiographic and other images, electronic medical records that include a plethora of readouts over time, and much more. Given the general state of our understanding of human diseases, more is indeed more, in terms of data modalities.


Specializing the AI algorithm (ReVeal) cleanly separates the subtypes, shown in distinct colors (top right, as opposed to bottom left). The portions of the DNA used by ReVeal are the dark-matter regions, shown as black segments on the 22 autosomes.

However, understanding how a given type of data can help answer a specific question is an intriguing problem. Because most human diseases are complex and heterogeneous, accurately subtyping a disease from data can open up a range of treatment options in a clinical setting. For example, a therapy with strong side effects could be justified if the data predicted that the patient was likely to decline rapidly without treatment.

Today, IBM Research and the Munich Leukemia Laboratory are publishing new research in PLOS Computational Biology that aims to subtype different hematological (blood) cancers based on omic data, that is, information about the roles, relationships and actions of the various types of molecules that make up an organism’s cells. In this case, we looked specifically at elements of the human genome, including its so-called dark matter DNA. Very conservatively speaking, we currently know nothing at all about 50 percent of the human genome, the “dark matter”, much as our understanding of the dark matter of our universe is very limited [1].

Because the subtypes of a given cancer arise from the same tumor cells of origin, molecular subtyping is especially hard. We took our analysis further by asking whether DNA alone (not RNA or proteins) provides enough information to subtype these closely related cancers.

Our findings amount to two breakthroughs in this space:

  • DNA alone contains enough signal to subtype blood cancers: DNA is considered the blueprint of the organism. It encodes genes, and regions outside of genes play direct or indirect roles in turning genes on and off.
  • “Dark matter” DNA plays a much larger role than previously thought in influencing the phenotype of cells and tissues: our research found that dark matter DNA alone is adequate for subtyping the cancer. This turns on its head the general belief that dark matter has little functional or other consequence, and proves that it deserves more study.

The off-the-shelf AI algorithms we used for this problem were inadequate, underscoring the importance of domain-specific nuances in the statistical learning process. We therefore designed a stochastic regularization AI model, specifically for DNA data, to address the confounding heterogeneity in these datasets. The model also works well for other phenotypes, including treatment response, which suggests a molecular basis for those phenotypes.

Using the AI models we designed, which we named ReVeal, we achieved 75 percent accuracy in identifying blood cancer subtypes from either non-dark DNA or dark matter DNA, compared with just 35 percent accuracy for standard AI methods [1].

These results and the models we created lay the groundwork for further exploring the significance of dark matter DNA in blood cancers, and potentially in other types of cancer.




IBM Fellow, IBM Research

Torsten Haferlach

MD, Munich Leukemia Laboratory
