The Data Lake Approach to Population Health Management

Anil Jain is Vice President and Chief Medical Officer, IBM Watson Health

As healthcare organizations form clinically integrated networks (CINs) to help them facilitate population health management (PHM), many are finding they lack the data infrastructure needed to ensure their eventual success.

The problem: The various providers within those CINs use electronic health record (EHR) systems that not only struggle with true interoperability (exchanging information with one another without losing meaning), but also can’t handle the sheer volume and variety of data needed for effective PHM. To get around those limitations, most organizations have embarked on enterprise data warehouse (EDW) strategies. But these solutions may prove inadequate, especially when it comes to addressing the requirements of PHM. Because an EDW requires highly structured data in pre-designated formats to optimize standard reporting, the ad hoc analytics required for PHM become cumbersome, making it difficult to gain insights quickly and efficiently, or at the scale needed to improve care. Much of the data critical for PHM is neither structured nor typically available in EHRs.

Turning Disparate Data into Actionable Information

So how can CINs moving into PHM better manage and utilize the data they produce? Instead of storing their information in a traditional EDW, which is far too confining to get the job done, they can take a “data lake” approach to securing their infrastructure, and then use parallel computing to put that data to work. The data lake framework, which maintains raw data in its original format, essentially creates a pool of information that can be dipped into and queried at any time, i.e., in an ad hoc manner. In a healthcare system that is focused on PHM, this flexibility can create significant advantages. Providers can, for instance, quickly turn data from disparate sources into actionable information that supports their decisions. As new processes are implemented, the impact of each intervention can quickly be assessed.
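To make the idea of querying raw data "in an ad hoc manner" concrete, here is a minimal Python sketch. It assumes a tiny in-memory list standing in for the lake; the record layout, the field names (`patient_id`, `hba1c`), and the function names are hypothetical and chosen only for illustration, not taken from any Watson Health product. The point is that payloads stay in their original format and are parsed only when a question is asked.

```python
import json

# Hypothetical raw records, stored exactly as each source system produced them.
RAW_LAKE = [
    {"source": "ehr", "format": "json",
     "payload": '{"patient_id": "p1", "date": "2016-03-01", "hba1c": 8.1}'},
    {"source": "claims", "format": "csv",
     "payload": "p1,2016-03-15,office_visit,120.00"},
    {"source": "ehr", "format": "json",
     "payload": '{"patient_id": "p2", "date": "2016-04-02", "hba1c": 6.4}'},
]


def read_ehr(record):
    """Parse an EHR JSON payload at query time, not at load time."""
    return json.loads(record["payload"])


def patients_with_hba1c_above(lake, threshold):
    """Ad hoc question: which patients have an HbA1c reading above threshold?"""
    hits = set()
    for record in lake:
        if record["source"] != "ehr":
            continue  # other sources stay untouched until a query needs them
        row = read_ehr(record)
        if row.get("hba1c", 0) > threshold:
            hits.add(row["patient_id"])
    return sorted(hits)


if __name__ == "__main__":
    print(patients_with_hba1c_above(RAW_LAKE, 7.0))
```

A different question next week would need only a new query function; nothing about how the records are stored has to change.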

Other key benefits of the data lake approach:

It affords a comprehensive view of patient care.
The data lake approach allows healthcare organizations to aggregate and standardize a wide variety of data, including clinical information, data from EHRs, claims data, and patient-generated information. All of that data can then be combined on an as-needed basis to help create a single longitudinal view of the person, supporting use cases such as medical decision-making, cost reduction programs, and quality improvement initiatives.

It permits scalability.
Unlike a traditional EDW approach, the data lake architecture can be scaled to meet demand by simply adding computer clusters to the framework rather than investing in larger database technologies. In PHM, where the data requirements are so significant and likely to expand over time through growth and consolidation, such scalability is critical to success.

It’s very flexible.
Unlike traditional data warehouses, which bind collected data to highly structured and rigid business rules in advance, data lakes store information in its native format while tagging it in a way that makes it retrievable for specific reports or analyses. This can help enable authorized individuals to quickly and easily generate ad hoc reports without recreating database structures to accommodate their queries. This flexibility also aligns with the notion that PHM is evolving and the meaning behind the data should remain flexible.

It’s highly adaptable.
Data lakes can accommodate new types of data as the needs of an organization evolve over time. For healthcare systems that are constantly adapting to new models of reimbursement, or adjusting care delivery to fit the needs of PHM, this is a game-changer: If done with the future in mind, the investment you make in your data infrastructure should pay off for years to come.
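Several of the benefits above (a longitudinal view of the person, tag-based retrieval, and room for new data types) can be sketched together in a few lines of Python. This is an illustrative sketch under stated assumptions, not a description of any particular platform: the event layout, the tag names, and the functions `build_timeline` and `with_tag` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical events from different sources. Each keeps its own "data" shape
# and carries tags that make it retrievable for specific analyses.
EVENTS = [
    {"patient_id": "p1", "date": "2016-03-15", "source": "claims",
     "tags": {"cost"}, "data": {"amount": 120.0}},
    {"patient_id": "p1", "date": "2016-03-01", "source": "ehr",
     "tags": {"clinical", "diabetes"}, "data": {"hba1c": 8.1}},
    # A newer data type (patient-generated) is added without changing the others.
    {"patient_id": "p1", "date": "2016-03-20", "source": "wearable",
     "tags": {"patient_generated"}, "data": {"steps": 4200}},
    {"patient_id": "p2", "date": "2016-04-02", "source": "ehr",
     "tags": {"clinical", "diabetes"}, "data": {"hba1c": 6.4}},
]


def build_timeline(events):
    """Group events by patient and order each group chronologically.

    ISO 8601 date strings (YYYY-MM-DD) sort correctly as plain strings.
    """
    timeline = defaultdict(list)
    for event in events:
        timeline[event["patient_id"]].append(event)
    for patient_events in timeline.values():
        patient_events.sort(key=lambda e: e["date"])
    return dict(timeline)


def with_tag(events, tag):
    """Return only the events carrying the given tag."""
    return [e for e in events if tag in e["tags"]]


if __name__ == "__main__":
    for event in build_timeline(EVENTS)["p1"]:
        print(event["date"], event["source"], event["data"])
```

Because each event carries its own payload shape, adding the wearable source required no change to the claims or EHR records; only the queries that care about the new data need to know about it.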

Interested in learning more about the data lake approach? Download the Watson Health white paper Data Infrastructure for Managing Population Health. This topic is also covered in the second edition of Provider-Led Population Health Management.

Vice President and Chief Health Informatics Officer, IBM Watson Health
