The 5 Vs of Big Data

Anil Jain, MD, is a Vice President and Chief Medical Officer at IBM Watson Health

I recently spoke with Mark Masselli and Margaret Flinter for an episode of their “Conversations on Health Care” radio show, explaining how IBM Watson’s Explorys platform leveraged the power of advanced processing and analytics to turn data from disparate sources into actionable information. My hosts wanted to know what this data actually looks like. And how, they wondered, are the characteristics of big data relevant to healthcare organizations in particular?

As it turns out, data scientists almost always describe “big data” as having at least three distinct dimensions: volume, velocity, and variety. Some add more Vs to the list; I include two more, variability and value. Here’s how I define the “five Vs of big data,” and what I told Mark and Margaret about their impact on patient care.

Volume: Big data first and foremost has to be “big,” and size in this case is measured as volume. From clinical data associated with lab tests and physician visits to the administrative data surrounding payments and payers, this well of information is already expanding. As precision medicine comes into wider use, healthcare will see a big data explosion, especially as genomic and environmental data become more ubiquitous.

Velocity: In the context of big data, velocity refers to two related concepts familiar to anyone in healthcare: the rapidly increasing speed at which technological advances create new data, and the corresponding need to digest and analyze that data in near real time. For example, as more medical devices are designed to monitor patients and collect data, there is great demand to analyze that data and then transmit it back to clinicians and others. This healthcare “internet of things” will only increase the velocity of big data in healthcare.

Variety: With increasing volume and velocity comes increasing variety. This third “V” describes just what you’d think: the huge diversity of data types that healthcare organizations see every day. Again, think about electronic health records and those medical devices: each one might collect a different kind of data, which in turn might be interpreted differently by different physicians, or made available to a specialist but not to a primary care provider. The challenge for healthcare systems when it comes to data variety? Standardizing and distributing all of that information so that everyone involved is on the same page. With increasing adoption of population health and big data analytics, we are seeing even greater variety, as traditional clinical and administrative data are combined with unstructured notes, socioeconomic data, and even social media data.

Variability: The way care is provided to any given patient depends on all kinds of factors, and the way care is delivered, and more importantly the way the data is captured, may vary from time to time or place to place. For instance, what a clinician reads in the medical literature, where they trained, the professional opinion of a colleague down the hall, or how a patient expresses herself during her initial exam may all play a role in what happens next. Such variability means data can be meaningfully interpreted only when the care setting and delivery process are taken into account. For example, a diagnosis of “CP” may mean “chest pain” when entered by a cardiologist or primary care physician but “cerebral palsy” when entered by a neurologist or pediatrician. Because true interoperability remains somewhat elusive in healthcare data, variability is a constant challenge.
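To make the “CP” example concrete, here is a minimal Python sketch of context-aware abbreviation expansion. It assumes a hypothetical lookup table keyed on the abbreviation and the specialty of the clinician who entered it; the table, the function name `expand_abbreviation`, and the specialty labels are illustrative only and are not part of the Explorys platform or any standard terminology.

```python
from typing import Optional

# Hypothetical sense inventory (illustrative only): the same abbreviation
# maps to different meanings depending on the entering clinician's specialty.
ABBREVIATION_SENSES = {
    ("CP", "cardiology"): "chest pain",
    ("CP", "primary care"): "chest pain",
    ("CP", "neurology"): "cerebral palsy",
    ("CP", "pediatrics"): "cerebral palsy",
}


def expand_abbreviation(abbreviation: str, specialty: str) -> Optional[str]:
    """Look up the meaning of an abbreviation in the context of a specialty.

    Inputs are normalized (abbreviation upper-cased, specialty lower-cased)
    so that "cp"/"Cardiology" match the table keys.
    """
    return ABBREVIATION_SENSES.get((abbreviation.upper(), specialty.lower()))


print(expand_abbreviation("CP", "Cardiology"))   # chest pain
print(expand_abbreviation("CP", "Neurology"))    # cerebral palsy
print(expand_abbreviation("CP", "Dermatology"))  # None
```

Returning None for an unknown context, rather than falling back to a default meaning, is a deliberate choice in this sketch: it flags ambiguous entries for review instead of silently guessing.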

Value: Last but not least, big data must have value. If you’re going to invest in the infrastructure required to collect and interpret data on a system-wide scale, it’s important to ensure that the insights generated are based on accurate data and ultimately lead to measurable improvements.

As I pointed out to Mark and Margaret, every clinician and healthcare system is different, and so there’s no “cookie-cutter” way to provide high-quality patient care. The same goes for how we handle big data: organizations might use the same tools and technologies for gathering and analyzing the data they have available, but how they then put that data to work is ultimately up to them.

Click here to listen to the complete “Conversations on Health Care” interview.

Vice President and Chief Health Informatics Officer, IBM Watson Health