The responsible, ethical use of real-world data

While artificial intelligence (AI) can drive insights from real-world data, it’s important that the data is ethically collected, curated and connected.

By Anil Jain, MD, FACP | 4 minute read | December 10, 2020

In the race to understand COVID-19, discovery is occurring at an accelerated pace, driven by the need not only for evidence from clinical trials but also for insights from the real-world, near real-time care of many patients across many settings.

There are many instances of how real-world data has helped researchers advance scientific discovery in understanding and optimizing health services. Finding patterns and insights across billions of clinically meaningful data points from de-identified patient records is intended to help clinicians better diagnose, optimally treat and manage patients in ways that may not be possible within traditional clinical trials.

For example, IBM researchers used real-world data and an AI/machine learning (ML) model to help predict which patients are at risk of worsening COVID-related complications 28 days following a COVID-19 diagnosis.1 Understanding which individuals may progress and which may improve can help clinicians prioritize diagnostic and treatment decisions. While AI/ML models on real-world data will never replace every traditional clinical study, these models have the potential to help identify interesting signals (such as a possible new risk factor or a potential off-label therapeutic) that could be further studied in a traditional clinical study, making the scientific discovery process more efficient.

These discoveries are important as an adjunct to traditional research, given the pace at which we need to respond and the inherent cost and time limitations of clinical trials when our knowledge about COVID-19 is still evolving. Given the availability of real-world evidence, the relative ease of developing AI models and the urgency of needing to understand COVID-19, healthcare researchers must be vigilant not to sacrifice the responsible, ethical use of data. They must also ensure the appropriateness of their analytic AI/ML models and provide transparency of methods and findings for peer review.

Protecting privacy is an ethical responsibility to healthcare consumers

People are often willing to consent to having their health data used in research for the public good. In a recent IBM® Watson Health® PULSE Health Poll, 68% of respondents said they would be willing to share health information with researchers anonymously.2 Organizations that are entrusted to be the stewards of real-world data must ensure that patients’ privacy is protected through required de-identification methods and help researchers understand how they are allowed to use these data sets.

The U.S. federal government recognizes that real-world data is playing an increasing role in healthcare decisions. The passage of the 21st Century Cures Act in 2016 signaled its support for using real-world data to inform regulatory decision making, including approvals for medical products and new indications for approved drugs.3

Efforts to make real-world data available for strategic and urgent scientific inquiry require a commitment to make data available across borders. A global data set enables AI to train using a more geographically and ethnically diverse population. Currently, one of the most significant barriers to accomplishing this is agreement on how to govern data privacy during research.

For example, the Office for Civil Rights (OCR) of the U.S. Department of Health and Human Services (HHS) has rules to protect data privacy in the United States, which differ from the European Union’s General Data Protection Regulation (GDPR).4 Other recent examples – such as Brazil’s Lei Geral de Proteção de Dados Pessoais (LGPD), Serbia’s Personal Data Protection Law, and India’s Personal Data Protection Bill – demonstrate how complex it is to comply with privacy regulations in an evolving global environment. In addition, the lack of standardization of electronic health information has been seen as a challenge to achieving the full value of a globalized, connected real-world data set.5

IBM provided AI-powered technology to help healthcare researchers accelerate their efforts to understand COVID-19. IBM’s AI Ethics Board introduced a review process to help ensure that the use of IBM technology in solutions designed to address COVID-19 is consistent with our values and Principles for Trust and Transparency, which include a commitment to data privacy.

Connected, globalized real-world data is an essential foundation for widespread adoption of AI in healthcare

Without connected, globalized real-world data, there is a risk that AI can inadvertently propagate bias or identify patterns of limited generalizability. Any study conducted with real-world data should be as clear about the limitations inherent in the data as it is about limitations in the methods or the study results. In the study mentioned above, for example, researchers acknowledge that most of the real-world data came from metropolitan areas, resulting in a higher percentage of African Americans than in the overall U.S. population. In samples like this, it is important to acknowledge any potential demographic and socioeconomic biases to help support the interpretation of findings.6

Responsible use of real-world data and application of AI/ML methods on the data require researchers to understand both the strengths and the limitations of that data. While these methods efficiently generate insights and potentially significant scientific discoveries, findings must be viewed in the context of other evidence from peer-reviewed literature and controlled, randomized clinical trials, and in some cases followed up with traditional research to validate identified signals.

Healthcare is undergoing a significant digital transformation with increasing use of electronic health records, virtual care platforms and patient-generated data. This real-world data will provide an immense opportunity for researchers seeking insights, especially where traditional clinical trials may not be feasible. We must pursue this opportunity transparently, with the right data, the right algorithms and the right intentions.

  1. Rinderknecht MD, Klopfenstein Y. Predicting critical state after COVID-19 diagnosis: model development using a large US electronic health record dataset. medRxiv. 2020 August 31. doi: PREPRINT.
  2. Results represent responses from 3,002 U.S. survey participants interviewed from December 1-13, 2019. The margin of error is +/- 1.8%.
  5. Value in Healthcare: Accelerating the Pace of Health System Transformation. An Insight Report prepared by the World Economic Forum, in collaboration with Boston Consulting Group (BCG), December 2018.
  6. Rinderknecht MD, Klopfenstein Y. Predicting critical state after COVID-19 diagnosis: model development using a large US electronic health record dataset. medRxiv. 2020 August 31. doi: PREPRINT.