Leveraging machine learning and AI to improve diversity in clinical trials

By Deiva Ramachandran, Jeevan Duggempudi, and Andrea Dobrindt | 4 minute read | January 10, 2023


The modern medical system does not serve all its patients equally, or even nearly so. Significant disparities in health outcomes have been recognized for decades, and they persist. The causes are complex, and solutions will involve political, social and educational changes, but some factors can be addressed immediately by applying artificial intelligence to ensure diversity in clinical trials.

A lack of diversity in clinical trial patients has contributed to gaps in our understanding of diseases, preventive factors and treatment effectiveness. Diversity factors include gender, age group, race, ethnicity, genetic profile, disability, socioeconomic background and lifestyle conditions. As the Action Plan of the FDA Safety and Innovation Act succinctly states, “Medical products are safer and more effective for everyone when clinical research includes diverse populations.” But certain demographic groups are underrepresented in clinical trials due to financial barriers, lack of awareness, and lack of access to trial sites. Beyond these factors, trust, transparency and consent are ongoing challenges when recruiting trial participants from disadvantaged or minority groups.

There are also ethical, sociological and economic consequences to this disparity. An August 2022 report by the National Academies of Sciences, Engineering, and Medicine projected that hundreds of billions of dollars will be lost over the next 25 years due to reduced life expectancy, shortened disability-free lives, and fewer years working among populations that are underrepresented in clinical trials.

In the U.S., diversity in trials is a legal imperative. The FDA Office of Minority Health and Health Equity provides extensive guidelines and resources for trials and recently released guidance to improve participation from underrepresented populations.

From moral, scientific, and financial perspectives, designing more diverse and inclusive clinical trials is an increasingly prominent goal for the life science industry. A data-driven approach, supported by machine learning and artificial intelligence (AI), can advance these efforts.

The opportunity

Life science companies have been required by FDA regulations to present the effectiveness of new drugs by demographic characteristics such as age group, gender, race and ethnicity. In the coming decades, the FDA will also increasingly focus on genetic and biological influences that affect disease and response to treatment. As summarized in a 2013 FDA report, “Scientific advances in understanding the specific genetic variables underlying disease and response to treatment are increasingly becoming the focus of modern medical product development as we move toward the ultimate goal of tailoring treatments to the individual, or class of individuals, through personalized medicine.”

Beyond demographic and genetic data, there is a trove of other data to analyze, including electronic medical records (EMR) data, claims data, scientific literature and historical clinical trial data.

Using advanced analytics, machine learning and AI on the cloud, organizations now have powerful ways to:

  • Form a large, complicated, diverse set of patient demographics, genetic profiles and other patient data
  • Understand the underrepresented subgroups
  • Build models that encompass diverse populations
  • Close the diversity gap in the clinical trial recruitment process
  • Ensure that data traceability and transparency align with FDA guidance and regulations

Initiating a clinical trial consists of four steps:

  1. Understanding the nature of the disease
  2. Gathering and analyzing the existing patient data
  3. Creating a patient selection model
  4. Recruiting participants

Addressing diversity disparity during steps two and three will help researchers better understand how drugs or biologics work, shorten clinical trial approval time, increase trial acceptability amongst patients and achieve medical product and business goals.

A data-driven framework for diversity

A few figures illustrate the diversity gaps. Hispanic/Latinx patients make up 18.5% of the U.S. population but only 1% of typical trial participants; African-American/Black patients make up 13.4% of the U.S. population but only 5% of typical trial participants. Between 2011 and 2020, 60% of vaccine trials did not include any patients over 65, even though 16% of the U.S. population is over 65. To close gaps like these, the key is to include underrepresented populations in the clinical trial recruitment process.
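One simple way to quantify gaps like these is to divide a group's share of trial participants by its share of the population, so that 1.0 means parity. The sketch below is only an illustration built on the two pairs of percentages quoted above; the names `SHARES`, `representation_ratio` and `representation_gaps` are hypothetical and are not part of any IBM product or FDA method.

```python
# Population and trial-participant shares (percent) quoted in the text above.
SHARES = {
    "Hispanic/Latinx": {"population": 18.5, "trial": 1.0},
    "African-American/Black": {"population": 13.4, "trial": 5.0},
}


def representation_ratio(population_pct, trial_pct):
    """Return trial share divided by population share (1.0 means parity)."""
    if population_pct <= 0:
        raise ValueError("population_pct must be positive")
    return trial_pct / population_pct


def representation_gaps(shares):
    """Map each group to its ratio and its shortfall in percentage points."""
    gaps = {}
    for group, pct in shares.items():
        gaps[group] = {
            "ratio": representation_ratio(pct["population"], pct["trial"]),
            "shortfall_points": pct["population"] - pct["trial"],
        }
    return gaps


GAPS = representation_gaps(SHARES)

if __name__ == "__main__":
    for group, gap in GAPS.items():
        print(
            f"{group}: ratio={gap['ratio']:.2f}, "
            f"shortfall={gap['shortfall_points']:.1f} points"
        )
```

With the quoted figures, the ratios come out at roughly 0.05 and 0.37, meaning these groups appear in typical trials at about 5% and 37% of the rate their population share would suggest.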

For the steps leading up to recruitment, we can evaluate the full range of data sources listed above. Depending on the disease or condition, we can evaluate which diversity parameters are applicable and what data sources are relevant. From there, clinical trial design teams can define patient eligibility criteria, or expand trials to additional sites to ensure all populations are properly represented in the trial design and planning phase.
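As a rough illustration of what planning for representation can look like, the sketch below turns population shares into per-group enrollment targets for a trial of a given size, then reports how many participants each group still needs. It is a minimal sketch under stated assumptions: the function names are hypothetical, the "All other groups" bucket is just the remainder of the two percentages quoted earlier, and a real trial would base its targets on disease prevalence and the applicable diversity parameters rather than on general population share alone.

```python
import math

# Illustrative shares (percent). The first two come from the figures quoted
# earlier; "All other groups" is an assumed remainder bucket.
POPULATION_PCT = {
    "Hispanic/Latinx": 18.5,
    "African-American/Black": 13.4,
    "All other groups": 68.1,
}


def enrollment_targets(total, population_pct):
    """Split `total` participants in proportion to population share.

    Uses largest-remainder rounding so the targets always sum to `total`.
    """
    if total < 0:
        raise ValueError("total must be non-negative")
    pct_sum = sum(population_pct.values())
    if pct_sum <= 0:
        raise ValueError("shares must sum to a positive number")
    exact = {g: total * p / pct_sum for g, p in population_pct.items()}
    targets = {g: math.floor(x) for g, x in exact.items()}
    leftover = total - sum(targets.values())
    # Hand out the remaining seats to the largest fractional parts.
    by_remainder = sorted(exact, key=lambda g: exact[g] - targets[g], reverse=True)
    for group in by_remainder[:leftover]:
        targets[group] += 1
    return targets


def remaining_to_enroll(targets, enrolled):
    """Return how many more participants each group needs (never negative)."""
    return {g: max(t - enrolled.get(g, 0), 0) for g, t in targets.items()}


if __name__ == "__main__":
    targets = enrollment_targets(200, POPULATION_PCT)
    # Hypothetical enrollment counts, for illustration only.
    enrolled = {"Hispanic/Latinx": 2, "African-American/Black": 10, "All other groups": 150}
    print(targets)
    print(remaining_to_enroll(targets, enrolled))
```

A check like this could inform the decision described above, for example by flagging that eligibility criteria need revisiting or that additional sites are needed to reach a group whose target is far from being met.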

How IBM can help

To effectively enable diversity in clinical trials, IBM offers solutions that span data management, AI and advanced analytics on the cloud, and an ML Ops framework. These solutions help trial designers provision and prepare data, merge various aspects of patient data, identify diversity parameters and eliminate bias in modeling. They do this using an AI-assisted process that optimizes patient selection and recruitment by better defining clinical trial inclusion and exclusion criteria.

Because the process is traceable and equitable, it provides a robust basis for recruiting trial participants. As life sciences companies adopt such frameworks, they can build trust that clinical trials have diverse populations and thus build trust in their products. Such processes also help healthcare practitioners better understand and anticipate the impacts products may have on specific populations, rather than responding ad hoc, when it may be too late to treat conditions.


IBM’s solutions and consulting services can help you leverage additional data sources and identify more relevant diversity parameters so that trial inclusion and exclusion criteria can be re-examined and optimized. These solutions can also help you determine whether your patient selection process accurately represents disease prevalence and improve clinical trial recruitment. Using machine learning and AI, these processes can easily be scaled across a range of trials and populations as part of a streamlined, automated workflow.

These solutions can help life sciences companies build trust with communities that have been historically underrepresented in clinical trials and improve health outcomes.