Combating the Opioid Epidemic with Machine Learning


The opioid epidemic has become one of the worst health crises in US history. In 2015, more than 90 Americans died every day from opioid overdoses, a number comparable to deaths in car accidents and projected to have risen further in 2016 and 2017 [1,2]. The Centers for Disease Control and Prevention (CDC) estimates the total economic burden of prescription opioid abuse to be $78.5 billion a year, including healthcare costs, lost productivity, and criminal justice involvement [3].


The problem often begins with legitimate healthcare encounters in which opioid painkillers are first prescribed, such as for surgeries or chronic back pain.  During treatment, some patients become addicted and go on to suffer the well-documented consequences of addiction, while others do not, even if they become long-term users.  To combat the epidemic, it is vital to understand the exact circumstances under which medically sanctioned treatments can devolve into addiction.


This summer, we are taking the first steps in tackling this question in a project within our Science for Social Good program.  Our team includes a graduate student intern who is driving the effort, several machine learning and data science researchers from IBM Research, and experts from IBM Watson Health. Given our combination of industry-leading data on healthcare claims, the domain knowledge and experience to exploit this data, and the statistical and machine learning expertise to deploy the right analytical tools, we believe that we are in a unique position to uncover new insights to address the opioids problem.

The team’s focus this summer is on analyzing the relationship between factors surrounding an initial opioid prescription and a subsequent diagnosis of addiction, the latter representing the subset of addicts who seek treatment.  The goal is to identify causal (and not merely correlated) factors that lead to addiction diagnosis among all the variables associated with the initial prescription: opioid class (natural, semi-synthetic, synthetic), quantity (chiefly days of supply), and related medical procedures and diagnoses.  We expect to have early results on this work ready to share in November.

The use of healthcare claims is essential to our effort. These claims are generated every time a patient receives a treatment or service, creating, over time, a detailed record of an individual’s healthcare encounters. Thanks to its acquisition strategy and data partnerships, IBM has access to comprehensive collections of de-identified claims data. Of particular interest to us are prescription drug claims, which provide precise information about the types and quantities of opioids prescribed. Just as important are other claims around the time of an opioid prescription, which indicate procedures (such as surgeries) and associated diagnoses that may be connected to the prescription. Outcomes of addiction can be observed through indicators such as diagnoses of addiction and substance abuse given when patients seek treatment, or prescriptions of alternative drugs intended to mitigate addiction. Claims data is also accompanied by insurance plan enrollment records that include demographic and other information on the patients themselves. Taken together, these many variables promise richer, more detailed insights into the opioid problem than previous analyses, such as those based on prescription data alone.

Claims data does present a challenge to establishing causality, since the data is observational and not the result of a randomized controlled experiment. Thus, for each set of variables that is tested, statistical methods have to be employed to match patients on untested but potentially confounding variables. Other challenges include distinguishing “first” prescriptions from later prescriptions or refills given a limited time window, and identifying linked procedures and diagnoses using clinical and temporal criteria.
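To make the matching idea concrete, here is a minimal Python sketch of one of the simplest forms of matching: patients are grouped into strata that share the same coarsened confounders, and outcome rates are compared only within each stratum. This is an illustration under stated assumptions, not the team's actual method. The field names (`age`, `sex`, `long_supply`, `diagnosed`), the choice of confounders, and the ten-year age bands are all hypothetical.

```python
from collections import defaultdict


def age_band(age, width=10):
    """Coarsen age into bands such as '30-39' so patients can be matched exactly."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def stratified_effect(patients):
    """Compare outcome rates between exposed and unexposed patients within strata.

    Each patient is a dict with hypothetical keys:
      age, sex     -> confounders used for matching (assumed, for illustration)
      long_supply  -> True if the tested factor is present
      diagnosed    -> True if an addiction diagnosis was later observed
    Returns the average within-stratum difference in diagnosis rate,
    weighted by the number of exposed patients, or None if no stratum
    contains both groups.
    """
    strata = defaultdict(lambda: {True: [], False: []})
    for p in patients:
        key = (age_band(p["age"]), p["sex"])
        strata[key][p["long_supply"]].append(1 if p["diagnosed"] else 0)

    weighted_diff = 0.0
    n_exposed = 0
    for groups in strata.values():
        exposed, unexposed = groups[True], groups[False]
        if not exposed or not unexposed:
            continue  # no comparable patients in this stratum, so it is dropped
        diff = sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)
        weighted_diff += diff * len(exposed)
        n_exposed += len(exposed)
    return weighted_diff / n_exposed if n_exposed else None
```

Strata without both exposed and unexposed patients are discarded rather than extrapolated, which is the price of exact matching: the more confounders are included, the more patients are lost. The result is only as credible as the assumption that the chosen strata capture every relevant confounder.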

Many of the solutions to these challenges will come from machine learning and statistical tools and the expertise to deploy them appropriately. For example, machine learning methods can allow us to consider a wide range of confounding variables and employ flexible models for the patient matching problem mentioned above. Analysis of temporal prescription patterns can allow first prescriptions to be identified in a data-driven way. Apart from causal analysis, the team’s exploration of the data has already yielded interesting findings. For example, we have found differing rates at which various demographic groups use opioids versus receive addiction diagnoses, and confirmed that days of supply matters much more for addiction than quantity (e.g. in milligrams of morphine equivalent) prescribed per day. (See figure below)
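Returning to the problem of telling first prescriptions apart from refills, a common rule-based baseline is a washout window: a fill counts as a "first" prescription only if the patient was enrolled, and had no opioid fill, for some period beforehand. The Python sketch below illustrates that baseline, not the data-driven approach the team is pursuing. The 180-day window, the function name, and the input layout are assumptions made for illustration.

```python
from datetime import timedelta


def first_prescriptions(fills, enrollment_start, washout_days=180):
    """Find each patient's earliest opioid fill that follows a clean washout window.

    fills: iterable of (patient_id, fill_date) tuples for opioid claims.
    enrollment_start: dict mapping patient_id -> date coverage began.
    washout_days: hypothetical window length; 180 is an assumed default.
    A fill qualifies when the patient was observable for the whole window
    before it and had no other opioid fill inside that window.
    """
    washout = timedelta(days=washout_days)
    by_patient = {}
    for patient_id, fill_date in fills:
        by_patient.setdefault(patient_id, []).append(fill_date)

    index_fill = {}
    for patient_id, dates in by_patient.items():
        previous = None
        for d in sorted(dates):
            observable = d - enrollment_start[patient_id] >= washout
            opioid_free = previous is None or d - previous > washout
            if observable and opioid_free:
                index_fill[patient_id] = d
                break  # keep only the earliest qualifying fill
            previous = d
    return index_fill
```

Patients whose fills all fall too soon after enrollment are left out entirely, because the data cannot show whether they were already using opioids. That is how the limited time window shows up in practice, and it is one reason a fixed rule like this may be worth replacing with patterns learned from the data.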

Looking more broadly, this summer’s work represents the beginning of a longer and potentially far-reaching mission.  One near-term extension is to better identify those who are addicted but less visible, by considering indirect markers of addiction and/or making inferences about undiagnosed individuals.  The hope with our causal analysis is to be able to offer guidelines on opioid prescription, not only for general policies but also adapted to individual patients.  Overall, our vision is to draw upon the richness of IBM’s claims and other healthcare data to deliver insights tailored to the needs of different stakeholders: healthcare providers, payers, pharmaceutical companies, policymakers, and public health officials.  With an epidemic of this magnitude, such joint efforts are clearly needed.

To learn more about what the Healthcare and Life Sciences team is working on, visit our website. And if you’re interested in more projects from the Science for Social Good program, spend some time on our page.



[1] Rudd et al., MMWR Morb Mortal Wkly Rep 2016;65:1445-1452.

[2] New York Times, June 5, 2017.

[3] Florence et al., Med Care 2016 Oct;54(10):901-906.
