Combating the Opioid Epidemic with Machine Learning


The opioid epidemic has become one of the worst health crises in US history.  In 2015, more than 90 Americans died every day from opioid overdoses, a number comparable to deaths in car accidents and projected to have risen further in 2016 and 2017. [1,2] The Centers for Disease Control and Prevention (CDC) estimate the total economic burden of prescription opioid abuse to be $78.5 billion a year, including healthcare costs, lost productivity, and criminal justice involvement. [3]


The problem often begins with legitimate healthcare encounters in which opioid painkillers are first prescribed, such as for surgeries or chronic back pain.  During treatment, some patients become addicted and go on to suffer the well-documented consequences of addiction, while others do not, even if they become long-term users.  To combat the epidemic, it is vital to understand the exact circumstances under which medically sanctioned treatments can devolve into addiction.


This summer, we are taking the first steps in tackling this question in a project within our Science for Social Good program.  Our team includes a graduate student intern who is driving the effort, several machine learning and data science researchers from IBM Research, and experts from IBM Watson Health. Given our combination of industry-leading data on healthcare claims, the domain knowledge and experience to exploit this data, and the statistical and machine learning expertise to deploy the right analytical tools, we believe that we are in a unique position to uncover new insights to address the opioids problem.

The team’s focus this summer is on analyzing the relationship between factors surrounding an initial opioid prescription and a subsequent diagnosis of addiction, the latter representing the subset of addicts who seek treatment.  The goal is to identify causal (and not merely correlated) factors that lead to addiction diagnosis among all the variables associated with the initial prescription: opioid class (natural, semi-synthetic, synthetic), quantity (chiefly days of supply), and related medical procedures and diagnoses.  We expect to have early results on this work ready to share in November.

The use of healthcare claims is essential to our effort.  These claims are generated every time a patient receives a treatment or service, creating, over time, a detailed record of an individual’s healthcare encounters. Thanks to its acquisition strategy and data partnerships, IBM has access to comprehensive collections of de-identified claims data.  Of particular interest to us are prescription drug claims, which provide precise information about the types and quantities of opioids prescribed.  Just as important are other claims around the time of an opioid prescription, which indicate procedures (such as surgeries) and associated diagnoses that may be connected to the prescription. Outcomes of addiction can be observed through indicators such as diagnoses of addiction and substance abuse given when patients seek treatment, or prescriptions of alternative drugs intended to mitigate addiction. Claims data is also accompanied by insurance plan enrollment records that include demographic and other information on the patients themselves.  Taken together, these variables promise richer, more detailed insights into the opioid problem than previous analyses, such as those based on prescription data alone.

Claims data does present a challenge to establishing causality, since the data is observational and not the result of intervening in a randomized controlled experiment.  Thus for each set of variables that are tested, statistical methods have to be employed to match patients on non-tested but potentially confounding variables. Other challenges include distinguishing “first” prescriptions from later prescriptions or refills given a limited time window, and identifying linked procedures and diagnoses using clinical and temporal criteria.
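To make the "first prescription" challenge concrete, here is a minimal sketch of one common rule-based approach: treat a fill as a first (index) prescription only if it is preceded by a full opioid-free look-back ("washout") window that is itself covered by the patient's enrollment. This is an illustration only, not the team's method; the function name, the 180-day window, and the toy data are all assumptions.

```python
from datetime import date, timedelta


def index_prescriptions(fills, enrollment_start, washout_days=180):
    """Return {patient_id: date} of the earliest fill that follows a full,
    observable opioid-free washout window.

    fills: iterable of (patient_id, fill_date) opioid prescription fills.
    enrollment_start: {patient_id: date} when the patient becomes observable.
    washout_days: assumed look-back length (180 is an illustrative choice).
    """
    washout = timedelta(days=washout_days)

    # Group fill dates by patient.
    by_patient = {}
    for patient_id, fill_date in fills:
        by_patient.setdefault(patient_id, []).append(fill_date)

    index = {}
    for patient_id, dates in by_patient.items():
        dates.sort()
        start = enrollment_start[patient_id]
        previous = None
        for d in dates:
            # The look-back window must lie entirely inside enrollment,
            # otherwise an earlier fill could simply be unobserved.
            observable = d - start >= washout
            # No opioid fill may occur inside the look-back window.
            clean = previous is None or d - previous > washout
            if observable and clean:
                index[patient_id] = d
                break
            previous = d
    return index


# Hypothetical toy data.
fills = [
    ("A", date(2015, 1, 10)),  # too close to enrollment start
    ("A", date(2015, 2, 10)),  # refill or continuation
    ("A", date(2016, 1, 5)),   # follows a long opioid-free gap
    ("B", date(2015, 9, 1)),
    ("C", date(2015, 3, 1)),   # look-back window not fully observed
]
enrollment_start = {p: date(2015, 1, 1) for p in ("A", "B", "C")}

index = index_prescriptions(fills, enrollment_start)
print(index)
```

A fixed window is the simplest choice; it trades sample size (patients such as C are dropped) for confidence that the index fill really is the start of an episode.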

Many of the solutions to these challenges will come from machine learning and statistical tools and the expertise to deploy them appropriately. For example, machine learning methods can allow us to consider a wide range of confounding variables and employ flexible models for the patient matching problem mentioned above. Analysis of temporal prescription patterns can allow first prescriptions to be identified in a data-driven way. Apart from causal analysis, the team’s exploration of the data has already yielded interesting findings. For example, we have found differing rates at which various demographic groups use opioids versus receive addiction diagnoses, and confirmed that days of supply matters much more for addiction than quantity (e.g. in milligrams of morphine equivalent) prescribed per day. (See figure below)
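The patient matching idea mentioned above can be sketched as follows. This is a hedged illustration, not the team's implementation: it assumes each patient already has a scalar score (for example, a propensity score estimated by some model from the potential confounders), and it uses greedy 1:1 nearest-neighbor matching with a caliper. All names, scores, and outcomes below are hypothetical.

```python
def match_pairs(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, controls: lists of (unit_id, score), where score is an assumed
    precomputed summary of the confounders (e.g. a propensity score).
    caliper: maximum allowed score gap for a valid match (illustrative value).
    """
    available = dict(controls)
    pairs = []
    for unit_id, score in sorted(treated, key=lambda t: t[1]):
        if not available:
            break
        # Closest remaining control on the score.
        best = min(available, key=lambda c: abs(available[c] - score))
        if abs(available[best] - score) <= caliper:
            pairs.append((unit_id, best))
            del available[best]  # each control is used at most once
    return pairs


def matched_risk_difference(pairs, outcome):
    """Mean difference in a binary outcome (1 = addiction diagnosis observed)
    between the treated and control members of each matched pair."""
    if not pairs:
        raise ValueError("no matched pairs")
    diffs = [outcome[t] - outcome[c] for t, c in pairs]
    return sum(diffs) / len(diffs)


# Hypothetical toy data: "treated" might mean a longer days of supply.
treated = [("t1", 0.30), ("t2", 0.62), ("t3", 0.95)]
controls = [("c1", 0.28), ("c2", 0.60), ("c3", 0.33), ("c4", 0.10)]
outcome = {"t1": 1, "t2": 0, "t3": 1, "c1": 0, "c2": 0, "c3": 1, "c4": 0}

pairs = match_pairs(treated, controls)
risk_difference = matched_risk_difference(pairs, outcome)
print(pairs, risk_difference)
```

Here t3 stays unmatched because no control falls within the caliper, which shows the usual trade-off: a tighter caliper gives more comparable pairs but discards more patients. Matching of this kind only adjusts for the confounders that went into the score.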

Looking more broadly, this summer’s work represents the beginning of a longer and potentially far-reaching mission.  One near-term extension is to better identify those who are addicted but less visible, by considering indirect markers of addiction and/or making inferences about undiagnosed individuals.  The hope with our causal analysis is to be able to offer guidelines on opioid prescription, not only for general policies but also adapted to individual patients.  Overall, our vision is to draw upon the richness of IBM’s claims and other healthcare data to deliver insights tailored to the needs of different stakeholders: healthcare providers, payers, pharmaceutical companies, policymakers, and public health officials.  With an epidemic of this magnitude, such joint efforts are clearly needed.

To learn more about what the Healthcare and Life Sciences team is working on, visit our website. And if you’re interested in more projects from the Science for Social Good program, spend some time on our page.



[1] Rudd et al., MMWR Morb Mortal Wkly Rep 2016;65:1445-1452.

[2] New York Times, June 5, 2017.

[3] Florence et al., Med Care 2016 Oct;54(10):901-906.
