Geisinger Health System wanted to mine electronic health record data for insights on diagnosing and treating sepsis — an infection that contributes to 250,000 deaths per year in the US alone.
Geisinger partnered with IBM and used the data science tools available in the IBM® Watson® Studio solution to develop machine learning models capable of analyzing thousands of patient records and medical journals.
Identifies factors associated with higher risk of sepsis mortality
Supports more personalized care plans for patients, potentially enabling faster recovery
Helps researchers stay informed about key academic studies into sepsis treatment
Business challenge story
Developing a more effective care strategy
Geisinger uses the latest technologies to develop new diagnostic techniques and clinical care plans. The healthcare provider, based in Danville, Pennsylvania, recently launched a major project to discover smarter ways to tackle sepsis cases.
Mortality rates for sepsis are unusually high. More than 1.6 million people in the US are diagnosed every year, with over 250,000 dying from the infection — almost one every two minutes. However, over 80 percent of deaths could be prevented with rapid diagnosis and treatment. Every hour of delay in a sepsis diagnosis raises the risk of mortality by 7.6 percent.
Dr. Donna Wolk, Division Director, Molecular and Microbial Diagnostics and Development at Geisinger, explains: “For clinicians, making a sepsis diagnosis can be very difficult, as the symptoms overlap with many other common illnesses. If we can identify patients more quickly and more accurately, we can administer the right treatments early and increase the chances of a positive outcome.”
Geisinger recognized that its electronic health record (EHR) data held a wealth of information that could help identify which patients were more susceptible to a fatal outcome from sepsis, and how best to treat the infection. To extract these insights, Geisinger needed a powerful tool for developing a predictive model capable of assessing hundreds of different risk factors for every patient.
At the same time, Geisinger’s researchers need to stay on top of the latest academic studies, which could inform their approach to sepsis treatment. Around two million new medical journal papers emerge every year, making this a very difficult task for busy staff. Dr. Wolk adds: “Our teams need to be on top of all recent and archived studies, as each publication could reveal valuable findings that help us make the next breakthrough. Searching through back issues to locate specific journal articles can be a time-consuming process, so we also looked to find a way to keep researchers abreast of key findings.”
Deploying machine learning to tackle sepsis
To solve these challenges, Geisinger’s researchers partnered with a team of data scientists from IBM to build and train machine learning models. The teams worked with the open source tools available on the Watson Studio platform.
For the first use case, Geisinger provided de-identified files for 10,599 patients diagnosed with sepsis between 2006 and 2016, either before hospitalization or during their stay. The Geisinger and IBM teams broke the data into 199 separate features for each patient, covering details such as their age, infection type, surgery and treatments, medical history and lifestyle.
Next, the teams set themselves the goal of using the data to predict all-cause patient mortality during the hospitalization period or during the 90 days post-discharge. Working in Python in the Watson Studio solution, the data scientists used the open source XGBoost library to develop a scalable machine learning algorithm based on gradient-boosted decision trees to analyze the data.
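To make the idea of gradient-boosted decision trees more concrete, the sketch below boosts one-split trees ("stumps") on a log-loss objective in plain Python. It is an illustration only, not the project's code and not XGBoost itself, which adds regularization, second-order gradient information and many optimizations. The two features (age, hours on vasopressors), the eight toy rows and all function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(rows, residuals):
    """Find the one-feature threshold split that best fits the residuals (least squares)."""
    best = None
    for j in range(len(rows[0])):
        for threshold in sorted({r[j] for r in rows}):
            left = [res for r, res in zip(rows, residuals) if r[j] <= threshold]
            right = [res for r, res in zip(rows, residuals) if r[j] > threshold]
            if not left or not right:
                continue  # a split must put rows on both sides
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((v - left_mean) ** 2 for v in left)
                     + sum((v - right_mean) ** 2 for v in right))
            if best is None or error < best[0]:
                best = (error, j, threshold, left_mean, right_mean)
    return None if best is None else best[1:]

def fit_boosted_stumps(rows, labels, n_rounds=20, learning_rate=0.3):
    """Each round fits a stump to the negative log-loss gradient (label minus probability)."""
    base_rate = sum(labels) / len(labels)  # assumes both classes are present
    bias = math.log(base_rate / (1.0 - base_rate))  # start from the overall log-odds
    scores = [bias] * len(rows)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - sigmoid(s) for y, s in zip(labels, scores)]
        stump = fit_stump(rows, residuals)
        if stump is None:
            break
        j, threshold, left_value, right_value = stump
        stumps.append(stump)
        scores = [s + learning_rate * (left_value if r[j] <= threshold else right_value)
                  for r, s in zip(rows, scores)]
    return bias, stumps, learning_rate

def predict_proba(model, row):
    """Sum the scaled stump outputs on top of the bias and map to a probability."""
    bias, stumps, learning_rate = model
    score = bias + sum(learning_rate * (lv if row[j] <= t else rv)
                       for j, t, lv, rv in stumps)
    return sigmoid(score)

# Hypothetical toy data: [age, hours on vasopressors]; label 1 means the patient died.
rows = [[34, 0], [45, 2], [52, 1], [61, 3], [67, 10], [72, 14], [80, 12], [85, 20]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_boosted_stumps(rows, labels)
print(round(predict_proba(model, [78, 15]), 3), round(predict_proba(model, [40, 1]), 3))
```

On this toy data the model assigns a probability above 0.5 to the older patient with long vasopressor exposure and below 0.5 to the younger one. A production model would typically use deeper trees over all 199 features.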
After splitting the data 60/40 between training and testing, the team fine-tuned the predictive model before using the final version to estimate precision, recall and the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC), illustrating which clinical features could be used to indicate patient mortality.
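The evaluation steps described above can be sketched in plain Python: a 60/40 shuffle split, precision and recall from thresholded predictions, and AUC computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The function names, the random seed, the 0.5 threshold and the toy labels and scores are assumptions made for illustration. The source does not describe how the teams implemented these steps.

```python
import random

def split_60_40(records, seed=0):
    """Shuffle a copy of the records and return (training 60%, testing 40%)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.6 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def precision_recall(labels, predictions):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def roc_auc(labels, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical toy labels (1 = died) and model scores, not project data.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.7]
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]

train, test = split_60_40(range(10))
print(len(train), len(test))
print(precision_recall(toy_labels, toy_predictions))
print(roc_auc(toy_labels, toy_scores))
```

Precision and recall depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is one reason the three are usually reported together.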
Of the 10,599 patients in the sample, 25.2 percent died during hospitalization, and a further 13.1 percent in the 90-day post-discharge period. The predictive model provided impressive results, identifying 1,190 True Positives and 2,087 True Negatives: that is, correctly predicting death for patients in the test data who did die, or survival for patients who were successfully treated.
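As a rough check on these figures, the arithmetic below combines the two mortality percentages and estimates the share of correct predictions. The source does not state the test-set size, so the code assumes it is the 40 percent share of the 10,599 patients. Because the split between false positives and false negatives is not reported either, precision and recall cannot be recovered from these numbers.

```python
total_patients = 10_599
in_hospital_pct = 25.2       # died during hospitalization (from the text)
post_discharge_pct = 13.1    # died within 90 days of discharge (from the text)
combined_pct = in_hospital_pct + post_discharge_pct

# ASSUMPTION: the test set is the 40% remainder of a 60/40 split of all patients.
assumed_test_size = total_patients - round(0.6 * total_patients)

true_positives = 1_190       # from the text
true_negatives = 2_087       # from the text
correct = true_positives + true_negatives
accuracy = correct / assumed_test_size
misclassified = assumed_test_size - correct  # false positives plus false negatives

print(f"combined mortality: {combined_pct:.1f}%")
print(f"assumed test size: {assumed_test_size}")
print(f"accuracy under that assumption: {accuracy:.3f}")
print(f"misclassified patients: {misclassified}")
```

Under that assumption, roughly 77 percent of the test patients were classified correctly, against a combined mortality rate of 38.3 percent across the whole sample.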
Dr. Wolk comments: “The results of the predictive model were highly encouraging. The tools available in IBM Watson Studio provided the high performance and speed we needed for analysis, and the support of the IBM Data Science Elite team ensured the project ran smoothly.”
For the second project, IBM and Geisinger used IBM Watson Explorer software to create a searchable index of thousands of medical publications, enabling the clinicians and researchers to mine the journal archive to uncover the most relevant content.
As the first step, Geisinger worked with IBM to import a corpus of medical journals downloaded from the US National Library of Medicine’s PubMed search engine before using the text analytics capabilities of the Watson Explorer solution to parse the content, breaking it down to constituent terms such as disease, sepsis and treatment.
The teams then added natural language taxonomies, including the 2017 Medical Subject Headings (MeSH) created by the US National Library of Medicine, and the DrugBank database. Users of the index can run queries based on the words listed in these hierarchies of key medical and pharmacological terms, uncover relations between concepts and specify a time period for journal publications in which the terms were used.
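The general pattern described here (break documents into terms, index them, expand a query through a taxonomy, and filter by publication year) can be illustrated with a small inverted index. This is a conceptual sketch in plain Python, not the Watson Explorer API. The three articles, the one-entry taxonomy and all function names are hypothetical and are not taken from PubMed, MeSH or DrugBank.

```python
import re
from collections import defaultdict

# Hypothetical toy corpus standing in for imported journal records.
ARTICLES = [
    {"id": 1, "year": 2012, "text": "Early vasopressor treatment in sepsis"},
    {"id": 2, "year": 2016, "text": "Norepinephrine dosing for septic shock"},
    {"id": 3, "year": 2017, "text": "Sepsis biomarkers and mortality risk"},
]

# Hypothetical mini-taxonomy: broader term -> narrower terms.
TAXONOMY = {"vasopressor": ["norepinephrine", "vasopressin"]}

def tokenize(text):
    """Lowercase the text and split it into alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(articles):
    """Map each term to the set of article ids that contain it."""
    index = defaultdict(set)
    for article in articles:
        for term in tokenize(article["text"]):
            index[term].add(article["id"])
    return index

def search(index, articles, term, year_from=None, year_to=None):
    """Return ids matching the term or its narrower terms, within the year range."""
    hits = set()
    for candidate in [term] + TAXONOMY.get(term, []):
        hits |= index.get(candidate, set())
    years = {article["id"]: article["year"] for article in articles}
    return sorted(
        i for i in hits
        if (year_from is None or years[i] >= year_from)
        and (year_to is None or years[i] <= year_to)
    )

index = build_index(ARTICLES)
print(search(index, ARTICLES, "vasopressor"))
print(search(index, ARTICLES, "vasopressor", year_from=2015))
print(search(index, ARTICLES, "sepsis"))
```

The taxonomy expansion is what lets a query for a broad term also surface an article that only mentions a specific drug. The plain term match, by contrast, misses "septic" when the query is "sepsis", which is the kind of gap that richer text analytics is meant to close.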
Dr. Wolk adds: “IBM Watson Explorer has been really impressive so far. Not only could we use the solution to ingest a huge amount of data, but being able to fine-tune search terms and run queries through the intuitive interface will make things much easier for our researchers.”
Creating personalized care plans
Working with IBM, Geisinger has successfully built a predictive model for sepsis mortality based on real-life EHR data. Previous predictive models focused on the infection have not always been developed using data collected from an actual clinical setting, leading to challenges related to bias and how the results can be applied when providing patient care.
Geisinger’s new predictive model has helped researchers identify clinical biomarkers that are associated with higher rates of mortality from sepsis. The project revealed that features such as age, prior cancer diagnosis, decreased blood pressure, the number of hospital transfers, and time spent on vasopressor medicines were all key factors linked to sepsis deaths.
Dr. Wolk continues: “We already suspected many of the features that the study highlighted were associated with a higher risk of mortality. The results provide reassurance in the validity of the machine learning model, and confirm it can successfully pinpoint important factors from among many other variables.
“Building on our work with IBM Watson Studio, we will be able to develop more personalized clinical care plans for at-risk sepsis patients, potentially increasing their chances of recovery. For example, our clinicians will know that they need to pay extra attention to older sepsis patients, or monitor the length of time they have taken vasopressors in greater depth, or limit the number of transfers between hospitals for vulnerable patients.”
Geisinger now intends to incorporate additional data to continue training the predictive model. For example, the healthcare provider will input data on sepsis hospitalizations from 2017, along with more information on patients’ socio-economic backgrounds, to build an even more granular view of the features that influence mortality rates.
By building the searchable index of medical journals using the Watson Explorer solution, Geisinger has also reduced the time taken to locate valuable research studies. Where clinicians previously had to look through many journals to locate articles, the IBM solution helps them pinpoint useful research with a few clicks and uncover new studies they may not have been aware of before.
Dr. Wolk concludes: “Our experience using machine learning and data science has been very positive, and we see huge potential to continue its use in the medical field. Thanks to IBM, we are well on our way to breaking new ground in clinical care for sepsis and achieving more positive outcomes for our patients.”
About Geisinger Health System
Founded over 100 years ago, Geisinger serves communities in northern and central Pennsylvania, providing healthcare to citizens at a number of primary care and trauma centers. The company also offers medical education and pioneers research into new clinical care. The company’s main site, the Geisinger Medical Center, stands in Danville, Pennsylvania, next to the state-of-the-art Geisinger Center for Health Research, opened in 2007.