A new AI model, developed by IBM Research and Pfizer, has used short, non-invasive, standardized speech tests to help predict the eventual onset of Alzheimer’s disease in healthy people with an accuracy of 0.7 and an area under the curve (AUC) of 0.74. These predictions were made on data samples from a group of healthy individuals who either did or did not go on to develop the disease later in life, allowing researchers to verify the accuracy of the AI model’s predictions. This is a significant increase over predictions based on clinical scales (59%), which draw on other available biomedical data from a patient, and over random choice (50%).
The model uses natural language processing to analyze one- to two-minute speech samples from a brief, clinically administered cognitive test. These short samples of language data were provided by the Framingham Heart Study, a long-running study tracking various aspects of health in more than 5,000 people and their families since 1948.
Ultimately, we hope this research can lead to the development of a simple, straightforward and accessible metric to help clinicians assess the risk of Alzheimer’s disease in an individual, leading to earlier intervention.
Alzheimer’s disease and early intervention
Alzheimer’s disease is a devastating disease that begins with vague, often misinterpreted signs of mild memory loss followed by a slow, progressively serious decline in cognitive ability and quality of life.(1,2)
There is currently no effective cure or prevention for this crippling disease, which causes emotional turmoil for both patients and their families. Currently, up to 5.5 million Americans are estimated to be living with Alzheimer’s disease, and recent studies suggest it may be the third-leading cause of death in the U.S., behind heart disease and cancer.(2)
Due to the nature of Alzheimer’s disease and how it takes hold within the brain, it’s likely that the best way to delay its onset and slow its progression is through early intervention.(5) In other words, the earlier clinicians can detect Alzheimer’s disease, even before symptoms begin to appear, the more likely it is that they will one day be able to effectively delay and treat it. Unfortunately for many, by the time Alzheimer’s disease is diagnosed, it’s often too late to prevent the disease from accelerating and taking full hold.(5)
New AI models for accurate prediction
In partnership with our colleagues from Pfizer, we saw the potential to develop AI models that, if they continue to be trained on expanded, robust and diverse datasets, could one day be used to more accurately predict Alzheimer’s disease within a large population, including individuals with no current indicators of the disease, no family history of it and no signs of cognitive decline. Now published in The Lancet eClinicalMedicine, the results are promising.
Using short language samples from individuals participating in the well-known Framingham Heart Study, we were able to train AI algorithms to predict the eventual onset of Alzheimer’s disease in healthy study participants with an area under the curve (AUC) of 0.74. The Framingham Heart Study is a community-based, multi-generational, longitudinal cohort study initiated in 1948 to study various aspects of participants’ health. Its datasets are some of the most often used and cited in health research, with nearly 4,000 research publications derived from its data as of the end of 2019.(3)
In IBM Research’s Home Health Lab, research teams work with healthcare partners and clinicians to design AI models that can analyze biomarkers that individuals emit in their natural environments, such as those in speech, language and movement.
Unique aspects of our study
Our study differs significantly from current research in Alzheimer’s disease—and the application of AI to aid in predicting the disease—in quite a few ways.
First, the dataset we worked with includes samples that were collected while the subjects were cognitively healthy, before they experienced the first signs of cognitive impairment. In contrast, most studies predicting future onset have focused on subjects already showing signs of cognitive impairment. Our work also focuses on assessing the risk of Alzheimer’s disease in the general population, instead of solely focusing on high-risk groups or those with a genetic history or predisposition to the disease. As Alzheimer’s disease can affect a wide spectrum of individuals—including those with no family history of the disease or other risk factors—we felt this broader study was critical.
Additionally, we analyzed the transcriptions of participants’ language samples with natural language processing, which allowed us to tap into AI to pick up subtleties and changes in discourse that we may have otherwise missed. This allowed us to train our machine learning models and to account for multiple confounding variables that might affect the outcome of our predictions.
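To make the idea of analyzing a transcript more concrete, here is a minimal sketch of turning a transcribed speech sample into a few simple lexical measurements. It is an illustration only, not the feature set or pipeline used in our study: the function name `lexical_features`, the specific measurements and the sample sentence are all assumptions chosen for clarity.

```python
import re
from collections import Counter


def lexical_features(transcript):
    """Compute a few simple lexical measurements from a transcript.

    Hypothetical helper for illustration; the measurements below are
    assumptions, not the variables used in the published study.
    """
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", transcript.lower())
    if not words:
        raise ValueError("transcript contains no words")

    counts = Counter(words)
    n_words = len(words)
    return {
        # Total number of words spoken.
        "n_words": n_words,
        # Distinct words divided by total words (vocabulary variety).
        "type_token_ratio": len(counts) / n_words,
        # Average number of characters per word.
        "mean_word_length": sum(len(w) for w in words) / n_words,
        # Share of all words that belong to a word used more than once.
        "repeated_word_share": sum(c for c in counts.values() if c > 1) / n_words,
    }


# Made-up sentence, not taken from any study participant.
sample = "The boy is on the stool and the stool is tipping"
features = lexical_features(sample)
print(features)
```

Measurements like these could then be combined with variables such as age or education as inputs to a machine learning model, which is one common way to account for confounders.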
Finally, we had access to data from original participants of the Framingham Study, as well as their offspring and spouses, making for a much larger dataset than those used in most other studies. This unique dataset also allowed us to verify our model’s predictions with real-life results. For example, if our models analyzed a speech sample taken from one of the original participants at the age of 65 and predicted that he or she would develop Alzheimer’s disease by the age of 85, we could then check that person’s records to find out whether he or she had actually been diagnosed with the disease and when the diagnosis occurred. This breadth of data is often very difficult to come by in terms of disease prediction, and access to it allowed us to train these models with precision.
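The kind of check described above, comparing a model’s predicted risk with what later happened to each person, is what metrics such as accuracy and AUC summarize. The sketch below shows how both can be computed from scratch. The labels and risk scores are invented for illustration and are not study data, and the 0.5 decision threshold is an assumption.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of people whose thresholded prediction matches the outcome."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def auc(labels, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    Equals the probability that a randomly chosen person who developed
    the disease received a higher risk score than a randomly chosen
    person who did not (ties count as half).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented example: 1 = later diagnosed, 0 = not diagnosed.
labels = [1, 0, 1, 0, 0, 1, 0, 1]
# Invented model risk scores for the same eight people.
scores = [0.9, 0.2, 0.6, 0.7, 0.3, 0.4, 0.1, 0.8]

print("accuracy:", accuracy(labels, scores))  # 0.75
print("AUC:", auc(labels, scores))            # 0.875
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect ranking, which is why the reported 0.74 is read against the 50% random-choice baseline.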
As we continue our research in this field, our hope is that new datasets become available that expand the geographical, socioeconomic and racial diversity of the data on which we can continue to train our algorithms, while always respecting core principles of privacy, transparency and consent.
The future of AI for Alzheimer’s
Ultimately, we hope this research will take root and aid in the future development of a simple, straightforward and easily accessible tool to help clinicians assess a patient’s risk of Alzheimer’s disease through the analysis of speech and language, and in conjunction with a number of other facets of an individual’s health and biometrics.
Having such a tool at their disposal could help doctors determine the need for more complex and demanding psychiatric assessments, testing and monitoring. Current tests are typically given only once the development of Alzheimer’s disease is suspected, and they may not always be within easy reach of a large population. Being able to identify higher-risk patients could also open the door to more successful clinical trials, as those deemed to have a high likelihood of developing the disease could enter trials for preventative therapies.
Our vision is that one day clinicians will have multiple AI and machine learning tools to help identify if an individual is at risk of developing Alzheimer’s disease.
In 2019, for example, we also published work in Scientific Reports that takes a unique look at how machine learning can be used to identify a risk of Alzheimer’s with a simple blood test.(4) By integrating and analyzing a patient’s plasma protein levels, age and APOE ε4 carrier status, our researchers were able to predict early risk indicators for Alzheimer’s disease. One day doctors might be able to use speech and blood tests in conjunction with each other, leveraging AI to help predict the risk of Alzheimer’s disease and laying the groundwork for preventative measures.