Analyzing tweets to predict flu epidemics


Influenza is a serious health problem. Millions of Americans get the flu each year. It sends hundreds of thousands to the hospital, and thousands die from it. If healthcare professionals could more accurately predict when and where flu outbreaks will peak, they could improve the timing of flu vaccination clinics, better communicate the need for vaccination and improve the distribution of antiviral medications. Many lives could potentially be saved.

The U.S. Centers for Disease Control and Prevention (CDC) does track the flu, of course, but its surveillance system is based on reports from physicians, clinics and hospitals. As a result, its data lags actual flu activity by one to two weeks, which limits its usefulness as a predictive tool.

To accelerate flu epidemic forecasting, three of my master’s students undertook a research project to develop a cognitive expert system that tapped into social media. Utilizing IBM Bluemix and Watson cognitive services, they created a working solution in just six months.

Digging into Twitter

Conceptually, my students realized that Twitter posts could help pinpoint the location and severity of flu outbreaks if they could learn how many tweeters had symptoms and where they lived. Success would depend on the system’s ability to analyze up to 500 million daily tweets in English to dig out and categorize the needed data.
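As a rough illustration of that idea, and not the students' actual code, the counting step can be sketched in a few lines of Python. The sketch assumes each tweet has already been tagged with a region, a week number and a label; the `ClassifiedTweet` type, the `has_symptoms` label and the sample records are hypothetical names and made-up values chosen only for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassifiedTweet:
    """A tweet after classification (hypothetical structure for this sketch)."""
    region: str   # where the tweeter lives, e.g. a state name
    week: int     # week number in which the tweet was posted
    label: str    # assumed label such as "has_symptoms" or "plans_flu_shot"


def symptom_counts(tweets):
    """Count tweets labeled as reporting symptoms, per (region, week)."""
    counts = Counter()
    for tweet in tweets:
        if tweet.label == "has_symptoms":
            counts[(tweet.region, tweet.week)] += 1
    return counts


# Made-up sample records, for illustration only.
sample = [
    ClassifiedTweet("Ohio", 1, "has_symptoms"),
    ClassifiedTweet("Ohio", 1, "plans_flu_shot"),
    ClassifiedTweet("Ohio", 1, "has_symptoms"),
    ClassifiedTweet("Texas", 1, "has_symptoms"),
]
counts = symptom_counts(sample)
print(counts[("Ohio", 1)], counts[("Texas", 1)])
```

Only tweets that report symptoms are counted, so a tweet about planning a flu shot does not inflate a region's total.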

The IBM Watson Natural Language Classifier service delivered that capability. Powered by the cloud, the service’s natural language processing (NLP) can understand nuances in the content and context of everyday language. NLP goes beyond keyword search by comprehending what a tweet actually says.

How natural language processing works

As an example, it makes a difference whether someone tweets that he plans to get a flu shot, or that he already has flu-like symptoms and is staying home from work. Watson can recognize the difference by understanding the sentences. The system applies such cognition across millions of tweets while correlating the analysis with data from the CDC.
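One simple way to picture the correlation step is a lagged comparison between weekly symptom-tweet counts and weekly CDC case counts. The sketch below is an assumption about how such a check could look, not a description of the system the students built. The function names `pearson` and `best_lead` are hypothetical, and the two series are synthetic toy numbers constructed so that the CDC series trails the tweet series by exactly one week.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_lead(tweet_counts, cdc_counts, max_lead=3):
    """Find how many weeks the tweet series leads the CDC series.

    For each candidate lead, tweets from week t are compared with
    CDC counts from week t + lead. Returns (best lead, all scores).
    """
    scores = {}
    for lead in range(max_lead + 1):
        xs = tweet_counts[: len(tweet_counts) - lead]
        ys = cdc_counts[lead:]
        scores[lead] = pearson(xs, ys)
    best = max(scores, key=scores.get)
    return best, scores


# Synthetic toy series: the CDC numbers are the tweet numbers shifted by one week.
tweets = [1, 2, 4, 8, 5, 3, 2, 1]
cdc = [1, 1, 2, 4, 8, 5, 3, 2]
lead, scores = best_lead(tweets, cdc)
print(lead)
```

In this toy case the strongest correlation appears at a one-week lead, which mirrors the kind of reporting lag described for the CDC data. Real data would be noisier and would need far more than a single correlation score.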

Combining the CDC’s rock-solid data with “fuzzy” data from Twitter produces a rock-solid result: the ability to predict flu outbreaks in near-real time, before they happen. The system is so promising that we entered it in the CDC’s Predict the Influenza Season Challenge.

Answering people’s questions

Also, we paired a body of knowledge from about 4,000 research papers on why people get the flu with the IBM Watson Engagement Advisor service, which conducts a dialogue in natural language with people who contact the system. It can answer critical questions such as “Do I have the flu?”, “What are flu symptoms?” and “Should I get vaccinated?” The more questions and responses the system handles, the smarter it gets.

What is the future of our system? We plan to add more data sources to increase accuracy. Now that Watson speaks German, we will extend it to Germany. We may apply it to other infectious diseases. The system would be especially welcome in developing nations where the healthcare infrastructure is weak – Twitter data could be more accurate than information from the medical establishment.

A machine that supports us

Key to our system’s value is its ability to keep users in the loop through natural language processing. We never wanted to create an isolated supermachine; we wanted a machine on our side to support us. A cognitive system that engages medical professionals and answers everyday questions can truly improve our health and our quality of life.


For more details on the flu prediction system, see the IBM case study or Dr. Pipa’s video.
