Can tweets predict the next flu epidemic?

Battling flu season with data and social media

3 minute read | November 16, 2019

The fall season brings many familiar favorites. Stunning foliage. Warm apple cider. Hayrides through pumpkin patches. Flu shots?

It’s common nowadays to see notifications from healthcare organizations on the local news, alongside email reminders from employers about annual flu shots. If anything, it’s a normal occurrence, perhaps even an anticipated one, alongside ads for new pumpkin spice-flavored consumables.

But even with careful preparation, healthcare professionals often work behind the curve to track the progress of reported flu outbreaks. Numerous factors are at play.

In America, for example, the US Centers for Disease Control and Prevention (CDC) is in charge of monitoring the flu. The model used to share updates with the general public, however, is based on a surveillance system that receives confirmed cases from physicians, clinics and hospitals. Because reported flu activity arrives one to two weeks after the fact, the system is of limited use as a predictive tool.

Given the shortcomings of this type of predictive model, three Neuroinformatics Department students at Osnabrück University’s Institute of Cognitive Science came up with a hypothesis in 2017.

With billions of daily active users, social media platforms—from Facebook to Instagram and beyond—are go-to places to share day-to-day personal news and feelings. Including the return of seasonal illnesses.

The students, led by their professor Dr. Gordon Pipa, asked: could crowdsourced data and sentiment from Twitter be used to better predict when and where flu epidemics would occur?

500 million tweets

How did the team from Osnabrück University—located in Germany—plan to test their hypothesis? With the help of AI. They set out to create a cognitive expert system that would be fed information from Twitter.

“Conceptually, my students realized that Twitter posts could help pinpoint the location and severity of flu outbreaks,” Dr. Pipa told Industrious, “if they could learn how many tweeters had symptoms, and where they lived.”
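
To make that idea concrete, here is a minimal sketch in Python of counting symptomatic tweeters by location. It is not the team's system: the record fields (`user`, `region`, `label`) and the sample data are hypothetical, and it assumes each tweet has already been labeled and matched to a region.

```python
from collections import Counter

def symptomatic_counts_by_region(tweets):
    """Count distinct users with at least one symptomatic tweet, per region."""
    seen = set()        # (user, region) pairs already counted
    counts = Counter()
    for tweet in tweets:
        # Skip tweets that are not about being ill or have no known location.
        if tweet["label"] != "symptomatic" or tweet["region"] is None:
            continue
        key = (tweet["user"], tweet["region"])
        if key not in seen:
            seen.add(key)
            counts[tweet["region"]] += 1
    return counts

# Hypothetical, hand-made records for illustration only.
sample = [
    {"user": "a", "region": "Ohio", "label": "symptomatic"},
    {"user": "a", "region": "Ohio", "label": "symptomatic"},   # same user, counted once
    {"user": "b", "region": "Ohio", "label": "symptomatic"},
    {"user": "c", "region": "Texas", "label": "vaccination"},  # not ill
    {"user": "d", "region": None, "label": "symptomatic"},     # no location
]
result = symptomatic_counts_by_region(sample)
print(result)
```

Counting each user once per region keeps a single prolific tweeter from inflating the apparent size of an outbreak.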

How many tweets would need to be analyzed for the research team to create a viable solution?

“Up to 500,000,000 tweets per day with Twitter Insights,” Dr. Pipa said. That may seem a daunting number to declutter. But for AI, the more data, the better.

The team successfully created a working solution in six months using IBM Bluemix and Watson cognitive services.

Breaking down the content of so many tweets is impossible for humans without the help of AI. Integrating the IBM Watson Natural Language Classifier service for tweet analysis allowed the team to distinguish the many nuances in how people tweet about the different stages of the flu.

Take someone tweeting about plans to get a flu shot. That’s starkly different from a tweet about already being ill with flu-like symptoms, and how that affects that individual’s day: taking time off to visit the doctor or staying home to get better and avoid spreading the virus further.
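
The distinction can be illustrated with a toy Python sketch. This is a deliberately simple stand-in, assuming hand-written keyword rules and category names of our own invention; the team used the trained IBM Watson Natural Language Classifier service, not a keyword list like this one.

```python
import re

# Hypothetical keyword rules and labels, for illustration only.
# Rules are checked in order, so the first matching category wins.
RULES = {
    "vaccination": re.compile(r"\b(flu shot|vaccin\w*)\b", re.IGNORECASE),
    "symptomatic": re.compile(r"\b(fever|chills|coughing|sick|in bed)\b", re.IGNORECASE),
}

def classify_tweet(text):
    """Return the first matching category, or "other" if no rule matches."""
    for label, pattern in RULES.items():
        if pattern.search(text):
            return label
    return "other"

print(classify_tweet("Getting my flu shot at lunch today"))
print(classify_tweet("Home sick with a fever, staying in bed"))
print(classify_tweet("Beautiful foliage this weekend"))
```

Keyword rules like these break down quickly on sarcasm, slang and mixed messages, which is the kind of nuance a trained classifier is meant to handle.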

As an added bonus, the team combined CDC surveillance data, 500 million tweets, and an extra layer of data from nearly 4,000 research papers on why people get the flu in the first place. Together with the IBM Watson Engagement Advisor service, this allowed the solution to correctly answer people’s questions on precautionary measures, such as “Do I have the flu?” or “Should I get vaccinated?”

A difference of one week

The new system developed by Dr. Pipa and his team demonstrated that better flu prediction models can benefit many groups.

By showing how flu predictions could be enriched with social media content, the work helped lay the groundwork for others, and for the future.

For healthcare professionals, predictions could arrive 60 percent sooner, almost one week earlier than they do today. For state, local, and even federal governments, more accurate prediction models could help officials determine when to alert the public of impending epidemics.

“We are essentially one of the first institutes in Europe that started this—cognitive science and especially cognitive computing,” Pipa said. “And what we did here was demonstrate use cases for cognitive computing.”

Scaling predictability and accuracy

Dr. Pipa has a few ideas for the future.

“We may apply it to other infectious diseases,” he said. “The system could be especially welcome in developing nations where the healthcare infrastructure is weak. Twitter data could be more accurate than information from a medical establishment.”

Ultimately, Dr. Pipa believes one of the most valuable results from this project is the system’s ability to keep users informed with the help of natural language processing.

“We never wanted to create an isolated super-machine,” he said. “We wanted a machine on our side to support us.”

A cognitive system that engages medical professionals and better serves curious users? Now that’s something to ponder, perhaps over a hot bowl of chicken soup.