AI Data Tracker Encourages Scientific Research into COVID-19 Non-Pharmaceutical Interventions

What impact do measures such as shelter-in-place, mask wearing, and social distancing have on the number of COVID-19 cases? How do the COVID-19 quarantine measures that have been implemented by North American countries compare to South American countries?

These are just a few of the questions raised by the wide range of non-pharmaceutical interventions (NPIs) that governments have applied globally. Since the onset of the pandemic, these NPIs have been implemented to varying degrees, with the intention of reducing the transmission of COVID-19. For some of these NPIs, the economic burden of applying them can be enormous, and the societal implications might be far-reaching.

Hence, local governments and businesses wish to make smart and cautious decisions, often focusing only on the NPIs that are most effective. These decisions may benefit from a data-driven approach in which models of disease spread incorporate information about NPIs. In turn, this could allow for intervention plans that help manage the spread of disease while balancing the socio-economic impact. Information on the NPIs being implemented is available across a wide variety of unstructured data sources, including official government websites, press releases, social media and news articles. However, modeling efforts often require NPI data to be available in a structured form.

To address this urgent need, several data collection initiatives have emerged in recent months, resulting in several publicly available datasets with varying degrees of coverage, data freshness and sparsity. However, the vast majority of these are manually curated and do not cover a wide range of interventions. In addition, only a subset of world regions is covered, with only limited information on the fine-grain locations and sources. An AI-assisted, semi-automated data collection approach, driven by a rich, extensible taxonomy, can potentially help to bridge the gap and result in a larger, more frequently updated dataset with less manual labor.

We have recently introduced the Worldwide Non-pharmaceutical Interventions Tracker for COVID-19 (WNTRAC) — a comprehensive dataset consisting of more than 6,000 NPIs implemented worldwide since the start of the pandemic. WNTRAC covers NPIs implemented across 261 countries and territories, and classifies NPI events into a taxonomy of 15 NPI categories. WNTRAC is now publicly available for non-commercial use.

Leveraging crowdsourcing and natural language processing

Since the beginning of the pandemic, over 5,000 new Wikipedia pages on COVID-19 have been written by more than 71,000 volunteers. These pages had accumulated more than 440 million page views by June 2020 [1]. Though Wikipedia articles are crowdsourced, they now serve as a common source of NPI event data through the process of collective validation and by citations of credible sources, such as government websites, scientific literature and news articles.

WNTRAC: Artificial intelligence assisted tracking of non-pharmaceutical interventions implemented worldwide for COVID-19


Tapping into this crowdsourced and frequently updated information in Wikipedia, we built an AI-assisted, semi-automated system to construct the dataset and keep it current. We employed natural language processing (NLP) techniques, driven by a rich, extensible taxonomy, to analyze and automatically extract NPI events, daily. To aid with accuracy and veracity, the extracted events are validated against credible sources by a small dedicated group of IBM volunteers prior to releasing each new version of the dataset. Additional technical details about the system can be found in the paper, here.

Shown below is an example of an NPI event in the WNTRAC dataset.

An example of NPI measures related to COVID-19 reported via Wikipedia article (as of September 2, 2020)


An NPI event is uniquely identified by five attributes: what, value, where, when, and restriction.

  1. What is the type of NPI that was imposed or lifted. In the example, the type is “school closure”.
  2. Value is the sub-category or attribute that further qualifies the NPI type. In the example, the associated value is “all schools closed”.
  3. Where refers to the region (a country, territory, province or state) where the NPI measure was imposed or lifted. In the example, three distinct counties in New York (Westchester, Suffolk and Nassau) are identified. Therefore, three separate events will be extracted.
  4. When reflects the date on which the NPI was imposed or lifted. In the example, the date is March 16, corresponding to the implementation of the NPI.
  5. Restriction is a flag that indicates whether the event corresponds to an imposition or a lifting of the NPI. In the example, the restriction type would be “imposed”.

Additionally, each NPI event contains the URL of the original source (government order, press release, social media post) extracted from Wikipedia and/or provided by the IBM volunteer.
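To make the five attributes concrete, here is a minimal Python sketch of how one such event could be represented, and how a single mention naming several regions expands into separate events. The class name `NPIEvent`, the helper `expand_regions`, and the field names are illustrative assumptions, not the actual WNTRAC schema. The year 2020 is also assumed, since the example gives only March 16.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class NPIEvent:
    """Hypothetical record for one NPI event (not the real WNTRAC schema)."""
    what: str             # NPI type, e.g. "school closure"
    value: str            # qualifier of the type, e.g. "all schools closed"
    where: str            # region where the NPI was imposed or lifted
    when: date            # date the NPI was imposed or lifted
    restriction: str      # "imposed" or "lifted"
    source_url: str = ""  # URL of the original source; left empty in this sketch


def expand_regions(what: str, value: str, regions: List[str],
                   when: date, restriction: str) -> List[NPIEvent]:
    """Create one event per region named in a single mention."""
    return [NPIEvent(what, value, region, when, restriction) for region in regions]


# The school-closure example from the text: three counties, so three events.
events = expand_regions(
    what="school closure",
    value="all schools closed",
    regions=["Westchester", "Suffolk", "Nassau"],
    when=date(2020, 3, 16),  # year assumed; the text gives only March 16
    restriction="imposed",
)

for event in events:
    print(event.where, event.what, event.when.isoformat(), event.restriction)
```

Because the record is frozen, its fields act together as the key for an event, so the three county-level events above remain distinct even though they share the same type, value, date and restriction.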

As of September 2, 2020, the WNTRAC dataset contains about 6,600 NPI events. Some interesting statistics about the dataset are shown below.

Popularity of NPIs: US vs Worldwide (as of September 2, 2020)


Time to first NPIs imposed and first 50 cases and first death (as of September 2, 2020)


Looking forward

Our aim is to facilitate research using WNTRAC to help inform answers to questions such as:

  • What is the relationship between the spread of COVID-19 and the type of NPIs imposed and the timing of when they are lifted?
  • What is the optimal set of NPIs to implement: when, where, and for how long to reduce COVID-19 transmission?
  • What is the optimal set of NPIs that can help mitigate economic impact and contain the disease?

WNTRAC is one element of the many initiatives IBM Research teams are driving to help in the fight against COVID-19, including co-founding the High Performance Computing Consortium, research into the genetics and evolution of viruses such as the novel coronavirus, and applying novel causal inference technologies aiming at providing some answers to these questions.

An open invitation to the scientific community

We hope that the dataset is valuable to policymakers, public health leaders, and researchers in their modeling and analysis efforts to help control the spread of COVID-19. We invite data scientists and researchers to use the dataset to uncover insights that address pressing questions. You can visualize the associations between NPIs and outcomes using the NPI data browser, which is also released with the dataset. If you are interested in contributing to the dataset or have any questions, please reach out to us here or email us at ww.cc19@ke.ibm.com.

[1] Wikimedia Foundation. Wikipedia and COVID-19. https://wikimediafoundation.org/covid19/data (2020)

 
