Update: This research was published in Nature Scientific Data¹ on March 25, 2021.
What impact do measures such as shelter-in-place, mask wearing, and social distancing have on the number of COVID-19 cases? How do the COVID-19 quarantine measures implemented by North American countries compare with those implemented by South American countries?
These are just a few questions about the wide range of non-pharmaceutical interventions (NPIs) that have been applied by governments, globally. Since the onset of the pandemic, these NPIs have been implemented in various degrees, with the intention of reducing the transmission of COVID-19. For some of these NPIs, the economic burden of applying them can be enormous, and societal implications might be far-reaching.
Hence, local governments and businesses wish to make smart and cautious decisions, often focusing only on the NPIs that are most effective. These decisions may benefit from a data-driven approach in which models of disease spread incorporate information about NPIs. In turn, this could allow for intervention plans that help manage the spread of disease while balancing the socio-economic impact. Information on the NPIs being implemented is available across a wide variety of unstructured data sources, including official government websites, press releases, social media and news articles. However, modeling efforts often require NPI data to be available in a structured form.
To address this urgent need, several data collection initiatives have emerged in recent months, resulting in several publicly available datasets with varying degrees of coverage, data freshness and sparsity. However, the vast majority of these are manually curated and do not cover a wide range of interventions. In addition, only a subset of world regions is covered, with only limited information on the fine-grain locations and sources. An AI-assisted, semi-automated data collection approach, driven by a rich, extensible taxonomy, can potentially help to bridge the gap and result in a larger, more frequently updated dataset with less manual labor.
We have recently introduced the Worldwide Non-pharmaceutical Interventions Tracker for COVID-19 (WNTRAC) — a comprehensive dataset consisting of more than 6,000 NPIs implemented worldwide since the start of the pandemic. WNTRAC covers NPIs implemented across 261 countries and territories, and classifies NPI events into a taxonomy of 15 NPI categories. WNTRAC is now publicly available for non-commercial use.
Leveraging crowdsourcing and natural language processing
Since the beginning of the pandemic, over 5,000 new Wikipedia pages on COVID-19 have been written by more than 71,000 volunteers. These pages had accumulated more than 440 million page views by June 2020². Though Wikipedia articles are crowdsourced, they now serve as a common source of NPI event data, supported by collective validation and by citations of credible sources such as government websites, scientific literature and news articles.
WNTRAC: Artificial intelligence assisted tracking of non-pharmaceutical interventions implemented worldwide for COVID-19
Tapping into this crowdsourced and frequently updated information in Wikipedia, we built an AI-assisted, semi-automated system to construct the dataset and keep it current. We employed natural language processing (NLP) techniques, driven by a rich, extensible taxonomy, to analyze the articles and automatically extract NPI events daily. To aid accuracy and veracity, the extracted events are validated against credible sources by a small, dedicated group of IBM volunteers before each new version of the dataset is released. Additional technical details about the system can be found in the paper¹.
Shown below is an example of an NPI event in the WNTRAC dataset.
An example of NPI measures related to COVID-19 reported via Wikipedia article (as of September 2, 2020)
An NPI event is uniquely identified by five attributes: what, value, where, when, and restriction.
- What is the type of NPI that was imposed or lifted? In the example, the type is “school closure”
- Value is the sub-category or attribute that further qualifies the NPI type more specifically. In the example, the associated value is “all schools closed”
- Where refers to the region (either a country, territory, province or state) where the NPI measure was imposed or lifted. In the example, three distinct counties in New York (Westchester, Suffolk and Nassau) are identified. Therefore, three separate events will be extracted.
- When reflects the date on which the NPI was imposed or lifted. In the example, the date will be March 16, corresponding to the implementation of the NPI.
- Restriction is a flag that indicates whether the event corresponds to an imposition or a lifting of the NPI. In the example, the restriction would be “imposed”.
Additionally, each NPI event contains the URL of the original source (government order, press release, social media post) extracted from Wikipedia and/or provided by the IBM volunteer.
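To make the five-attribute structure concrete, here is a minimal Python sketch of how such an event could be represented. It is an illustration only, not the actual WNTRAC schema: the class name `NPIEvent`, the field `source_url`, and the helper `expand_regions` are hypothetical names, and the year 2020 is assumed for the March 16 date in the example.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class NPIEvent:
    """Hypothetical record type; not the official WNTRAC schema."""
    what: str         # type of NPI, e.g. "school closure"
    value: str        # qualifier of the NPI type, e.g. "all schools closed"
    where: str        # region where the NPI was imposed or lifted
    when: date        # date on which the NPI was imposed or lifted
    restriction: str  # "imposed" or "lifted"
    # The source URL is carried along but excluded from equality and hashing,
    # so identity depends only on the five attributes above.
    source_url: str = field(default="", compare=False)


def expand_regions(what, value, regions, when, restriction, source_url=""):
    """Create one event per region mentioned in the same source sentence."""
    return [
        NPIEvent(what, value, region, when, restriction, source_url)
        for region in regions
    ]


# The school-closure example: three counties yield three separate events.
events = expand_regions(
    what="school closure",
    value="all schools closed",
    regions=["Westchester", "Suffolk", "Nassau"],
    when=date(2020, 3, 16),  # year assumed for illustration
    restriction="imposed",
)

for event in events:
    print(event.where, event.what, event.when.isoformat(), event.restriction)
```

Excluding the source URL from comparison means two records that agree on what, value, where, when and restriction are treated as the same event even if they cite different sources, which is one way to read "uniquely identified by five attributes".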
As of September 2, 2020, the WNTRAC dataset contains about 6,600 NPI events. Some interesting statistics about the dataset are shown below.
Popularity of NPIs: US vs Worldwide (as of September 2, 2020)
Time to first NPIs imposed and first 50 cases and first death (as of September 2, 2020)
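Summaries like the ones named in these captions could be computed from a list of events along the following lines. This is a rough sketch, not the analysis behind the figures: the tuple layout and the function names are assumptions, the first three records restate the school-closure example (with the year 2020 assumed), and the last record is an invented placeholder rather than real data.

```python
from collections import Counter
from datetime import date

# Each record is (what, where, when, restriction); this layout is an assumption.
records = [
    ("school closure", "Westchester", date(2020, 3, 16), "imposed"),
    ("school closure", "Suffolk", date(2020, 3, 16), "imposed"),
    ("school closure", "Nassau", date(2020, 3, 16), "imposed"),
    # Invented placeholder record, included only to exercise the code.
    ("mask wearing", "Exampleland", date(2020, 4, 1), "imposed"),
]


def count_by_type(records):
    """Count imposed events per NPI type (a simple 'popularity' measure)."""
    return Counter(
        what for what, _, _, restriction in records if restriction == "imposed"
    )


def first_imposed(records):
    """Return the earliest date on which any NPI was imposed in each region."""
    first = {}
    for _, where, when, restriction in records:
        if restriction != "imposed":
            continue
        if where not in first or when < first[where]:
            first[where] = when
    return first


print(count_by_type(records).most_common())
print(first_imposed(records))
```

Comparing the dates returned by `first_imposed` with the dates of the first 50 cases or the first death in each region would require joining against a separate case-count dataset, which is not shown here.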
Our aim is to facilitate research using WNTRAC to help inform answers to questions such as:
- What is the relationship between the spread of COVID-19 and the type of NPIs imposed and the timing of when they are lifted?
- What is the optimal set of NPIs to implement: when, where, and for how long to reduce COVID-19 transmission?
- What is the optimal set of NPIs that can help mitigate economic impact and contain the disease?
WNTRAC is one element of the many initiatives IBM Research teams are driving to help in the fight against COVID-19, including co-founding the High Performance Computing Consortium, research into the genetics and evolution of viruses such as the novel coronavirus, and applying novel causal inference technologies aiming at providing some answers to these questions.
An open invitation to the scientific community
We hope that the dataset is valuable to policymakers, public health leaders, and researchers in their modeling and analysis efforts to help control the spread of COVID-19. We invite data scientists and researchers to use the dataset to uncover insights that address pressing questions. You can visualize the associations between NPIs and outcomes using the NPI data browser, which was released alongside the dataset. If you are interested in contributing to the dataset or have any questions, please reach out to us here or email us at email@example.com.
1. Suryanarayanan, P., Tsou, CH., Poddar, A. et al. AI-assisted tracking of worldwide non-pharmaceutical interventions for COVID-19. Sci Data 8, 94 (2021). https://doi.org/10.1038/s41597-021-00878-y
2. Wikimedia Foundation. Wikipedia and COVID-19. https://wikimediafoundation.org/covid19/data (2020)