IBM Research at EMNLP 2020

At the annual Conference on Empirical Methods in Natural Language Processing (EMNLP), IBM Research AI is presenting 30 papers in the main conference and 12 in Findings of EMNLP that together aim to advance the field of natural language processing (NLP).

Of note, earlier this year IBM Research AI published a new, four-part taxonomy for mastering language that we believe will advance enterprise AI both by enhancing basic NLP features and by introducing more advanced tools and concepts. To learn more about this strategy and how we’re advancing it, click here.

Technical Highlights at EMNLP 2020

Deep learning methods are data hungry and rather specialized. A system trained to answer questions on clinical data will not necessarily perform well on research papers in the medical domain. To cope with real-world scenarios, we are developing new domain adaptation methods as well as new ways to learn from a limited amount of labeled data. In Multi-Stage Pre-training for Low-Resource Domain Adaptation (presentation here), we describe three techniques that rely on non-annotated data to improve domain adaptation: (i) fine-tuning a language model on all available in-domain unlabeled data to capture domain jargon, (ii) automatically identifying domain-specific terms from unlabeled documents to enhance the model vocabulary before fine-tuning, and (iii) automatically generating a task related to the desired target task by relying on the structure of the available labeled and unlabeled data. Finally, we describe how automatically perturbing the manually annotated data (e.g., by paraphrasing, simplifying, or dropping irrelevant words) can be used to augment the pool of annotated examples.
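To make technique (ii) more concrete, here is a minimal sketch of one way domain-specific terms could be surfaced from unlabeled text. It is not the method from the paper, which this post does not detail. It assumes a simple frequency-ratio heuristic: a token is a candidate if it is much more frequent in the in-domain documents than in a general corpus. The function names, the tokenizer, and the thresholds `min_count` and `min_ratio` are all hypothetical.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase and split into runs of letters/digits (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def domain_terms(domain_docs, general_docs, min_count=2, min_ratio=3.0):
    """Return tokens that are much more frequent in domain_docs than in general_docs.

    Assumption: relative-frequency ratio is a good enough proxy for "domain jargon".
    """
    domain = Counter(t for d in domain_docs for t in tokenize(d))
    general = Counter(t for d in general_docs for t in tokenize(d))
    domain_total = sum(domain.values()) or 1
    general_total = sum(general.values()) or 1

    scored = []
    for term, count in domain.items():
        if count < min_count:
            continue  # too rare in the domain data to trust
        domain_rate = count / domain_total
        # Add one to the count (and the total) so unseen terms do not divide by zero.
        general_rate = (general[term] + 1) / (general_total + 1)
        ratio = domain_rate / general_rate
        if ratio >= min_ratio:
            scored.append((ratio, term))
    # Highest ratio first.
    return [term for ratio, term in sorted(scored, reverse=True)]


if __name__ == "__main__":
    in_domain = ["the hypervisor restarts the hypervisor node", "restart the hypervisor"]
    background = ["the cat sat on the mat", "the dog ate the food"]
    print(domain_terms(in_domain, background))  # ['hypervisor']
```

The returned terms would then be candidates for extending the model vocabulary before fine-tuning; how they are added depends on the tokenizer and model in use.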

Active learning is a prevalent paradigm to cope with data scarcity. In Active Learning for BERT: An Empirical Study (presentation here), we present the first large-scale empirical study on the combination of active learning strategies and BERT. The code of the research framework we built for this study is released and can be easily extended to facilitate future research.
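For readers new to active learning, the sketch below shows the selection step of one widely used strategy, uncertainty sampling by predictive entropy. It is an illustration only: this post does not list the strategies covered in the study, and the code is not taken from the released framework. The function names are hypothetical, and the class probabilities are assumed to come from whatever classifier (for example, a fine-tuned BERT model) is being trained.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_batch(pool_probs, batch_size):
    """Return indices of the batch_size unlabeled examples the model is least sure about.

    pool_probs: list of per-example class-probability lists for the unlabeled pool.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:batch_size]


if __name__ == "__main__":
    pool = [[0.99, 0.01], [0.5, 0.5], [0.8, 0.2], [0.6, 0.4]]
    print(select_batch(pool, 2))  # [1, 3]
```

In a full loop, the selected examples are sent for labeling, added to the training set, and the model is retrained before the next selection round.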

Another real-world challenge is gaining the user's trust in the model's predictions. To that end, Explainable AI aims to learn human-interpretable, or white-box, models as opposed to black-box models whose logic is opaque to human understanding. Neurosymbolic AI aims to utilize neural networks to learn white-box models expressed in first-order logic. In Learning Explainable Linguistic Expressions with Neural Inductive Logic Programming for Sentence Classification (presentation here), we combine the two to learn NLP models that are accurate, human-interpretable, and human-modifiable. The last property enables human-machine model co-creation, allowing domain expertise to be injected into automatically learned machine learning models.

A major theme for IBM Research AI is improving the world of customer care. A frequent pattern in customer care conversations is the agent responding with appropriate webpage URLs that address the customer’s needs. In Conversational Document Prediction to Assist Customer Care Agents (presentation here), we study the task of predicting the documents that customer care agents can use to address customers’ needs. We show that a hybrid information retrieval and deep learning approach provides the best of both worlds. In addition, we introduce a new public dataset, which contains conversations between customers and customer care agents in 25 organizations on the Twitter platform. In doc2dial: A Goal-Oriented Document-Grounded Dialogue Dataset (presentation here), we create a new goal-oriented document-grounded dialogue dataset that captures more diverse scenarios derived from various document contents from multiple domains. For data collection, we propose a novel pipeline approach for dialogue data construction, which has been adapted and evaluated for several domains.
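One simple way to picture the hybrid of information retrieval and deep learning mentioned above for document prediction is score interpolation: normalize the scores from each ranker and mix them. The sketch below assumes exactly that and nothing more. The post does not say how the paper combines the two signals, so the functions, the min-max normalization, and the `weight` parameter are all hypothetical.

```python
def min_max(scores):
    """Rescale a {doc_id: score} dict to [0, 1]; constant or empty inputs map to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(ir_scores, neural_scores, weight=0.5):
    """Rank documents by a weighted mix of normalized IR and neural scores.

    weight: share given to the IR score; (1 - weight) goes to the neural score.
    Documents missing from one ranker get 0.0 from that ranker.
    """
    ir, nn = min_max(ir_scores), min_max(neural_scores)
    docs = set(ir) | set(nn)
    combined = {d: weight * ir.get(d, 0.0) + (1 - weight) * nn.get(d, 0.0) for d in docs}
    # Sort by descending score, breaking ties by document id for determinism.
    return sorted(combined, key=lambda d: (-combined[d], d))


if __name__ == "__main__":
    ir = {"a": 10, "b": 5, "c": 0}            # e.g. lexical retrieval scores
    neural = {"a": 0.1, "b": 0.9, "c": 0.2}   # e.g. neural matching scores
    print(hybrid_rank(ir, neural))  # ['b', 'a', 'c']
```

Interpolation is only one of several possible designs; re-ranking the top IR results with a neural model is another common choice.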

From the Project Debater team, Quantitative Argument Summarization and Beyond: Cross-Domain Key Point Analysis (presentation here) describes Key Point Analysis as an innovative summarization method. We automatically extract the main points discussed in a collection of texts, along with their prevalence in the data. For example, when applied to thousands of responses collected for a municipal survey, it can inform the policy makers that the point “The city needs better public transportation” in the summary matches 8% of the responses, while the point “consider increasing the number of parks, walking and biking trails” matches 4% of the responses. This new technology is being showcased on Bloomberg TV’s “That’s Debatable” show. In the first episode, it summarized over 3,500 pro and con arguments submitted for a debate on wealth distribution.
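The prevalence figures in the survey example can be pictured with a small sketch: match each response to its best-fitting key point, then report the share of responses per key point. This is a toy illustration, not the Project Debater implementation. In particular, the word-overlap `jaccard` scorer is a stand-in for the matching model, and the function names and `threshold` are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Toy matching score: word-set overlap between two texts (stand-in for a learned matcher)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def key_point_prevalence(responses, key_points, match_score=jaccard, threshold=0.5):
    """Return {key_point: percent of responses matched to it}.

    Each response is assigned to its single best-scoring key point, and only
    if that score reaches the threshold. key_points must be non-empty.
    """
    counts = Counter()
    for response in responses:
        best = max(key_points, key=lambda kp: match_score(response, kp))
        if match_score(response, best) >= threshold:
            counts[best] += 1
    total = len(responses) or 1
    return {kp: 100.0 * counts[kp] / total for kp in key_points}


if __name__ == "__main__":
    responses = [
        "better public transportation",
        "more parks",
        "better public transportation please",
        "lower taxes",
    ]
    key_points = ["better public transportation", "more parks"]
    print(key_point_prevalence(responses, key_points))
    # {'better public transportation': 50.0, 'more parks': 25.0}
```

Responses that match no key point above the threshold (here, "lower taxes") are simply left out of every count, which is why the percentages need not sum to 100.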

Aside from enterprise-oriented research, we also do general machine learning research. For example, in Unsupervised Reference-Free Summary Quality Evaluation via Contrastive Learning (presentation here), we design a new method that uses unsupervised contrastive learning to evaluate summary quality without human-written reference summaries. Specifically, we design a new BERT-based metric that covers both linguistic quality and semantic informativeness. On real-world datasets, our new evaluation method largely outperforms other metrics even without reference summaries. Furthermore, the metric is shown to generalize and to transfer across different datasets.
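As background on contrastive learning for scoring, the sketch below shows a generic margin ranking loss: a scorer is penalized whenever a good summary does not outscore a worse one by some margin. The post does not specify the loss used in the paper, so this is an assumption for illustration. The function name, the `margin` value, and the idea that negatives are degraded copies of the original summaries are all hypothetical.

```python
def margin_ranking_loss(pos_scores, neg_scores, margin=1.0):
    """Mean hinge loss over (positive, negative) score pairs.

    A pair contributes zero once the positive score exceeds the negative
    score by at least `margin`; otherwise it contributes the shortfall.
    """
    if len(pos_scores) != len(neg_scores):
        raise ValueError("pos_scores and neg_scores must have the same length")
    if not pos_scores:
        raise ValueError("at least one score pair is required")
    losses = [max(0.0, margin - (p - n)) for p, n in zip(pos_scores, neg_scores)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Scores a hypothetical quality model gives to original summaries ...
    originals = [2.0, 0.5]
    # ... and to degraded copies of the same summaries (assumed negatives).
    degraded = [0.0, 0.25]
    print(margin_ranking_loss(originals, degraded))  # 0.375
```

Because the negatives can be constructed automatically, training of this kind needs no human-written reference summaries, which is the property the paragraph above highlights.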

Finally, here are two datasets worth checking out. Both provide a new and clearer view of common tasks. In Neural Conversational QA: Learning to Reason v.s. Exploiting Patterns (presentation here), we examine recent state-of-the-art models on the ShARC QA task and find indications that the models learn spurious clues, or patterns, in the dataset. We therefore create and share a modified dataset with fewer spurious patterns, which allows models to learn better. For the task of rewriting a complex sentence into simpler ones, we found that the widely used benchmark dataset consistently contains easily exploitable syntactic cues caused by its automatic generation process. To remedy this limitation, in Small but Mighty: New Benchmarks for Split and Rephrase (presentation here), we collect and release two crowdsourced benchmark datasets. We make sure that they contain significantly more diverse syntax, and we carefully control their quality according to a well-defined set of criteria.

Accepted papers

  • Small but Mighty: New Benchmarks for Split and Rephrase

Li Zhang, Huaiyu Zhu, Yunyao Li, University of Pennsylvania

  • Neural Conversational QA: Learning to Reason v.s. Exploiting Patterns

Nikhil Verma, Abhishek Sharma, Dhiraj Madan, Danish Contractor, Harshit Kumar, Sachin Joshi

  • Improved Topic Representations of Medical Documents to Assist COVID-19 Literature Exploration

Yulia Otmakhova, Karin Verspoor, Timothy Baldwin, Simon Šuster

  • Layout-Aware Text Representations Harm Clustering Documents by Type

Catherine Finegan-Dollak, Ashish Verma

  • Hierarchical Pre-training for Sequence Labelling in Spoken Dialog

Emile Chapuis, Pierre Colombo, Matteo Manica, Matthieu Labeau, Chloe Clavel

  • SetConv: A New Approach to Learning from Imbalanced Data

Charu Aggarwal

  • Exploring Semantic Capacity of Terms

Jie Huang, Kevin Chang, Wen-mei Hwu, Jinjun Xiong

  • Predictive Model Selection for Transfer Learning in Sequence Labeling Tasks

Parul Awasthy, Bishwaranjan (Bhatta) Bhattacharjee, Hans Florian, John Kender

  • DualTKB: A Dual Learning Bridge between Text and Knowledge Base (separate blog here)

Pierre Dognin, Igor Melnyk, Inkit Padhi, Cicero N Dos Santos, Payel Das

  • Learning Structured Representations of Entity Names using Active Learning and Weak Supervision

Kun Qian, Poornima Chozhiyath Raman, Lucian Popa, Yunyao Li

  • Learning Explainable Linguistic Expressions with Neural Inductive Logic Programming for Sentence Classification

Prithviraj Sen, Sekar Krishnamurthy, Laura Chiticariu, Marina Hailpern, Matthias Boehm, Siddhartha Brahma, Yunyao Li

  • Interactive Fiction Game Playing as Multi-Paragraph Reading Comprehension with Reinforcement Learning

Xiaoxiao Guo, Yupeng Gao, Mo Yu, Shiyu Chang, Chuang Gan, Murray Campbell

  • Graph2Tree: Attention-based Graph Neural Networks for Semantic Parsing

Lingfei Wu

  • LiMiT: The Literal Motion in Text Dataset

Irene Manotas, Vadim Sheinin, An Vo

  • doc2dial: A Goal-Oriented Document-Grounded Dialogue Dataset

Song Feng, Hui Wan, Chulaka Gunasekara, Siva Sankalp Patel, Sachindranath Mahajan, Luis Lastras

  • Conversational Document Prediction to Assist Customer Care Agents

Jatin Ganhotra, Haggai Roitman, Nathaniel Mills, Chulaka Gunasekara, Yosi Mass, Sachin Joshi, Luis Lastras, David Konopnicki, Doron Cohen

  • Ad-hoc Document Retrieval using Weak-Supervision with BERT and GPT2

Yosi Mass, Haggai Roitman

  • Unsupervised Reference-Free Summary Quality Evaluation via Contrastive Learning

Tengfei Ma, Lingfei Wu

  • Improving Out-of-Scope Detection in Intent Classification by Using Embeddings of the Word Graph Space of the Classes

Paulo Rodrigo Cavalin, Victor Henrique Alves Ribeiro, Ana Paula Appel, Claudio Santos Pinhanez

  • Quantitative Argument Summarization and Beyond: Cross-Domain Key Point Analysis

Roy Bar-Haim, Yoav Kantor, Lilach Edelstein, Roni Friedman-Melamed, Dan Lahav, Noam Slonim

  • Bootstrapped Q-learning with Context Relevant Observation Pruning to Generalize in Text-based Game

Subhajit Chaudhury, Daiki Kimura, Kartik Talamadupula, Mich Tatsubori, Asim Munawar, Ryuki Tachibana

  • Active Learning for BERT: An Empirical Study

Liat Ein-Dor, Alon Halfon, Ariel Gera, Eyal Shnarch, Lena Dankin, Leshem Choshen, Marina Danilevsky, Yoav Katz, Noam Slonim, Ranit Aharonov

  • Structural Supervision Improves Few-Shot Learning and Syntactic Generalization in Neural Language Models

Ethan Wilcox, Peng Qian, Richard Futrell, Roger Levy, Ryosuke Kohita, Miguel Ballesteros

  • Q-learning with Language Model for Edit-based Unsupervised Summarization

Ryosuke Kohita, Akifumi Wachi, Yang Zhao, Ryuki Tachibana

  • CLAR: A Cross-Lingual Argument Regularizer for Semantic Role Labeling

Ishan Jindal, Yunyao Li, Siddhartha Brahma, Huaiyu Zhu

  • Octa: Omissions and Conflicts in Target-Aspect Sentiment Analysis

Zhe Zhang, Chung-Wei Hang, Munindar Singh

  • Multi-Stage Pretraining for Low-Resource Domain Adaptation

Revanth Gangi Reddy, Rong Zhang, Vittorio Castelli, Efsun Kayi, Arafat Sultan, Anthony Ferritto, Avi Sil, Todd Ward, Hans Florian, Salim Roukos

  • ARES: A Reading Comprehension Ensembling Service

Anthony Ferritto, Lin Pan, Rishav Chakravarti, Salim Roukos, Hans Florian, Bill Murdock, Avi Sil

  • A Technical Question Answering System with Transfer Learning

Lingfei Wu, Yu Deng, Ruchi Mahindru, Sinem Guven Kaya

  • Agent Assist through Conversation Analysis

Kshitij Fadnis, Nathaniel Mills, Jatin Ganhotra, Haggai Roitman, Doron Cohen, Shai Erera, Siva Sankalp Patel, Luis Lastras, David Konopnicki, Gaurav Pandey, Chulaka Gunasekara, Yosi Mass, Vera Liao, Danish Contractor, Sachin Joshi

Findings of EMNLP papers

  • MCMH: Learning Multi-Chain Multi-Hop Rules for Knowledge Graph Reasoning

Lu Zhang, Mo Yu, Tian Gao, Yue Yu

  • Effects of Naturalistic Variation in Goal-Oriented Dialog

Jatin Ganhotra, Robert Moore, Sachin Joshi, Kahini Wadhawan

  • Answer Span Correction in Machine Reading Comprehension

Revanth Gangi Reddy, Arafat Sultan, Efsun Kayi, Rong Zhang, Vittorio Castelli, Avi Sil

  • Transition-based Parsing with Stack-Transformers

Ramon Astudillo, Miguel Ballesteros, Tahira Naseem, Austin Blodgett, Hans Florian

  • Pushing the Limits of AMR Parsing with Self-Learning

Young-Suk Lee, Ramon Astudillo, Tahira Naseem, Revanth Gangi Reddy, Hans Florian, Salim Roukos

  • Visual Objects As Context: Exploiting Visual Objects for Lexical Entailment

Masayasu Muraoka, Tetsuya Nasukawa, Bishwaranjan (Bhatta) Bhattacharjee

  • A Dual-Attention Network for Joint Named Entity Recognition and Sentence Classification of Adverse Drug Events

Susmitha Wunnava, Xiao Qin, Tabassum Kakar, Xiangnan Kong, Elke Rundensteiner

  • From Disjoint Sets to Parallel Data to Train Seq2Seq Models for Sentiment Transfer

Paulo Rodrigo Cavalin, Marisa Affonso Vasconcelos, Marcelo Carpinette Grave, Claudio Santos Pinhanez, Victor Henrique Alves Ribeiro

  • Balancing via Generation for Multi-Class Text Classification Improvement

Naama Tepper, Esther Goldbraich, Naama Zwerdling, George Kour, Ateret Anaby-Tavor, Boaz Carmeli

  • Multilingual Argument Mining: Datasets and Analysis

Orith Toledo-Ronen, Matan Orbach, Yonatan Bilu, Artem Spector, Noam Slonim

  • The workweek is the best time to start a family – A Study of GPT-2 Based Claim Generation

Avishai Gretz, Yonatan Bilu, Edo Cohen, Noam Slonim

  • Unsupervised Expressive Rules Provide Explainability and Assist Human Experts Grasping New Domains

Eyal Shnarch, Leshem Choshen, Guy Moshkowich, Ranit Aharonov, Noam Slonim

Research Staff Member, NLP
