July 29, 2019 | Written by: Mo Yu and Saloni Potdar
In Natural Language Processing (NLP), relation extraction (RE) is an important task that aims to find semantic relationships between pairs of entity mentions. RE is essential for many downstream tasks such as knowledge base completion and question answering.
Figure 1: Model Architecture. Different pairs of entities, e.g., (Iraqi and artillery), (southern suburbs, Baghdad) are predicted simultaneously.
In many enterprise applications, the input paragraph to an RE system usually contains multiple pairs of entities. For example, the paragraph in Figure 1 contains a PART-WHOLE relation between southern suburbs and Baghdad, and an ART relation between Iraqi and artillery. However, nearly all existing RE approaches treat pairs of entity mentions as independent instances. When deep learning models are used for RE, these methods require that the same paragraph be encoded multiple times, once for each pair of entities, which is computationally expensive, especially when the input paragraph is long and the deep model is large.
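To make the cost concrete, here is a toy sketch contrasting the two strategies. The encoder below is a stand-in that only counts how many times it runs; the paragraph, entity list, and class names are illustrative, not part of the actual system.

```python
# Why per-pair encoding is wasteful: a conventional RE pipeline re-encodes
# the same paragraph once for every candidate entity pair, while a one-pass
# approach encodes it a single time for all pairs.
from itertools import combinations

class CountingEncoder:
    """Dummy encoder that records how many times it is invoked."""
    def __init__(self):
        self.calls = 0

    def encode(self, paragraph):
        self.calls += 1
        return paragraph  # a real encoder would return token embeddings

paragraph = "Iraqi artillery shelled the southern suburbs of Baghdad."
entities = ["Iraqi", "artillery", "southern suburbs", "Baghdad"]
pairs = list(combinations(entities, 2))  # 6 candidate entity pairs

multi_pass = CountingEncoder()
for _ in pairs:                 # conventional: one encoding pass per pair
    multi_pass.encode(paragraph)

one_pass = CountingEncoder()
one_pass.encode(paragraph)      # proposed: a single pass covers all pairs

print(multi_pass.calls, one_pass.calls)  # 6 vs. 1
```

With n entity mentions there are O(n²) candidate pairs, so the gap between the two strategies grows quadratically with the number of entities in the paragraph.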
Recently, IBM Research AI and IBM Watson have worked together to develop a promising approach that provides both high efficiency (encoding the input in one pass) and effectiveness (achieving state-of-the-art performance). This method, published at the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019), achieves a new state-of-the-art result on the Automatic Content Extraction (ACE) 2005 benchmark, and shows that a highly efficient one-pass encoding approach can achieve results comparable to more time-consuming multi-pass counterparts.
Our proposed solution builds on the existing Transformer-based, pre-trained, general-purpose language encoder known as Bidirectional Encoder Representations from Transformers (BERT). We make two novel modifications to the Transformer architecture to enable the encoding of multiple relations in one pass. First, borrowing an idea from recent advances in dependency parsing, we introduce a structured prediction layer on top of BERT for predicting relations for multiple entity pairs at once, as shown at the top of Figure 1. Second, we make the self-attention layers of the Transformer aware of the positions of all entities in the input paragraph. The key idea is to use the relative distance between words and entities to encode the positional information for each entity. This information is propagated through successive layers via the attention computation, so that the top layers produce embedding vectors that are aware of all the entities in the paragraph.
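The two modifications can be sketched in a few lines of NumPy. This is a simplified illustration, not the paper's exact configuration: the tensor sizes, distance clipping, mean-pooled entity representations, and bilinear scorer are all assumptions chosen for brevity, and a single attention head stands in for the full multi-layer Transformer.

```python
# Sketch of (1) entity-aware self-attention via relative-distance embeddings
# and (2) a structured prediction layer scoring all entity pairs at once.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 8, 16            # toy sequence length and hidden size
max_dist = 4                  # relative distances clipped to [-4, 4]

tokens = rng.normal(size=(seq_len, d))       # token hidden states
entity_spans = [(0, 1), (3, 4), (6, 7)]      # toy entity positions (start, end)

# (1) Entity-aware self-attention: bias the attention score for each key
# position with an embedding of its clipped distance to every entity.
dist_emb = rng.normal(size=(2 * max_dist + 1, d))  # one vector per distance

def relative_distance(pos, span):
    start, end = span
    if pos < start:
        return max(pos - start, -max_dist)
    if pos > end:
        return min(pos - end, max_dist)
    return 0  # inside the entity span

Q = tokens @ rng.normal(size=(d, d))
K = tokens @ rng.normal(size=(d, d))
scores = Q @ K.T / np.sqrt(d)
for span in entity_spans:
    for j in range(seq_len):
        rel = relative_distance(j, span)
        scores[:, j] += Q @ dist_emb[rel + max_dist] / np.sqrt(d)

attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)
ctx = attn @ tokens           # entity-aware token representations

# (2) Structured prediction layer: score every entity pair against every
# relation label with a bilinear form, as in biaffine dependency parsing.
n_labels = 3
W = rng.normal(size=(n_labels, d, d))
ent = np.stack([ctx[s:e + 1].mean(axis=0) for s, e in entity_spans])

pair_scores = np.einsum("id,ldk,jk->ijl", ent, W, ent)  # (head, tail, label)
predictions = pair_scores.argmax(axis=-1)  # one predicted label per pair
```

Because `pair_scores` covers every (head, tail) combination in a single tensor, all relations in the paragraph are predicted simultaneously from one encoding pass, which is the efficiency gain described above.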
The proposed approach is the first of its kind that can simultaneously extract multiple relations with one-pass encoding of an input paragraph. Besides achieving state-of-the-art performance on relation extraction, this idea also points to a more accurate and efficient way to achieve entity-centric passage encoding. In the future, we will explore the use of this method in question answering applications.
For more details, check out our ACL 2019 paper, “Extracting Multiple-Relations in One-Pass with Pre-Trained Transformers,” authored by Haoyu Wang, Ming Tan, Mo Yu, Shiyu Chang, Dakuo Wang, Kun Xu, Xiaoxiao Guo, Saloni Potdar.