One of the biggest obstacles to advancing AI is the technology’s difficulty interacting with people in the natural language they typically use when speaking and writing. Human language is filled with nuance, hidden meaning and context that machines are currently unable to fully comprehend.
IBM Research AI is leading the push to develop new tools that enable AI to process and understand natural language. Our goal: empower enterprises to deploy and scale sophisticated AI systems that leverage natural language processing (NLP) with greater accuracy and efficiency, while requiring less data and human supervision.
One paper IBM Research is presenting at AAAI-20 describes an approach to improve an NLP system’s ability to reason through a process known as textual entailment, by complementing training data with information from an external source. The research is an example of how neuro-symbolic AI, which combines machine learning with knowledge and reasoning, can be applied to NLP to advance a machine’s ability to infer information.
Another paper proposes an improved approach to augmenting the data used to train text classifiers, a crucial part of building NLP systems. The new approach uses a powerful pretrained neural network model to synthesize new labeled data for supervised learning, offering a solution for users who have only small amounts of training data.
An additional paper from IBM Research at AAAI-20 addresses the need for NLP systems to learn how words relate to one another and fit into broader categories, known as hypernyms (e.g., “animal” is a hypernym of “dog”).
Textual entailment in NLP
For an NLP system to master language, it must be able to generalize and reason when presented with new text. NLP systems can do that through textual entailment, the task of determining whether one piece of text (the hypothesis) follows from, contradicts or is neutral with respect to another piece of text (the premise). A system’s ability to perform textual entailment, also called natural language inference (NLI), can indicate whether it can model the complexities of human natural language understanding, for example, by reconciling existing and new information to correctly answer questions.
Most approaches to textual entailment use only the text present in training data. Researchers with IBM Research, the MIT-IBM Watson AI Lab, the University of Illinois at Urbana-Champaign and Tulane University have developed an approach that augments any existing text-based entailment model with external knowledge. In the paper Infusing Knowledge into the Textual Entailment Task Using Graph Convolutional Networks, the researchers describe complementing text-based entailment models with information from external knowledge graphs, using Personalized PageRank (a variant of PageRank, the algorithm Google developed to rank web pages in search results) to select only the most relevant knowledge for the textual entailment process.
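To make the knowledge-selection step concrete, here is a minimal sketch of Personalized PageRank computed by power iteration in plain Python. It is not the paper’s implementation: the toy graph, the seed concepts (standing in for concepts mentioned in a premise/hypothesis pair), the restart probability alpha=0.15 and the top-k cutoff are all illustrative assumptions. Because the random walk keeps restarting at the seed concepts, nodes unreachable from them, such as “car” below, score zero and can be pruned before the graph is passed to an entailment model.

```python
def personalized_pagerank(graph, seeds, alpha=0.15, iters=100):
    """Personalized PageRank by power iteration.

    graph: dict mapping each node to a list of its out-neighbors.
    seeds: nodes the random walk restarts from (here, concepts found in the text).
    alpha: probability of restarting at a seed on each step.
    """
    nodes = set(graph) | set(seeds)
    for nbrs in graph.values():
        nodes.update(nbrs)
    seeds = sorted(seeds)
    # Restart distribution: uniform over the seeds, zero everywhere else.
    restart = {n: 0.0 for n in nodes}
    for s in seeds:
        restart[s] = 1.0 / len(seeds)
    rank = dict(restart)
    for _ in range(iters):
        new = {n: alpha * restart[n] for n in nodes}
        for n in nodes:
            nbrs = graph.get(n, [])
            if nbrs:
                # Spread the non-restart mass evenly over out-neighbors.
                share = (1 - alpha) * rank[n] / len(nbrs)
                for m in nbrs:
                    new[m] += share
            else:
                # Dangling node: return its non-restart mass to the seeds.
                for s in seeds:
                    new[s] += (1 - alpha) * rank[n] / len(seeds)
        rank = new
    return rank


def top_k_nodes(rank, k, exclude=()):
    """Return the k highest-scoring nodes with positive score, skipping `exclude`."""
    scored = [(score, n) for n, score in rank.items() if score > 0 and n not in exclude]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [n for _, n in scored[:k]]


if __name__ == "__main__":
    # Hypothetical toy knowledge graph: edges point from a concept to related concepts.
    kg = {
        "dog": ["animal", "pet", "bark"],
        "pet": ["animal"],
        "animal": ["organism"],
        "car": ["vehicle"],
        "vehicle": ["machine"],
    }
    # Concepts assumed to appear in a pair such as "A dog sleeps." / "A pet sleeps."
    seeds = {"dog", "pet"}
    scores = personalized_pagerank(kg, seeds)
    print(top_k_nodes(scores, 2, exclude=seeds))  # ['animal', 'organism']
    print(scores["car"])  # 0.0
```

Pruning the external graph this way keeps the added knowledge small and centered on the concepts the text actually mentions, rather than attaching an entire knowledge graph to every example.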
The researchers evaluate their approach on multiple textual entailment datasets and show that the use of external knowledge helps improve prediction accuracy. Beyond the improved accuracy, the research demonstrates a neuro-symbolic AI approach to NLP that combines machine learning with external knowledge, which serves as the “symbolic” piece.
Language-Model-Based Data Augmentation (LAMBADA)
Organizations that want to use NLP often don’t have enough labeled data to adequately train neural networks to recognize and respond to natural language queries and dialogue. Imagine teaching someone a new language using only a fraction of the vocabulary and syntax available to native speakers. IBM’s language-model-based data augmentation (LAMBADA) is designed to close that gap by generating additional labeled training data.
At AAAI, researchers from IBM Research, the University of Haifa and Technion – Israel Institute of Technology will present the paper Do Not Have Enough Data? Deep Learning to the Rescue!, which details the use of a pretrained machine-learning model to synthesize new labeled data for text classification tasks. Rather than augmenting existing data with simple edits such as replacing a single word with a synonym, deleting a word or changing the word order, LAMBADA builds on a language model pretrained on large bodies of text, enabling it to generate coherent, semantically accurate sentences.
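For contrast, the simple word-level edits that LAMBADA moves beyond can be sketched in a few lines. This is an illustrative baseline, not part of LAMBADA; the synonym table, function names and example sentence are hypothetical, and a real system would draw synonyms from a thesaurus. Each variant applies one random edit (synonym replacement, deletion or swap) and copies the original label.

```python
import random

# Hypothetical toy synonym table; a real system might draw on a thesaurus.
SYNONYMS = {
    "cancel": ["terminate", "stop"],
    "order": ["purchase"],
    "quickly": ["promptly", "fast"],
}


def synonym_replace(tokens, rng):
    """Replace one token that has an entry in SYNONYMS."""
    out = list(tokens)
    candidates = [i for i, t in enumerate(out) if t in SYNONYMS]
    if candidates:
        i = rng.choice(candidates)
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_delete(tokens, rng):
    """Delete one token, keeping at least one."""
    out = list(tokens)
    if len(out) > 1:
        del out[rng.randrange(len(out))]
    return out


def random_swap(tokens, rng):
    """Swap two tokens to change the word order."""
    out = list(tokens)
    if len(out) > 1:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def augment(sentence, label, n=3, seed=0):
    """Create n variants of a labeled sentence, each with one random edit."""
    rng = random.Random(seed)
    tokens = sentence.split()
    ops = [synonym_replace, random_delete, random_swap]
    return [(" ".join(rng.choice(ops)(tokens, rng)), label) for _ in range(n)]


if __name__ == "__main__":
    for text, label in augment("please cancel my order quickly", "cancel_order", n=4):
        print(label, "|", text)
```

Because every variant is a small perturbation of an existing sentence, this kind of augmentation adds little new vocabulary or phrasing, which is the gap a generative language model is meant to fill.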
LAMBADA generates new sentences, then filters them using a classifier trained on the original data. In a series of experiments, the researchers found that LAMBADA improves classifier performance on a variety of datasets and significantly outperforms other data augmentation techniques. Looking ahead, the researchers will investigate how well LAMBADA can create new text that doesn’t necessarily follow the same rules as, or otherwise deviates from, its training data, a setting known as “zero-shot” learning.
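The generate-then-filter step can be sketched as follows. The generator itself is out of scope here: `candidates` stands for (sentence, intended label) pairs sampled from a label-conditioned language model, and `toy_classifier` is a hypothetical keyword scorer standing in for the classifier trained on the original data. Keeping only sentences whose predicted label matches the intended one, then the `top_n` most confident per label, is one plausible filtering rule and an assumption of this sketch, not necessarily the paper’s exact criterion.

```python
from collections import defaultdict


def toy_classifier(sentence):
    """Hypothetical stand-in for a classifier trained on the original labeled data.

    Scores two intents by keyword counts; returns label -> probability (sums to 1).
    """
    words = sentence.lower().split()
    refund = sum(w in {"refund", "money", "back"} for w in words)
    shipping = sum(w in {"shipping", "delivery", "late"} for w in words)
    total = refund + shipping + 1  # add-0.5 smoothing per label
    return {"refund": (refund + 0.5) / total, "shipping": (shipping + 0.5) / total}


def filter_generated(candidates, classify, top_n):
    """Keep at most top_n generated sentences per label, ranked by classifier confidence.

    candidates: (sentence, intended_label) pairs from a label-conditioned generator.
    classify: function mapping a sentence to a dict of label -> probability.
    """
    by_label = defaultdict(list)
    for sentence, label in candidates:
        probs = classify(sentence)
        predicted = max(probs, key=probs.get)
        if predicted != label:
            continue  # assumption: drop sentences the classifier assigns elsewhere
        by_label[label].append((probs[label], sentence))
    kept = []
    for label in sorted(by_label):
        ranked = sorted(by_label[label], key=lambda t: (-t[0], t[1]))
        kept.extend((sentence, label) for _, sentence in ranked[:top_n])
    return kept


if __name__ == "__main__":
    generated = [
        ("i want my money back", "refund"),
        ("refund please", "refund"),
        ("my delivery is late", "refund"),  # off-label generation
        ("where is my delivery", "shipping"),
    ]
    print(filter_generated(generated, toy_classifier, top_n=1))
    # [('i want my money back', 'refund'), ('where is my delivery', 'shipping')]
```

The filter uses only the original labeled data (through the classifier), so it can discard off-label or low-confidence generations without any additional human labeling.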
Hypernym detection
NLP applications must be able not only to identify a word’s meaning but also to recognize the relationships between different words. One way to improve the efficiency of NLP training is to automate the grouping of hypernyms, words that represent broad categories into which other words fit. “Color,” for example, is a hypernym of “red.” With enough training, an NLP system should be able to make these associations on its own.
In a new paper, Hypernym Detection Using Strict Partial Order Networks, IBM researchers introduce a specialized neural network architecture called Strict Partial Order Networks (SPON), which is designed to capture the asymmetric and transitive properties of the “is a” relationship between a hypernym and the words it covers. For example, if SPON learns that Wittgenstein is a philosopher and that a philosopher is a person, it can draw the transitive conclusion that Wittgenstein is a person. The relationship is asymmetric: Wittgenstein is a philosopher, but not every philosopher is Wittgenstein, and not every person is a philosopher. Although this may be obvious to anyone with even the most basic language skills, NLP systems need this level of painstaking detail to function properly.
The research also includes an augmented variant of SPON that can generalize what it has learned to properly categorize new vocabulary outside its training set. The researchers plan to extend SPON in two directions. The first is to analyze how the architecture can account for hypernym hierarchies, in which several levels of subordinate words fall under a broader hypernym category: for example, a Jack Russell is a terrier, a terrier is a dog and a dog is an animal. The second is to handle relationships between words beyond the “is a” construction.
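SPON learns these properties with a neural network, but the properties themselves are easy to state symbolically. The sketch below is not SPON; it only takes explicit “is a” pairs (drawn from the examples above), computes their transitive closure to derive conclusions such as “Wittgenstein is a person,” and checks the three defining conditions of a strict partial order: irreflexive, asymmetric and transitive. The function names are hypothetical.

```python
from itertools import product


def transitive_closure(pairs):
    """Add every (x, z) implied by (x, y) and (y, z) until nothing changes."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def is_strict_partial_order(rel):
    """Check irreflexivity, asymmetry and transitivity of a set of (x, y) pairs."""
    irreflexive = all(x != y for x, y in rel)
    asymmetric = all((y, x) not in rel for x, y in rel)
    transitive = all((a, d) in rel for (a, b), (c, d) in product(rel, repeat=2) if b == c)
    return irreflexive and asymmetric and transitive


def hypernyms_of(word, rel):
    """All broader categories of `word` under the relation, sorted alphabetically."""
    return sorted(y for x, y in rel if x == word)


IS_A = {
    ("Wittgenstein", "philosopher"),
    ("philosopher", "person"),
    ("Jack Russell", "terrier"),
    ("terrier", "dog"),
    ("dog", "animal"),
}

if __name__ == "__main__":
    closure = transitive_closure(IS_A)
    print(("Wittgenstein", "person") in closure)  # True
    print(hypernyms_of("Jack Russell", closure))  # ['animal', 'dog', 'terrier']
    print(is_strict_partial_order(closure))  # True
    # A cycle such as "a person is a philosopher" breaks asymmetry and irreflexivity.
    print(is_strict_partial_order(transitive_closure(IS_A | {("person", "philosopher")})))  # False
```

The last check shows why the strict-partial-order constraint is useful: any predicted “is a” edge that would create a cycle contradicts the structure of a hypernym hierarchy.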
AI for Language in 2020
IBM has made significant progress over the past few years developing AI that can process and respond to natural language, and that progress is accelerating as we move the technology out of the lab and into enterprise applications. A key component of our natural language AI strategy is making the technology adaptable to different domains, use cases and languages.
In July, timed to the Association for Computational Linguistics (ACL) conference, I wrote about three themes IBM Research is exploring to improve NLP for enterprise domains. The first, advancing AI, focuses on systems that can learn from small amounts of data and leverage external knowledge, using techniques that include neuro-symbolic approaches to language, which combine neural and symbolic processing. The second, trusting AI, focuses on providing explanations of how a system reaches a decision. The third, scaling AI, aims to enable continuous adaptation and better monitoring and testing of systems, so that language systems can be deployed under the rigorous expectations of enterprises.
In the year ahead, IBM Research will continue researching and producing language technologies and solutions that allow even enterprises with few resources—including time, data, money, staffing and expertise—to take advantage of state-of-the-art NLP capabilities that deliver competitive business advantages.
IBM Research will present more than fifty technical papers at AAAI-20, as well as a rich set of demos of our latest work, reflecting our focus on key areas of AI research including AutoAI, mastering language, planning, computational argumentation, the future of work and security.