Recently, impressive progress has been made in neural network question answering (QA) systems which can analyze a passage to answer a question. These systems work by matching a representation of the question to the text to find the relevant answer phrase.
But what if the text is potentially all of Wikipedia? And what if the answer isn’t a close paraphrase of the question, but rather multiple small snippets of evidence that must be weighed and combined to make a judgement? Current question answering systems still struggle in scenarios like these, yet humans handle them quite easily: navigating through knowledge sources like Wikipedia to search for the answer, prioritizing some documents in their search over others, and combining knowledge from different parts of the documents they read to reason out an answer.
New work from our team, IBM Reinforced Ranker-Reader for Open-Domain Question Answering (R^3), makes progress on both of these fronts and will be presented this week at the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018) in New Orleans, Louisiana.
We started with the problem of searching a large corpus like Wikipedia for the answer to a question. Most natural language datasets, like SQuAD, provide the model with both the question and the text where the answer can be found. We wanted to move to a more realistic scenario in which the model would get just the question and access to the documents in Wikipedia. Our approach breaks the problem down into three simple steps: retrieve a small number of documents where the answer might be; rank those documents according to how likely they are to contain the answer; and then “read” the answer from the most likely document.
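The three steps above can be sketched as a simple pipeline. This is only an illustrative toy, not the actual system: the function names are our own, and the word-overlap heuristics stand in for the real retriever, the neural ranker, and the neural reader.

```python
# Toy retrieve -> rank -> read pipeline. Word overlap stands in for
# the real retrieval and ranking models; the reader is a placeholder.

def retrieve(question, corpus, k=5):
    """Return the k passages sharing the most words with the question."""
    q_words = set(question.lower().split())
    scored = [(len(q_words & set(p.lower().split())), p) for p in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]

def rank(question, passages):
    """Order passages by how likely they are to contain the answer.
    Here the same overlap heuristic; the system learns this ranking."""
    q_words = set(question.lower().split())
    return sorted(passages,
                  key=lambda p: len(q_words & set(p.lower().split())),
                  reverse=True)

def read(question, passage):
    """Extract the answer from the passage. A real reader predicts an
    answer span; this placeholder just returns the passage."""
    return passage

def answer(question, corpus):
    candidates = retrieve(question, corpus)
    best = rank(question, candidates)[0]
    return read(question, best)
```

With a two-document corpus, `answer("what is the largest island in the Philippines", corpus)` would surface the passage about Luzon rather than an unrelated one.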
For the example in the figure below, given the question “what is the largest island in the Philippines,” an Information Retrieval model first searches for documents from an open-domain knowledge resource (Wikipedia in this example). A ranker then selects the passage that is most likely to be useful for answering the question. Finally, a reading comprehension model (reader) extracts the answer from the selected passage.
A key innovation of this research is combining all of these pieces in a single neural network that learns how to combine ranking and reading using deep reinforcement learning. In this way we overcome the lack of labels indicating whether each passage is useful or not.
To implement the system, we started from a very successful matching model (Match-LSTM) that our co-author Shuohang Wang had developed and used successfully in the SQuAD competition, and adapted it to this open-domain setting. We added the ranker to see whether we could learn to prioritize the documents for reading. Ranking is a “hard” rather than a “soft” decision, so it can’t easily be trained with normal stochastic gradient descent. Instead, we trained it with a reinforcement learning (RL) technique called REINFORCE, which uses a numeric reward to adjust the probabilities of the ranking prediction. If the ranker selects a document from which the answer can be extracted, it gets a positive reward; otherwise it gets a negative reward. The introduction of RL methods to the training was crucial to the success of this approach, which achieved state-of-the-art results on multiple open-domain QA datasets.
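The reward signal described above can be illustrated with a small REINFORCE sketch. Everything here is a deliberate simplification of the actual neural ranker: the raw score vector, the +1/−1 rewards, and the learning rate are illustrative assumptions, but the update rule — scale the gradient of the log-probability of the sampled passage by the reward — is the REINFORCE technique itself.

```python
import math
import random

# Toy REINFORCE update for a ranker over passages: scores of passages
# that yield the answer are nudged up, others down. Rewards (+1/-1),
# learning rate, and the bare score vector are simplifications.

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_step(scores, answer_in_passage, lr=0.5):
    """One REINFORCE update on the ranker's raw passage scores.

    answer_in_passage[i] is True if passage i contains the answer."""
    probs = softmax(scores)
    # Sample a passage from the ranker's current distribution.
    i = random.choices(range(len(scores)), weights=probs)[0]
    reward = 1.0 if answer_in_passage[i] else -1.0
    # Gradient of log p(i) w.r.t. the scores is one_hot(i) - probs;
    # scale it by the reward and take a gradient-ascent step.
    for j in range(len(scores)):
        grad = (1.0 if j == i else 0.0) - probs[j]
        scores[j] += lr * reward * grad
    return scores

# Over many updates, probability mass shifts to the answer-bearing passage.
random.seed(0)
scores = [0.0, 0.0, 0.0]
contains_answer = [False, True, False]
for _ in range(200):
    reinforce_step(scores, contains_answer)
```

Note that no per-passage label is ever consulted directly by gradient descent: the ranker only sees the scalar reward for the passage it sampled, which is what makes the “hard” selection trainable.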
We’ve also been working on a new model which can combine and weigh multiple facts or pieces of evidence for each possible answer. In our most recent work, “Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering,” to be presented at ICLR 2018, we show how our system can learn to extract separate pieces of evidence from a document and combine them to generate the answer. We will continue to bring machine comprehension and reasoning together to advance AI’s QA robustness and performance.
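One simple way to picture evidence aggregation is strength-based re-ranking: candidate answers extracted from different passages are merged, so a candidate supported by several passages can outrank one backed by a single high score. The sketch below is our own illustration of that idea, not the paper's model; the `(answer, score)` inputs are hypothetical.

```python
from collections import defaultdict

# Toy strength-based aggregation: sum the reader's confidence for
# identical candidate answers found in different passages.

def aggregate(candidates):
    """candidates: list of (answer_string, reader_score) pairs, one per
    passage. Returns candidates ranked by total accumulated evidence."""
    totals = defaultdict(float)
    for answer, score in candidates:
        totals[answer.lower()] += score
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Two weaker pieces of evidence for "Luzon" outweigh one stronger
# piece of evidence for "Mindanao".
ranked = aggregate([("Luzon", 0.4), ("Mindanao", 0.5), ("Luzon", 0.3)])
```

Here `ranked[0]` is `("luzon", 0.7)`, even though no single passage scored Luzon highest.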