Innovations in natural language processing from IBM to help enterprises better understand the language of their business

4 minute read | December 9, 2020

At IBM, we’ve focused on developing and expanding enterprise natural language processing (NLP) capabilities designed to help your business unearth insights, answer questions and make more informed decisions – even with a small data set or limited expertise.

While human language is simple enough for a child to grasp, it is incredibly complex for even the most advanced machines. The most challenging part of teaching AI to understand human intent is that it requires massive amounts of data, lots of time, and expertise.

When you ask a question, what are you really trying to say? What goal are you trying to achieve? What information are you really trying to access? Human language is full of nuance, resulting in many ways to express a particular intent. This can be problematic for many AI systems, such as chatbots, which stumble when confronted with the complexity of syntax and latch onto specific words rather than the broader context.

To help enterprises address this challenge, IBM launched a new and improved natural language understanding (NLU) model in IBM Watson Assistant for intent classification. In benchmark testing, the new intent detection algorithm was more accurate than the commercial solutions it was compared against. (1)

Bringing continued NLP advancements from IBM Research to IBM Watson

In addition, we are introducing new NLP advancements within IBM Watson Assistant and Watson Discovery, now available in beta. Pioneered by IBM Research, the new capabilities are designed to improve the automation of AI and provide a higher degree of precision in NLP.

Reading Comprehension is a feature that returns a specific fact or short answer contained within a long passage. Today, Watson Discovery identifies the best “passages” that correspond with queries. Reading Comprehension retrieves a large number of candidate paragraphs from the set of enterprise documents, searches each one for an answer to the question at hand and returns the corresponding answers. It applies contextual understanding to interpret queries and leverages massive language models to extract specific answers from the documents. The user also receives confidence scores that indicate how confident the system is in each answer.

This capability is ideal for organizations in the financial industry. For example, if you are trying to make a lending decision, you may need to identify precise facts in complex documents that you would normally read and review manually. Previously, Watson Discovery would return suggested paragraphs. With Reading Comprehension, a user is provided with the precise answer (e.g. “What is the interest rate on this current loan?” “2.9%”), saving them the time of manually searching through large portfolios of documents. This feature is now available in beta to select Watson Discovery users.
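The retrieve-then-extract flow described above can be illustrated with a small toy. This is only a sketch under loud assumptions: it is not the Watson Discovery API, and it swaps the language models mentioned above for plain word overlap. The function names (`retrieve`, `extract`, `answer`), the stopword list and the scoring rule are all hypothetical, and it returns the best-matching sentence rather than a short answer span.

```python
import re

# Hypothetical stopword list, chosen only for this illustration.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "this", "to", "what"}


def tokens(text):
    """Return the set of lowercase word tokens with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def overlap(question, text):
    """Fraction of question tokens that also occur in text (0.0 to 1.0)."""
    q = tokens(question)
    return len(q & tokens(text)) / len(q) if q else 0.0


def retrieve(question, passages, k=2):
    """Stage 1: keep the k passages that share the most tokens with the question."""
    return sorted(passages, key=lambda p: overlap(question, p), reverse=True)[:k]


def extract(question, passage):
    """Stage 2: pick the best-matching sentence; its overlap stands in for confidence."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip()]
    if not sentences:
        return "", 0.0
    best = max(sentences, key=lambda s: overlap(question, s))
    return best, overlap(question, best)


def answer(question, passages, k=2):
    """Run both stages and return (answer, confidence) pairs, most confident first."""
    candidates = [extract(question, p) for p in retrieve(question, passages, k)]
    return sorted(candidates, key=lambda c: c[1], reverse=True)


if __name__ == "__main__":
    docs = [
        "The borrower is Acme Corp. The interest rate on the loan is 2.9% per year.",
        "Payments are due monthly. Late fees apply after ten days.",
    ]
    for text, confidence in answer("What is the interest rate on the loan?", docs):
        print(f"{confidence:.2f}  {text}")
```

A production system would replace `overlap` with a learned model, but the two-stage shape (narrow the candidates first, then score answers within them) stays the same.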

FAQ Extraction, currently available in beta, is a novel answer retrieval technique that crawls web pages to detect FAQs and question-answer pairs, using this content to provide concise, up-to-the-minute answers through Watson Assistant.

FAQ Extraction is designed to work in Watson Assistant’s Search Skill, which looks for answers to end-users’ questions in documentation. This functionality makes it more likely that end-users find the answers they need when interacting with AI-powered virtual agents.

For example, businesses may struggle to keep up with ever-changing public guidance around permitted returns to the workplace or re-opening brick and mortar stores. It would require enormous resources to keep AI-powered customer care solutions up to date without a mechanism like FAQ Extraction. Instead, Watson Assistant can keep up with the latest information simply by knowing the URL of authoritative FAQ content.
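To make the idea of detecting question-answer pairs in a web page concrete, here is a minimal sketch using only Python's standard library. It is not how Watson Assistant implements FAQ Extraction. It assumes, purely for illustration, that each question is a heading (`h2` to `h4`) ending in a question mark and that the answer is the next paragraph. The class name `FAQParser` and the helper `extract_faqs` are hypothetical, and fetching the page from its URL is left out.

```python
from html.parser import HTMLParser


class FAQParser(HTMLParser):
    """Collects (question, answer) pairs from heading/paragraph markup."""

    HEADINGS = {"h2", "h3", "h4"}  # assumption: questions live in these tags

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._tag = None        # heading or paragraph tag currently being read
        self._buf = []          # text fragments collected for that tag
        self._question = None   # last question seen, still awaiting an answer

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS or tag == "p":
            self._tag = tag
            self._buf = []

    def handle_data(self, data):
        if self._tag is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return  # closing an inline tag such as <b> or <a>; keep collecting
        text = " ".join("".join(self._buf).split())  # normalize whitespace
        if tag in self.HEADINGS:
            # A heading only counts as a question if it ends with "?".
            self._question = text if text.endswith("?") else None
        elif self._question and text:
            self.pairs.append((self._question, text))
            self._question = None
        self._tag = None


def extract_faqs(html):
    """Return the question-answer pairs found in an HTML string."""
    parser = FAQParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


if __name__ == "__main__":
    page = """
    <h1>Store updates</h1>
    <h3>Are stores open?</h3>
    <p>Yes, most stores reopened on <b>Monday</b>.</p>
    <h3>Contact</h3>
    <p>Call us.</p>
    """
    for question, reply in extract_faqs(page):
        print(question, "->", reply)
```

Re-running an extractor like this whenever the source page changes is what lets the answers track the latest published guidance without manual updates.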

Finally, Watson NLP solutions now support 10 additional languages. IBM Watson Discovery now supports Bosnian, Croatian, Danish, Finnish, Hebrew, Hindi, Norwegian (Bokmål), Norwegian (Nynorsk), Serbian, and Swedish, while Watson Natural Language Understanding (NLU) now provides support for Danish, Norwegian (Bokmål), Norwegian (Nynorsk), Finnish, Czech, Hebrew, Polish, and Slovak (for Keywords).

These advancements build on a pipeline of NLP innovation from IBM Research. Earlier this year, we announced that we were taking some of the core NLP technologies powering IBM Research’s Project Debater – including advanced sentiment analysis (idiom understanding), summarization, topic clustering and key point analysis – and commercializing them within IBM’s NLP products, like Watson Discovery.

These innovations can help businesses better understand and derive real value from their business data, so they can make more informed decisions and deliver insights to customers and employees more efficiently.

Statements regarding IBM’s future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.


(1) In November 2020, Jio Haptik Technologies, a conversational AI software company, published a technical paper in which they compared the performance of their product against similar offerings from Google, Microsoft, and RASA. The performance of the other commercial solutions aside from IBM Watson Assistant was taken from the Arora et al. (2020) benchmarking study. IBM ran the same performance tests on IBM Watson Assistant as were reported by Arora et al. for purposes of this analysis. IBM’s full results are available in this technical paper.