Watson Assistant improves intent detection accuracy, leads against AI vendors cited in published study

4 minute read | December 10, 2020

Have you ever used a chatbot to get help? If you have, chances are that you’re all too familiar with answers like, “I’m sorry, I don’t understand” or “Can you try rephrasing?” With frustrating responses like these, many businesses still see chatbots as immature tools for customer service. The biggest impediment to automating interactions is correctly understanding the user’s need.

IBM Watson Assistant uses machine learning and deep learning techniques to understand how to answer end-user questions accurately with relatively small data sets. The artificial intelligence at the core of Watson Assistant is designed to correctly identify the countless permutations of intent in real-world interactions. In short, we designed Watson Assistant to be easy to train and to recognize accurately what the user wants.

But we never settle. Our vision is for Watson Assistant to be the heart of any company’s customer service operation. To do that, we continuously improve our AI, aiming to increase precision, decrease the amount of training data, and shorten the time to production. We’re excited to announce that Watson Assistant has a new and improved intent detection algorithm, which is more accurate than the commercial and open-source solutions in a recently published benchmark (see Table). [1]

Because of these improvements, the accuracy of Watson Assistant’s latest model is 79%, up from 76.3% in the previous version. This means a Watson Assistant virtual agent can answer customer help requests more often on its own, without human agent involvement (known in the industry as containment), which can save money and increase user satisfaction.
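To put the two figures above in perspective, the short sketch below works out the gain in percentage points and the relative reduction in misclassified requests. It uses only the 79% and 76.3% accuracies quoted above; the function and variable names are illustrative.

```python
def accuracy_gain(old_acc, new_acc):
    """Return (gain in percentage points, relative error reduction as a fraction)."""
    gain_points = new_acc - old_acc
    old_error = 100.0 - old_acc   # share of requests misclassified before
    new_error = 100.0 - new_acc   # share of requests misclassified now
    error_reduction = (old_error - new_error) / old_error
    return gain_points, error_reduction


gain, reduction = accuracy_gain(old_acc=76.3, new_acc=79.0)
print(f"Gain: {gain:.1f} percentage points")         # Gain: 2.7 percentage points
print(f"Relative error reduction: {reduction:.1%}")  # Relative error reduction: 11.4%
```

In other words, a 2.7-point gain in accuracy corresponds to roughly one in nine previously misclassified requests now being classified correctly.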

Better AI for understanding customers

Before we dive into our latest performance analysis, let’s talk about the challenge of intent classification.

Consider the various ways customers can express their problem:

  • “I can’t log in”
  • “My password doesn’t work”
  • “Forgot password”

The AI technology has to understand that the intent behind these sentences (and countless variations in wording, including misspellings) is getting help resetting the password. Even for a basic intent like this, it takes complex natural language processing (NLP) and classification techniques to get it right. Now imagine the complexity when trying to set up a system to help with, say, mortgage applications.
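To make the task concrete, here is a deliberately tiny intent classifier. It is a toy sketch, not how Watson Assistant works: it assumes a bag-of-words representation and a nearest-centroid rule, and the intent names, the extra "check_balance" utterances, and the confidence threshold are all made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical training utterances; the intent names are illustrative.
TRAINING = {
    "reset_password": [
        "I can't log in",
        "My password doesn't work",
        "Forgot password",
    ],
    "check_balance": [
        "What is my balance",
        "How much money do I have",
        "Show my account balance",
    ],
}


def vectorize(text):
    """Lowercase the text and count its word tokens (bag of words)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# One centroid per intent: the summed word counts of its example utterances.
CENTROIDS = {
    intent: sum((vectorize(u) for u in utterances), Counter())
    for intent, utterances in TRAINING.items()
}


def classify(text, min_confidence=0.2):
    """Return (intent, score), or (None, score) when nothing matches well enough."""
    vec = vectorize(text)
    scores = {intent: cosine(vec, centroid) for intent, centroid in CENTROIDS.items()}
    best = max(scores, key=scores.get)
    if scores[best] < min_confidence:
        return None, scores[best]  # the "I'm sorry, I don't understand" case
    return best, scores[best]


print(classify("forgot my password")[0])         # reset_password
print(classify("how much is in my account")[0])  # check_balance
print(classify("order a pizza")[0])              # None
```

A word-overlap model like this breaks down quickly on misspellings and paraphrases that share no words with the training examples, which is why production systems rely on far richer language models.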


[1] In November 2020, Jio Haptik Technologies, a conversational AI software company, published a technical paper in which they compared the performance of their product against similar offerings from Google, Microsoft, and RASA. The performance of the other commercial solutions aside from IBM Watson Assistant was taken from the Arora et al. (2020) benchmarking study. IBM ran the same performance tests on IBM Watson Assistant as were reported by Arora et al. for purposes of this analysis. IBM’s full results are available in this technical paper: https://arxiv.org/pdf/2012.03929.pdf


There’s always room for improvement — and multiple ways to achieve it.

In the latest version of Watson Assistant, we added AutoML. This is a technique that tries various algorithms and combinations of features and parameters to find the best results for a given data set without human intervention.
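The general idea behind AutoML can be sketched as a search loop: try every candidate configuration, score each on held-out data, and keep the best. The example below is a minimal illustration under stated assumptions, not Watson Assistant's implementation. The utterances, the two feature extractors, and the two similarity functions are all hypothetical stand-ins for a real search space.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical labeled utterances, split into training and validation sets.
TRAIN = [
    ("i can't log in", "reset_password"),
    ("my password doesn't work", "reset_password"),
    ("forgot password", "reset_password"),
    ("what is my balance", "check_balance"),
    ("how much money do i have", "check_balance"),
    ("show my account balance", "check_balance"),
]
VALID = [
    ("i forgot my pasword", "reset_password"),  # misspelled on purpose
    ("cannot log in to my account", "reset_password"),
    ("account balance please", "check_balance"),
    ("how much do i have", "check_balance"),
]


def word_counts(text):
    return Counter(text.lower().split())


def char_trigrams(text):
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def overlap(a, b):
    return sum((a & b).values())  # size of the multiset intersection


FEATURIZERS = {"word_counts": word_counts, "char_trigrams": char_trigrams}
SIMILARITIES = {"cosine": cosine, "overlap": overlap}


def validation_accuracy(featurize, similarity):
    """Train a nearest-centroid model on TRAIN and score it on VALID."""
    centroids = {}
    for text, intent in TRAIN:
        centroids.setdefault(intent, Counter()).update(featurize(text))
    correct = 0
    for text, intent in VALID:
        vec = featurize(text)
        predicted = max(centroids, key=lambda name: similarity(vec, centroids[name]))
        correct += predicted == intent
    return correct / len(VALID)


def auto_select():
    """Evaluate every featurizer/similarity pair and return the best one."""
    results = {
        (f_name, s_name): validation_accuracy(FEATURIZERS[f_name], SIMILARITIES[s_name])
        for f_name, s_name in product(FEATURIZERS, SIMILARITIES)
    }
    best = max(results, key=results.get)
    return best, results


best_config, all_results = auto_select()
print("best configuration:", best_config, "accuracy:", all_results[best_config])
```

Even this four-candidate grid retrains the model four times. A realistic search space is vastly larger, and every candidate costs a full training run.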

But AutoML requires a lot of computing power and time. So we supplemented it with meta-learning techniques that dramatically speed up and improve intent detection. These replace painstaking, hand-tuned feature engineering and algorithm selection with an automated, data-driven process. Using meta-learning, our system observes how different machine learning algorithms perform across various data sets and learns how to adapt the algorithm to new data sets.
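One simple way to picture meta-learning for algorithm selection: record which configuration worked best on past data sets, describe each data set with a few meta-features, and recommend for a new data set whatever worked on its most similar predecessors. The sketch below assumes a nearest-neighbor vote over two meta-features. The history table is invented purely to make the example runnable, and nothing here is taken from Watson Assistant or from the benchmark.

```python
import math
from collections import Counter

# Invented history, for illustration only: meta-features of past data sets
# (examples per intent, number of intents) and the configuration that scored
# best on each. None of these values come from a real experiment.
HISTORY = [
    ((5, 10), "char_trigrams+cosine"),
    ((8, 50), "char_trigrams+cosine"),
    ((200, 10), "word_counts+cosine"),
    ((500, 100), "word_counts+cosine"),
]


def distance(a, b):
    """Euclidean distance between meta-feature vectors on a log scale."""
    return math.sqrt(sum((math.log(x) - math.log(y)) ** 2 for x, y in zip(a, b)))


def recommend(meta_features, k=3):
    """Majority vote among the k most similar past data sets."""
    nearest = sorted(HISTORY, key=lambda item: distance(meta_features, item[0]))[:k]
    votes = Counter(config for _, config in nearest)
    return votes.most_common(1)[0][0]


print(recommend((6, 20)))    # char_trigrams+cosine
print(recommend((300, 50)))  # word_counts+cosine
```

The recommendation is a lookup rather than a search, which is where the speed-up comes from: no candidate has to be retrained before a promising starting point is chosen.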

Finally, we fortified our transfer learning capabilities. These allow the system to transfer what it learned in one domain or task (say, understanding a user’s request to apply for a credit card) to a similar domain or task (applying for a mortgage).
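As a rough illustration of the idea, the sketch below warm-starts a mortgage model from a model trained on credit card requests, then updates it with a single mortgage example per intent. It is a toy under stated assumptions (word-count centroids, hypothetical utterances and intent names) and says nothing about how Watson Assistant implements transfer learning.

```python
import math
from collections import Counter

# Hypothetical source domain (credit cards) with several examples per intent.
SOURCE = [
    ("i want to apply for a credit card", "apply"),
    ("how do i sign up for a new card", "apply"),
    ("can i apply for a card online", "apply"),
    ("what is the status of my card application", "check_status"),
    ("check my application status", "check_status"),
    ("has my card application been approved", "check_status"),
]
# Hypothetical target domain (mortgages) with only one example per intent.
TARGET = [
    ("apply for a mortgage", "apply"),
    ("mortgage application status", "check_status"),
]


def train(examples, start_from=None):
    """Build per-intent word-count centroids, optionally warm-started from a model."""
    model = {intent: Counter(counts) for intent, counts in (start_from or {}).items()}
    for text, intent in examples:
        model.setdefault(intent, Counter()).update(text.lower().split())
    return model


def cosine(a, b):
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(model, text):
    """Return the best-scoring intent, or None when no intent shares any word."""
    vec = Counter(text.lower().split())
    scores = {intent: cosine(vec, centroid) for intent, centroid in model.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


source_model = train(SOURCE)
target_only = train(TARGET)                           # trained from scratch
transferred = train(TARGET, start_from=source_model)  # reuses source knowledge

query = "has it been approved"
print(predict(target_only, query))  # None
print(predict(transferred, query))  # check_status
```

The from-scratch mortgage model has never seen the word "approved", while the warm-started one inherits it from the credit card domain.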

The result: higher accuracy

Our work has resulted in improvements in accuracy while requiring even less data to train the models.

In November 2020, Jio Haptik Technologies, a conversational AI software company, published a technical paper in which they compared the performance of their product against similar commercial offerings from Google, Microsoft and RASA, as well as BERT, an open-source project sponsored by Google. While Haptik did not include Watson Assistant in their analysis, we used the same publicly available data sets and experimental setup as Haptik to evaluate our performance, and we appended our results to their analysis:

According to the benchmark results, Watson Assistant is 5.6 percentage points more accurate than Google Dialogflow, and 14.7 percentage points more accurate than Microsoft LUIS.

You can read the full findings in IBM’s recently published technical paper, which provides additional detail on the improvements and testing methodology.

Better accuracy can mean better business results

With these enhancements, Watson Assistant is able to improve containment rates (how often the AI solves customer help requests without intervention from human agents) and first contact resolution (how often the system resolves the problem, with AI or human agents, on the first try).
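Both metrics are simple ratios over a set of conversations. The sketch below shows one way to compute them. The `Conversation` record and its two fields are hypothetical, and the five sample conversations are made up for illustration; a real deployment would derive these flags from its own logs.

```python
from dataclasses import dataclass


@dataclass
class Conversation:
    # Hypothetical log fields, named for this example only.
    escalated_to_human: bool         # a human agent had to step in
    resolved_on_first_contact: bool  # the problem was solved on the first try


def containment_rate(conversations):
    """Share of conversations handled by the AI without a human agent."""
    if not conversations:
        raise ValueError("need at least one conversation")
    contained = sum(not c.escalated_to_human for c in conversations)
    return contained / len(conversations)


def first_contact_resolution(conversations):
    """Share of conversations resolved on the first try, by AI or human."""
    if not conversations:
        raise ValueError("need at least one conversation")
    resolved = sum(c.resolved_on_first_contact for c in conversations)
    return resolved / len(conversations)


# Made-up sample data.
sample = [
    Conversation(escalated_to_human=False, resolved_on_first_contact=True),
    Conversation(escalated_to_human=False, resolved_on_first_contact=True),
    Conversation(escalated_to_human=False, resolved_on_first_contact=True),
    Conversation(escalated_to_human=True, resolved_on_first_contact=True),
    Conversation(escalated_to_human=True, resolved_on_first_contact=False),
]
print(f"Containment rate: {containment_rate(sample):.0%}")                  # 60%
print(f"First contact resolution: {first_contact_resolution(sample):.0%}")  # 80%
```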

In addition, better intent recognition can shorten time to value. Setting up, configuring, and tuning the performance of an AI-powered customer service system has traditionally taken weeks, if not months. But the new features are engineered to reduce the time and data required to bring Watson Assistant to production.