Transforming Customer Experience with AI Services (Part 2)

This is the second in a series of posts discussing how to transform customer experience with AI services.

If you haven't already read Part 1 of the series, you may want to start there.

In Part 2, you will learn how to work with multiple AI services by defining custom models, see them in action, and export their output into a dashboard for visualization. We're also going to show you how different dashboards can be built on top of that output. The end result is a dashboard with a drill-down into individual calls.

Working with multiple AI services

We're going to start by showing you how multiple AI services can be chained and giving you the approach you might want to take when transforming your customer service.

Watson Speech To Text (STT)

The Watson Speech to Text (STT) service converts the human voice into the written word. You can read more about the details of the service in the STT documentation.

My starting point

I compiled 150 calls in .wav format. These were recordings of real customer calls, where sensitive information had been beeped out for data protection.

STT base model

The first thing I looked at was the STT service demo, which transcribes audio files into plain text. I used the following configuration:

  • Narrowband (8 kHz) voice model
  • Added some keywords to spot
  • Unchecked Detect multiple speakers
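Outside the demo, the same configuration can be expressed as query parameters on an STT recognize request. This is a minimal sketch, not the exact setup used for this post: the model name assumes US English audio, the keyword list and threshold are made-up placeholders, and the helper name is hypothetical. The API expects a keywords threshold whenever keywords are supplied.

```python
from urllib.parse import urlencode


def build_recognize_params(keywords, keywords_threshold=0.5):
    """Build query parameters mirroring the demo settings listed above."""
    return {
        # Assumption: US English calls; pick the narrowband model for your language.
        "model": "en-US_NarrowbandModel",
        # Keywords to spot, sent as a comma-separated list.
        "keywords": ",".join(keywords),
        # Required by the service when keywords are given (placeholder value).
        "keywords_threshold": keywords_threshold,
        # Equivalent of unchecking "Detect multiple speakers".
        "speaker_labels": "false",
    }


params = build_recognize_params(["insurance", "policy", "address"])
query_string = urlencode(params)
print(query_string)
```

The resulting query string would be appended to the recognize endpoint of your own STT service instance.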

With that, I got the raw base model transcribed text. I could quickly see that I needed to train the STT service to understand things like addresses, names, company names, company details, and so on.

Before jumping to custom models, I made some comparisons with the Amazon Transcribe service and the Azure Speech to Text service. The end result was that the IBM Speech to Text base model produced more accurate transcriptions. (Contact me directly if you are looking for the detailed comparison.)

Additionally, the STT API doc is very content rich, and it does a good job of explaining how to run CURL commands.

One thing I tried was setting max_alternatives to three so that the service returned three alternative transcriptions. This helped, but it wasn't enough. The base STT model does a pretty good job transcribing customer service calls, even with loud background noise and different customer accents, but it still made mistakes. So, I started looking at the custom models to improve accuracy.
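To compare the alternatives side by side, you can walk the JSON that STT returns. The sketch below assumes the usual response shape (a `results` list whose items carry an `alternatives` list of transcripts, with a confidence score typically present only on the top alternative). The sample response is invented for illustration and is not taken from the real calls.

```python
import json

# Made-up sample in the shape of an STT response with max_alternatives=3.
sample_response = json.loads("""
{
  "results": [
    {
      "final": true,
      "alternatives": [
        {"transcript": "i would like to update my address ", "confidence": 0.91},
        {"transcript": "i would like to update my a dress "},
        {"transcript": "i would like to up date my address "}
      ]
    }
  ]
}
""")


def list_alternatives(response):
    """Return (rank, transcript, confidence) tuples for every alternative."""
    rows = []
    for result in response.get("results", []):
        for rank, alt in enumerate(result.get("alternatives", []), start=1):
            # Confidence may be missing on lower-ranked alternatives.
            rows.append((rank, alt["transcript"].strip(), alt.get("confidence")))
    return rows


alternatives = list_alternatives(sample_response)
for rank, transcript, confidence in alternatives:
    print(rank, transcript, confidence)
```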

STT custom models

The STT service offers two kinds of custom model (as of the publish date of this blog post): the custom language model and the custom acoustic model. I used both in this post. I found the custom language model to be very useful and powerful with very little work. The acoustic model, on the other hand, needed much more work and was less effective.

To start with, I explored both models using CURL commands covered in the docs. I wanted something more visual, however, so I used the STT-model-customizer demo. Using the STT-model-customizer demo, I created a language model, tested how it works visually, re-uploaded my .wav audio files, made the necessary wording updates (which then created a corpus with my custom words), and repeated these steps until I had a good custom language domain. This meant my STT service instance was trained and ready to be used once I passed the custom language model ID when making the API calls.
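Passing the custom language model ID comes down to one extra request parameter. As a rough sketch, assuming the `language_customization_id` query parameter of the recognize request and a hypothetical helper name, it could look like this. The ID shown is a placeholder, not a real model ID.

```python
def with_custom_language_model(base_params, customization_id):
    """Return a copy of the request parameters that targets a custom language model."""
    if not customization_id:
        raise ValueError("a custom language model ID is required")
    params = dict(base_params)  # copy so the base settings stay reusable
    params["language_customization_id"] = customization_id
    return params


base = {"model": "en-US_NarrowbandModel"}  # assumption: US English narrowband audio
custom = with_custom_language_model(base, "<CUSTOM-LANGUAGE-MODEL-ID>")
print(custom)
```

Keeping the base parameters untouched makes it easy to transcribe the same file with and without the custom model and compare the results.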

In the context of an insurance company, my custom language model covered addresses and insurance-specific keywords. Once I had accurately transcribed text, the next task was to pass it to Natural Language Understanding.

Watson Natural Language Understanding (NLU)

The Watson Natural Language Understanding (NLU) service allows you to analyze text and extract metadata from content, such as concepts, entities, keywords, categories, sentiment, emotion, relations, and semantic roles. You can apply custom models developed using Watson Knowledge Studio to identify industry- or domain-specific entities and relations in unstructured text with Watson NLU. More on NLU can be found in the NLU docs.

In my case, I mainly needed keywords, entities, emotion, and categories. To start with, I used the NLU demo and passed my transcribed text with my STT custom language model. After passing the text to the NLU demo, I was able to see the keywords, entities, emotion, and categories:

  • Keywords: From the keywords, I was able to quickly see what the call was about. For example, one caller wanted to update an address, and "update address" was the first keyword in the listing with "relevance": 0.744219.
  • Entities: I was able to see people's names tagged as a person, company names as a company, quantities as a quantity, and much more. Entities and keywords work hand in hand.
  • Emotion: I was able to quickly understand the caller's emotions. In this example, joy had the highest score: { "emotion": { "document": { "emotion": { "sadness": 0.435443, "joy": 0.548496, "fear": 0.07661, "disgust": 0.037598, "anger": 0.067114 } } } }. The emotions were useful, but not enough; I needed the IBM Watson Tone Analyzer service to get more detailed tone data in order to make a full judgment on the calls. Tone Analyzer and NLU emotions can be used together for more accurate and richer tone output. More on Tone Analyzer later.
  • Categories: This was meant to sort calls into the correct categories. In my case, the base categories returned for the transcribed text were of no use and not relevant to my caller data. I was getting categories like "/technology and computing/internet technology/web search/people search" and "/business and industrial/business operations/business plans." I needed categories like "/Car insurance/Car insurance quote" or "/Car insurance/Car insurance update," and this was only possible by creating a custom categories model.
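Reading those values out of the NLU JSON is straightforward. The sketch below reuses the emotion scores and the keyword quoted above. The helper names are hypothetical, and the surrounding dictionary is trimmed to just the fields being read.

```python
# Trimmed NLU output using the keyword and emotion values quoted above.
nlu_result = {
    "keywords": [{"text": "update address", "relevance": 0.744219}],
    "emotion": {
        "document": {
            "emotion": {
                "sadness": 0.435443,
                "joy": 0.548496,
                "fear": 0.07661,
                "disgust": 0.037598,
                "anger": 0.067114,
            }
        }
    },
}


def top_keyword(result):
    """Return the text of the most relevant keyword, or None if there are none."""
    keywords = result.get("keywords", [])
    if not keywords:
        return None
    return max(keywords, key=lambda k: k["relevance"])["text"]


def dominant_emotion(result):
    """Return (emotion_name, score) for the highest-scoring document emotion."""
    scores = result["emotion"]["document"]["emotion"]
    name = max(scores, key=scores.get)
    return name, scores[name]


print(top_keyword(nlu_result))
print(dominant_emotion(nlu_result))
```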

I reviewed the metadata extracted by the NLU service, and I knew I needed to create a custom categories model using Knowledge Studio and the NLU service in order to get the correct categories.

Watson Knowledge Studio

Knowledge Studio is used to teach Watson the language of a specific domain, with custom machine-learning models that identify entities, relationships, and categories unique to an industry in unstructured text. It allows you to build models in a collaborative environment designed for both developers and domain experts without needing to write code. Models created with Knowledge Studio can be used in Discovery and Natural Language Understanding. More on Knowledge Studio can be found in the Knowledge Studio docs. In my case, I needed to create a custom insurance category model using Knowledge Studio.

These are the steps for creating a custom category model.

In my insurance category model, I had a CSV with categories like:

  • /Car insurance - Car insurance quote
  • /Health insurance - Health insurance quote
  • /House insurance - House insurance quote
  • /Life insurance - Life insurance quote
  • /Travel insurance - Travel insurance quote

The model was created and trained using my NLU service inside Knowledge Studio. I got a model ID generated by Knowledge Studio. Then, I ran a CURL command using NLU to pass my model ID.

CURL command

curl -X POST -u "apikey:<NLU-API-KEY>" \
  -H "Content-Type: application/json" \
  -d @/Users/Twana/Dev/CustomerX/calls/parameters.json \
  ""
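The contents of parameters.json are not shown here. As an assumption, a request body that asks NLU to use a custom categories model could be generated as follows, with the model ID going into the `model` option of the `categories` feature. The sample text, the placeholder ID, and the helper name are hypothetical.

```python
import json


def build_nlu_parameters(text, model_id):
    """Build an NLU analyze request body that uses a custom categories model."""
    return {
        "text": text,
        "features": {
            # Assumption: the Knowledge Studio model ID is passed as "model".
            "categories": {"model": model_id},
            "keywords": {},
            "entities": {},
            "emotion": {},
        },
    }


payload = build_nlu_parameters(
    "I would like a quote for my car insurance.",  # made-up transcript snippet
    "<MODEL-ID>",
)
parameters_json = json.dumps(payload, indent=2)
print(parameters_json)
```

Writing `parameters_json` to a file gives you something you can pass to the command with `-d @parameters.json`.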



CURL command output 

This time, the output gave me the insurance category model, which was to be expected. I then went back into my model and created the full list of categories needed.

CURL command output

This gave me a custom model for the categories. The last service I needed to look at was the Tone Analyzer service, as stated earlier. The NLU emotions output alone was not sufficient, so I needed to use the Tone service.

Watson Tone Analyzer

The Watson Tone Analyzer service detects various tones (such as joy, sadness, anger, and agreeableness) in daily communications. These tones can impact the effectiveness of communication in different contexts. Tone Analyzer leverages cognitive linguistic analysis to identify a variety of tones at both the sentence and document level. This insight can then be used to refine and improve communications. It detects three types of tones in the text: emotion (e.g., anger, disgust, fear, joy, and sadness), social propensities (e.g., openness, conscientiousness, extroversion, agreeableness, and emotional range), and language styles (e.g., analytical, confident, and tentative). More on the Tone service can be found in the Tone Analyzer docs.

In my case, I needed to understand whether the callers were satisfied and whether they were happy or angry with the service provided. To start, I used the Tone Analyzer Demo and passed in the transcribed text from STT. From the demo, I could quickly see the caller's tone and tell which of the five emotions (anger, disgust, fear, joy, and sadness) the caller was expressing.
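The same judgment can be made programmatically from the Tone Analyzer JSON. This is a minimal sketch that assumes a response with a `document_tone.tones` list of `tone_id` and `score` pairs; the exact shape and the set of tones depend on the API version you call. The sample scores are invented.

```python
# Made-up document-level output in the assumed response shape.
tone_result = {
    "document_tone": {
        "tones": [
            {"tone_id": "joy", "tone_name": "Joy", "score": 0.62},
            {"tone_id": "analytical", "tone_name": "Analytical", "score": 0.55},
            {"tone_id": "anger", "tone_name": "Anger", "score": 0.12},
        ]
    }
}


def dominant_tone(result):
    """Return (tone_id, score) for the strongest document tone, or None."""
    tones = result.get("document_tone", {}).get("tones", [])
    if not tones:
        return None
    best = max(tones, key=lambda t: t["score"])
    return best["tone_id"], best["score"]


print(dominant_tone(tone_result))
```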

There are already many demos and a lot of documentation for Tone Analyzer, so I am not going into the details of the service's capabilities. The Tone Analyzer Demo describes them very well. The question is, how can I take the JSON output from Tone Analyzer and NLU and create a nice dashboard UI showing all customer calls?

Dashboard UI 

There are many ways to present the JSON data from the NLU and Tone Analyzer services. Below are my two favorite dashboards.

Cloud Insurance Co. Dashboard

This sample dashboard uses Tone Analyzer to show whether a conversation is flowing well and to alert an admin when the tone turns angry. The source code can be found on GitHub.
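The alerting idea can be sketched in a few lines. This is not the dashboard's actual code; it assumes sentence-level Tone Analyzer output in a `sentences_tone` list, and the 0.75 threshold and the sample sentences are arbitrary illustrations.

```python
def angry_sentences(tone_result, threshold=0.75):
    """Return (sentence_id, text, score) for sentences whose anger score meets the threshold."""
    flagged = []
    for sentence in tone_result.get("sentences_tone", []):
        for tone in sentence.get("tones", []):
            if tone["tone_id"] == "anger" and tone["score"] >= threshold:
                flagged.append((sentence["sentence_id"], sentence["text"], tone["score"]))
    return flagged


# Made-up sentence-level output in the assumed response shape.
conversation = {
    "sentences_tone": [
        {"sentence_id": 0, "text": "Thanks for picking up.",
         "tones": [{"tone_id": "joy", "score": 0.70}]},
        {"sentence_id": 1, "text": "This is the third time I have called!",
         "tones": [{"tone_id": "anger", "score": 0.82}]},
    ]
}

alerts = angry_sentences(conversation)
if alerts:
    print("Alert admin:", alerts)
```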


Customer Care Analytics Dashboard

This sample dashboard uses data from the Tone Analyzer and NLU services. You can use it to quickly see what is possible. The demo showcases call interactions, customer sentiment, customer tones, and a drill-down into each call. You can upload your own data and see it in the dashboard.
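Before uploading data to a dashboard, it helps to flatten the two JSON outputs into one row per call. The sketch below is one possible shape, not the format this dashboard requires. The field names in the row, the helper names, and the sample values are assumptions.

```python
from collections import Counter


def to_dashboard_row(call_id, nlu_result, tone_result):
    """Flatten NLU and Tone Analyzer output for one call into a single row."""
    emotions = nlu_result["emotion"]["document"]["emotion"]
    categories = nlu_result.get("categories", [])
    tones = tone_result["document_tone"]["tones"]
    return {
        "call_id": call_id,
        "category": categories[0]["label"] if categories else None,
        "emotion": max(emotions, key=emotions.get),
        "tone": max(tones, key=lambda t: t["score"])["tone_id"] if tones else None,
    }


def summarize(rows):
    """Count calls per category and per dominant tone for an overview panel."""
    return {
        "calls": len(rows),
        "by_category": dict(Counter(r["category"] for r in rows)),
        "by_tone": dict(Counter(r["tone"] for r in rows)),
    }


# Made-up outputs for a single call.
nlu = {
    "categories": [{"label": "/Car insurance/Car insurance quote", "score": 0.93}],
    "emotion": {"document": {"emotion": {"joy": 0.55, "sadness": 0.44, "anger": 0.07}}},
}
tone = {"document_tone": {"tones": [{"tone_id": "joy", "score": 0.62}]}}

rows = [to_dashboard_row("call-001", nlu, tone)]
summary = summarize(rows)
print(rows[0])
print(summary)
```

Each row supports the drill-down view, while the summary feeds the overview of all calls.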


Cheat sheet

  • Transcribe audio files using Speech To Text (STT).
  • Re-transcribe audio files using STT custom language model.
  • Pass the STT transcribed text to Natural Language Understanding (NLU) and use the base model.
  • Create custom category models for NLU using Knowledge Studio.
  • Use the custom category models created by Knowledge Studio in NLU. Categorize the calls.
  • Use NLU to get callers' emotions.
  • Use Tone Analyzer to get callers' tone.
  • Export the NLU emotions JSON and Tone Analyzer tones JSON into a UI dashboard.
  • View all the calls in the UI dashboard for a quick look at how the calls have been handled and at satisfaction rates. Use one of the two dashboards provided.

Learn more about IBM artificial intelligence solutions.
