Measuring emotion with IBM Watson Speech to Text and Tone Analysis



Earlier this year, I was asked to help create an IBM Bluemix workshop for an insurance company client, built around an idea for their call centers using Watson Speech to Text and Watson Tone Analyzer.


Most of us have had to deal with a call center at some point in our lives. We call because we need to check on a status, activate a service, or complain about a charge. Rarely do we call to commend them on their service!

Often that dialogue can become emotional. We wondered whether it would be possible, or useful, to measure emotion in a customer support context, in real time.

Real Time Tone

So, we put together a ‘Real Time Tone’ workshop, combining Watson Speech to Text with Watson Tone Analyzer in an app. The app converts speech to text, samples segments in real time, and plots the measured emotion on a graph.
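To make the “samples segments in real time” idea concrete, here is a minimal sketch of one way to buffer interim transcript text and emit fixed-size samples for tone analysis. All names are illustrative, not the app’s actual code:

```javascript
// Hypothetical sketch: accumulate transcript fragments from Speech to Text
// and emit a sample for tone analysis every `wordsPerSample` words.
function makeSampler(wordsPerSample, onSample) {
  let buffer = [];
  return function addTranscript(text) {
    buffer = buffer.concat(text.trim().split(/\s+/).filter(Boolean));
    // Emit complete samples as soon as enough words have accumulated.
    while (buffer.length >= wordsPerSample) {
      onSample(buffer.slice(0, wordsPerSample).join(' '));
      buffer = buffer.slice(wordsPerSample);
    }
  };
}

// Usage: feed transcript fragments as they arrive.
const samples = [];
const addTranscript = makeSampler(5, s => samples.push(s));
addTranscript('I am really unhappy about');
addTranscript('this charge on my account');
// samples now holds two five-word segments ready to send to Tone Analyzer
```

In the real app each emitted sample would be POSTed to the Tone Analyzer service; the buffering logic is the only part shown here.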


Although we’re just charting the breakdown of the emotions in a line graph, there is endless scope for reacting to changes, or extremes of emotion.

For example:

  • Calls could automatically be directed to someone better suited to dealing with an angry or sad caller.

  • Quality and professionalism of call handlers could be charted over time at an individual level, at a building level, and at a corporate level.

  • Retention of customers could be evaluated against the emotional levels inside of the conversation.

  • Patterns of emotion could be matched against the handling of certain policies.

  • Customer choices are often made based on emotion. It excites me to think that the future quality of a service could change based on something as simple as caring about the tone of a conversation.

Try out the Tone Analyzer

Click here to try it out for yourself.

There are two ways to see the example in action.

First, by pressing the ‘record’ button and speaking for a while. It probably helps to have a piece of text in front of you. For best results, use a headset, or a quiet room with a clear path to your computer’s microphone.


You should see your words appear on the green ‘speaker grille’. If you talk long enough, you should see the colored lines of emotion scroll on the chart area at the top of the app.

Or, you can press the play button and listen to a canned speech (using Watson Text to Speech). The words will appear on the speaker grille and the chart will scroll.

You can experiment with the controls. Click on the dial positions to look at the three different modes that Watson Tone Analyzer offers. Sample the tone as a whole document, versus sampling each sentence. Click on the emotion buttons to limit the chart to the emotion(s) you’re interested in.

This app shows the controls visually, but underneath the covers, the Tone Analysis API allows the modes to be programmed.
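As a hedged sketch of what “programming the modes” might look like, here is a small helper that builds the query parameters for a Tone Analyzer v3 request. The `sentences` and `tones` parameter names reflect the 2016-era v3 API; treat them, and the version date, as assumptions to check against the current API reference:

```javascript
// Build query parameters for a Tone Analyzer v3 request (parameter names
// are assumptions based on the 2016-era API; verify against the API docs).
function toneQuery(options) {
  const params = { version: '2016-05-19' };
  // Document-level scoring vs per-sentence scoring.
  params.sentences = options.perSentence ? 'true' : 'false';
  // Optionally limit the response to specific tone categories.
  if (options.categories && options.categories.length) {
    params.tones = options.categories.join(',');
  }
  return params;
}

// Usage: per-sentence sampling, emotion tones only.
const q = toneQuery({ perSentence: true, categories: ['emotion'] });
```

The app’s dial and emotion buttons would simply vary the `options` object passed to a helper like this before each request.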

If you record your own words, try speaking a stream of positive words (happy, joyful, bliss, elated, etc.) and watch what happens. Try speaking a stream of negative words and watch what happens.

Design Concept

When we first prototyped our Real Time Tone concept, we mostly hacked the Tone Analysis sample app user interface, and then added in a moving chart. It was functional, and proved the point, but it was fiddly to demo because it was so spread out on the screen.

I’ve found that the app is really helpful for illuminating the potential of not only using speech as a stimulus for apps, but also triggering behavior from emotion. So I reimagined the UI as a compact ‘device’ for recording emotion. I wanted to surface the controls more visually and simply. I wanted it to have visual appeal, to imagine beyond a conventional UI, and to consider the wealth of front end options for drawing something that hopefully makes sense.

For a long time now, I’ve been captivated by the industrial design of Dieter Rams and his Braun products from the 60s and 70s. So I started pinning pictures of that equipment to Pinterest, for inspiration.

The result is a compact UI that draws more interest, and sort of combines the physical interfaces of an oscilloscope and an old tape player.

Front End Implementation

I implemented the UI with plain, native CSS3 Flexbox. I can’t recommend mastering the Flex model enough if you work on web apps. It is joyful laying out a page with Flex boxes – figuring out how to nest the boxes. Here’s the grid structure for the Tone Analyzer.


Even the screws at each corner of the UI are implemented using Flex boxes, with some border radius and a CSS transform to rotate them. Here’s the CSS for one of the screws:

.screw {
  display: flex;
  margin-top: 15px;
  flex-direction: row;
  transform: translate(90px) rotate(20deg);
  transform-origin: 0 -250px;
}

SVG and Canvas

The speaker grille is implemented using a tiny piece of SVG code – drawing horizontal and vertical lines in a loop to fill the rectangle, forming a mesh.
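The grille idea can be illustrated in a few lines: generate SVG `<line>` elements in a loop so horizontal and vertical lines fill a rectangle and form a mesh. This is a sketch of the technique, not the app’s exact code:

```javascript
// Build an SVG mesh: vertical lines every `spacing` pixels across the
// width, then horizontal lines every `spacing` pixels down the height.
function buildGrille(width, height, spacing) {
  const lines = [];
  for (let x = 0; x <= width; x += spacing) {
    lines.push(`<line x1="${x}" y1="0" x2="${x}" y2="${height}" stroke="#333"/>`);
  }
  for (let y = 0; y <= height; y += spacing) {
    lines.push(`<line x1="0" y1="${y}" x2="${width}" y2="${y}" stroke="#333"/>`);
  }
  return `<svg width="${width}" height="${height}">${lines.join('')}</svg>`;
}

// Usage: a 100x40 grille with 10px spacing (11 vertical + 5 horizontal lines).
const grille = buildGrille(100, 40, 10);
```

The resulting string can be injected into the page, or the same loop can build the elements via the DOM APIs.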

You can see that code in the index.html file on GitHub.

The dial is implemented using HTML5 Canvas – just a circle shape for the dial, and a rectangle shape for the indicator. There is a listener for mouse clicks around each of the three positions on the dial. I would have liked more time to develop a fluid and friction-aware dial – but I’m happy enough with this simpler approach for now. Here’s the canvas code from dial.js on GitHub.
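The click-listener part of a dial like this boils down to geometry: map the click to an angle around the dial centre and pick the nearest of the three fixed positions. The angles and names below are illustrative, not taken from dial.js:

```javascript
// Three hypothetical dial positions, as angles relative to straight up
// (negative = left of centre, positive = right of centre).
const POSITIONS = [-Math.PI / 4, 0, Math.PI / 4];

// Return the index (0, 1, or 2) of the dial position nearest the click.
function hitTest(clickX, clickY, centerX, centerY) {
  // Angle of the click around the dial centre; 0 means straight up.
  const angle = Math.atan2(clickX - centerX, centerY - clickY);
  let best = 0;
  for (let i = 1; i < POSITIONS.length; i++) {
    if (Math.abs(angle - POSITIONS[i]) < Math.abs(angle - POSITIONS[best])) {
      best = i;
    }
  }
  return best;
}
```

A click straight above the centre lands on the middle position; clicks up-left or up-right snap to the outer ones. A friction-aware dial would add velocity and easing on top of this same angle calculation.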

It surprised me that I used both SVG and Canvas in the same app. If you Google SVG or Canvas you’ll find no end of opinion on which technology is better. I’ve used each in different apps before, but not together – mostly because I’d conditioned myself into thinking I should use one or the other. In this case the grille code was simpler (for me) to implement with SVG, and the dial simpler with Canvas – so why not just use both this time.

The other elements are HTML5 DOM elements with some styling. I really like this single page app as a showcase of the options we have for building sites these days. Web development has come a long way in the past 20 years, and while still frustrating at times, it is certainly maturing wonderfully.

We adopted a great little scrolling chart library called Smoothie Charts for scrolling data. It was pretty simple to figure out. I have to admit that one day I’d like to try writing my own version of this library, out of interest.
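Feeding the chart means pulling per-emotion scores out of each Tone Analyzer response and appending them to the scrolling series. The response shape below (`document_tone` → `tone_categories` → `tones` with `tone_id`/`score`) matches my understanding of the 2016-era v3 JSON, but treat it as an assumption; the fixture values are invented:

```javascript
// Extract { tone_id: score } pairs for the emotion category from a
// Tone Analyzer v3 response (shape assumed; check the API reference).
function emotionScores(response) {
  const categories = response.document_tone.tone_categories;
  const emotion = categories.find(c => c.category_id === 'emotion_tone');
  const scores = {};
  for (const t of emotion.tones) scores[t.tone_id] = t.score;
  return scores;
}

// Example fixture shaped like a response; scores are made up.
const fixture = {
  document_tone: {
    tone_categories: [{
      category_id: 'emotion_tone',
      tones: [
        { tone_id: 'anger', score: 0.12 },
        { tone_id: 'joy', score: 0.68 },
      ],
    }],
  },
};
const scores = emotionScores(fixture);
```

With Smoothie Charts, each extracted score would then be appended to its emotion’s `TimeSeries` with the current timestamp, which is what makes the colored lines scroll.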

Cloud Implementation

The main app is written in Node.js. The API documentation for Tone Analyzer can be accessed here. The API for Watson Speech to Text is here.

We’re calling these APIs from the code in app.js, which you can see on GitHub.

The original workshop project on GitHub provides a structured workshop to learn from. We developed a step-by-step guide with lots of annotated notes, originally designed for the workshop we ran with the insurance company.


Have you ever regretted choosing the wrong word at the wrong time? Are you conscious of the emotion in the language you choose in the different circumstances you find yourself in?

Our choice of words can reveal a little of our uniqueness – not just to others, but also to ourselves. Tone Analysis opens new opportunities for noticing and learning from our engagements.

Businesses could probably learn a lot just from historical conversation data.

The other part of this project that inspired me was thinking differently about its visual component. Web and app development has evolved so well that we can think of lots of visual ways of working.

For me the niggling design challenge was grouping the controls and graph output in a compact way. I wanted to quickly and easily share the concept and capabilities with other people. I really like dipping into past interface approaches from the physical world. In fact, I think it would be interesting to make an IoT version of this tone app.

Sign up for Bluemix. It’s free!
