Who’s speaking? Speaker Diarization with Watson Speech-to-Text API

Distinguishing between two speakers in a conversation is difficult, especially when you hear them only virtually or for the first time. The same is true when multiple voices interact with AI and cognitive systems, virtual assistants, and home assistants like Alexa or Google Home. To address this, Watson’s Speech to Text API has been enhanced to support real-time speaker diarization.

After we built a popular chatbot using Watson services, we received a couple of requests to include the SpeakerLabels setting in our code sample.

So, what is Speaker Diarization?

Speaker diarisation (or diarization) is the process of partitioning an input audio stream into homogeneous segments according to the speaker identity. It can enhance the readability of an automatic speech transcription by structuring the audio stream into speaker turns and, when used together with speaker recognition systems, by providing the speaker’s true identity.
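To make the idea of speaker turns concrete, here is a minimal Python sketch, assuming a diarizer has already produced time-stamped segments, each tagged with a numeric speaker ID. The `Segment` type, the `to_speaker_turns` helper, and the sample data are hypothetical and only illustrate the concept; the sketch merges consecutive segments from the same speaker into a single turn.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """A span of audio attributed to one speaker (times in seconds)."""
    start: float
    end: float
    speaker: int
    text: str


def to_speaker_turns(segments: List[Segment]) -> List[Segment]:
    """Merge consecutive segments from the same speaker into one turn."""
    turns: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if turns and turns[-1].speaker == seg.speaker:
            # Same speaker keeps talking: extend the current turn.
            last = turns[-1]
            last.end = seg.end
            last.text = f"{last.text} {seg.text}"
        else:
            # Speaker changed (or first segment): start a new turn.
            # A copy is appended so the input segments are never mutated.
            turns.append(Segment(seg.start, seg.end, seg.speaker, seg.text))
    return turns


if __name__ == "__main__":
    # Hypothetical diarized segments, for illustration only.
    segments = [
        Segment(0.0, 0.4, 0, "So"),
        Segment(0.4, 0.9, 0, "thank you"),
        Segment(1.2, 1.6, 1, "Thanks"),
        Segment(1.6, 2.0, 1, "for having me"),
    ]
    for turn in to_speaker_turns(segments):
        print(f"Speaker {turn.speaker} [{turn.start:.1f}-{turn.end:.1f}s]: {turn.text}")
```

Running it prints two turns, one per speaker, which is the "speaker turns" structure described above.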

Why Speaker Diarization?

Real-time speaker diarization is a need we’ve heard about from many businesses around the world that rely on transcribing large volumes of voice conversations collected every day. Imagine you operate a call center and regularly act on customer and agent conversations as they happen: providing product-related help, alerting a supervisor about negative feedback, or flagging calls based on customer promotional activities. Until now, calls were typically transcribed and analyzed only after they ended. With Watson’s speaker diarization capability, that data is available immediately.

To try speaker diarization with the Watson Speech to Text API on IBM Bluemix, head to this demo and click to play sample audio 1 or 2. In the input JSON below, look at line 20: the optional “speaker_labels” parameter is set to true, which tells the service to distinguish between the speakers in the conversation.

{
  "continuous": true,
  "timestamps": true,
  "content-type": "audio/wav",
  "interim_results": true,
  "keywords": [
    "IBM",
    "admired",
    "AI",
    "transformations",
    "cognitive",
    "Artificial Intelligence",
    "data",
    "predict",
    "learn"
  ],
  "keywords_threshold": 0.01,
  "word_alternatives_threshold": 0.01,
  "smart_formatting": true,
  "speaker_labels": true,
  "action": "start"
}
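The same request can be generated in code. The sketch below is a minimal Python illustration, assuming this JSON is the opening text message of a WebSocket recognition session (the "action": "start" field suggests so), after which the audio and a closing stop message would be sent. The `build_start_message` helper is hypothetical and the WebSocket client itself is omitted; check the current Speech to Text documentation before relying on any field.

```python
import json
from typing import List, Optional


def build_start_message(keywords: Optional[List[str]] = None,
                        content_type: str = "audio/wav") -> str:
    """Serialize a start message like the request above, with diarization on."""
    message = {
        "action": "start",
        "content-type": content_type,
        "continuous": True,
        "interim_results": True,
        "timestamps": True,
        "smart_formatting": True,
        "word_alternatives_threshold": 0.01,
        "speaker_labels": True,  # the setting that enables speaker diarization
    }
    if keywords:
        # Keyword spotting is optional; include its threshold only when used.
        message["keywords"] = keywords
        message["keywords_threshold"] = 0.01
    return json.dumps(message)


if __name__ == "__main__":
    print(build_start_message(["IBM", "AI", "cognitive"]))
```

Building the message from a dictionary keeps the field names in one place, so enabling or disabling `speaker_labels` is a one-line change.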

Part of the output JSON after real-time speech-to-text conversion:

{
  ....
      "confidence": 0.927,
      "transcript": "So thank you very much for coming Dave it's good to have you here. "
    }
  ],
  "final": true,
  "speaker": 0
}

As you can see, each result carries a speaker label (here, "speaker": 0) that identifies which speaker in the conversation said it.
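A client typically turns results like this into a readable, speaker-tagged transcript. The following Python sketch assumes the elided part of the fragment wraps the transcript in an `alternatives` array, as in Watson’s usual `results` format; the `label_result` helper is hypothetical. It skips interim results ("final": false), because the request enables `interim_results` and those may still change.

```python
import json
from typing import Optional


def label_result(result: dict) -> Optional[str]:
    """Format one final result as 'Speaker N: transcript'.

    Assumes the shape of the fragment above: an "alternatives" list whose
    first entry holds the best transcript, plus "final" and "speaker" keys.
    Interim (non-final) results return None so the caller can skip them.
    """
    if not result.get("final"):
        return None
    best = result["alternatives"][0]
    speaker = result.get("speaker", "?")
    return f"Speaker {speaker}: {best['transcript'].strip()}"


if __name__ == "__main__":
    fragment = json.loads("""
    {
      "alternatives": [
        {
          "confidence": 0.927,
          "transcript": "So thank you very much for coming Dave it's good to have you here. "
        }
      ],
      "final": true,
      "speaker": 0
    }
    """)
    print(label_result(fragment))
    # prints: Speaker 0: So thank you very much for coming Dave it's good to have you here.
```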

Steps to enable speaker diarization

  • Watson Speech to Text is available as a service on IBM Bluemix, IBM’s cloud platform. Create a new instance of the service to use in your application.

  • If you are taking the REST API approach, don’t forget to include the optional parameter “speaker_labels”: true in your request.

  • Depending on the programming language of your application, use one of the SDKs available on Watson Developer Cloud, such as Python, Node, Java, or Swift.
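For the REST API approach, a minimal standard-library Python sketch might build the request as follows. It assumes a `POST /v1/recognize` call with `speaker_labels=true` passed as a query parameter and HTTP basic authentication with your service credentials. The service URL, the credentials, and the `build_recognize_request` helper are hypothetical placeholders; take the real values from your own service instance.

```python
import base64
import urllib.request
from urllib.parse import urlencode

# Hypothetical placeholders: replace with the URL and credentials
# from your own Speech to Text service instance.
SERVICE_URL = "https://your-service-url.example/speech-to-text/api"
USERNAME = "your-username"
PASSWORD = "your-password"


def build_recognize_request(audio: bytes,
                            content_type: str = "audio/wav") -> urllib.request.Request:
    """Build (but do not send) a recognize request with speaker labels on."""
    # speaker_labels=true turns on diarization for this request.
    query = urlencode({"speaker_labels": "true"})
    # HTTP basic auth header built from the service credentials.
    token = base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()
    return urllib.request.Request(
        f"{SERVICE_URL}/v1/recognize?{query}",
        data=audio,
        headers={"Content-Type": content_type, "Authorization": f"Basic {token}"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_recognize_request(b"\x00" * 16)  # placeholder bytes, not real audio
    print(req.get_method(), req.full_url)
```

To actually send it, pass the request to `urllib.request.urlopen` and parse the response body with `json.load`; an SDK would handle these details for you.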

Refer to the chatbot-watson-android code sample for an overview of how to add speaker diarization to an existing Android app. You can achieve speaker diarization with the other SDKs in a similar way.

Note: Speaker labels are not enabled by default. Check the TODOs in the code for the lines to uncomment.

Use cases

From chatbots to home assistants like Alexa and Google Home, and from call centers to medical services, speaker diarization has a wide range of applications.

For Bluemix code samples and tutorials, visit our Bluemix GitHub page.
