Watson Speech-to-Text is paying attention to what people are saying (even when you are not)

After a conference call, have you ever had someone say, “The explanation on that call was great. Did anyone write it down?”

Many conference calls, call center conversations and webinars are recorded for replay, but transcription can help listeners get more from calls. Phone conversations are an often-underutilized source of insights, largely because unstructured audio data is difficult to analyze.

Transcribing calls can be challenging because they often include acronyms and technical terms. Transcripts can also be difficult to decipher without an indication of which speaker is talking.

As the value of phone interactions becomes more widely recognized, the demand for apps that can transcribe speech is increasing. You can use the IBM Watson® Speech to Text service to add speech transcription capabilities to your applications. In a new episode of the Building with Watson webinar series, Bhavik Shah, Senior Offering Manager for IBM Watson, talks with Zach Walchuk about some of the newest features of the Speech to Text service, including language model customization and speaker diarization.

Although speech-to-text technology has existed for many years, applications that involve speech recognition still run into two problems. First, accuracy depends on the quality of the input audio. Second, the service can transcribe only words that it knows. The Speech to Text service uses the technology behind Watson to determine the most likely results for words and phrases, and with the language model customization capability, you can train the service to learn from your input.
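To make this concrete, here is a minimal sketch of a plain, uncustomized transcription call using the ibm-watson Python SDK with IAM authentication (the Bluemix-era tooling this post describes used username/password credentials instead). The API key, service URL and audio file name are placeholders:

```python
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Placeholder credentials: substitute your own API key and service URL.
authenticator = IAMAuthenticator('YOUR_API_KEY')
stt = SpeechToTextV1(authenticator=authenticator)
stt.set_service_url('https://api.us-south.speech-to-text.watson.cloud.ibm.com')

# Run a test file through the standard service (step 2 of the list below).
with open('test-call.flac', 'rb') as audio:
    result = stt.recognize(
        audio=audio,
        content_type='audio/flac',
        model='en-US_BroadbandModel',  # base model, no customization yet
    ).get_result()

# Each result segment carries ranked alternatives; print the top transcript.
for segment in result['results']:
    print(segment['alternatives'][0]['transcript'])
```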

The process is iterative, and you follow these high-level steps (sketched in code after the list):

  1. Create a Bluemix account and provision the Speech to Text service.
  2. Run your test audio files through the standard Speech to Text service and store the output.
  3. Gather text data to create a custom language model.
  4. Create the custom language model.
  5. Use the custom language model on your test audio files.
  6. Compare your results. You should see higher accuracy using your custom language model.
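Under the same assumptions as the sketch above (the ibm-watson Python SDK, the hypothetical stt client, and placeholder file names), steps 3 through 5 might look like this:

```python
import time

# Step 4: create a custom language model on top of a base model.
model = stt.create_language_model(
    name='call-center-terms',
    base_model_name='en-US_BroadbandModel',
    description='Acronyms and technical terms from our calls',
).get_result()
customization_id = model['customization_id']

# Step 3 feeds in here: add the gathered text data as a corpus,
# then wait for the service to finish analyzing it.
with open('domain-corpus.txt', 'rb') as corpus:  # hypothetical corpus file
    stt.add_corpus(customization_id, corpus_name='domain-terms',
                   corpus_file=corpus)
while stt.get_corpus(customization_id,
                     'domain-terms').get_result()['status'] != 'analyzed':
    time.sleep(5)

# Train the custom model; training is asynchronous, so poll until ready.
stt.train_language_model(customization_id)
while stt.get_language_model(
        customization_id).get_result()['status'] != 'available':
    time.sleep(10)

# Step 5: rerun the same test audio against the custom model.
with open('test-call.flac', 'rb') as audio:
    custom_result = stt.recognize(
        audio=audio,
        content_type='audio/flac',
        model='en-US_BroadbandModel',
        language_customization_id=customization_id,
    ).get_result()
```

Comparing custom_result against the earlier uncustomized output (step 6) is what reveals the accuracy gain from your domain terms.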

By creating your own custom models, you can align the service more closely with your application’s requirements and accommodate specific accents, topics and words. Speech to Text also supports real-time speaker diarization: it identifies and segments speech by speaker, so Watson can process a conversation between two people as it happens. This feature makes transcripts easier to read because the output labels each speaker.
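Diarization is requested per call with the speaker_labels parameter. A short sketch, again reusing the hypothetical stt client and placeholder audio file from above:

```python
# Request speaker diarization by setting speaker_labels=True.
with open('test-call.flac', 'rb') as audio:
    result = stt.recognize(
        audio=audio,
        content_type='audio/flac',
        model='en-US_NarrowbandModel',  # narrowband models suit telephone audio
        speaker_labels=True,
    ).get_result()

# Each speaker label ties a start/end timestamp to a numeric speaker id,
# which is what lets you prefix transcript segments with the speaker.
for label in result['speaker_labels']:
    print(f"Speaker {label['speaker']}: "
          f"{label['from']:.2f}s to {label['to']:.2f}s")
```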

To learn more about the IBM Watson Speech to Text service, be sure to check out the Building with Watson webcast.
