
Watson Speech-to-Text is paying attention to what people are saying (even when you are not)


After a conference call, have you ever had someone say, “The explanation on that call was great. Did anyone write it down?”

Many conference calls, call center conversations and webinars are recorded for replay, but transcription can help listeners get even more from them. Phone conversations are an often-underutilized source of insights, largely because their unstructured data is difficult to analyze.

Transcribing calls can be challenging because they often include acronyms and technical terms. Transcripts can also be difficult to decipher without an indication of which speaker is talking.

As the value of phone interactions gains recognition, demand for apps that can transcribe speech is increasing. You can use the IBM Watson® Speech to Text service to add speech transcription capabilities to your applications. In a new episode of the Building with Watson webinar series, Bhavik Shah, Senior Offering Manager for IBM Watson, talks with Zach Walchuk about some of the newest features of the Speech to Text service, including language model customization and diarization.

Although speech-to-text technology has existed for many years, applications that involve speech recognition still face two problems. First, accuracy depends on the quality of the input audio. Second, the service can only transcribe words that it knows. The Speech to Text service uses the technology behind Watson to determine the most likely results for words and phrases, and with the language model customization capability, you can train the service to learn from your input.
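To make that concrete, here is a minimal sketch of a basic transcription request against the Speech to Text REST API using Python’s requests library. The endpoint URL, credentials, and audio file name are placeholders for your own Bluemix service instance.

```python
import requests

# Placeholder credentials and endpoint from your Bluemix Speech to Text instance.
STT_URL = "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize"
AUTH = ("your-service-username", "your-service-password")

def transcribe(audio_path):
    """Send a WAV file to the Speech to Text service and return the transcript."""
    with open(audio_path, "rb") as audio:
        response = requests.post(
            STT_URL,
            auth=AUTH,
            headers={"Content-Type": "audio/wav"},
            params={"model": "en-US_NarrowbandModel"},  # tuned for telephony audio
            data=audio,
        )
    response.raise_for_status()
    results = response.json()["results"]
    return " ".join(r["alternatives"][0]["transcript"].strip() for r in results)

if __name__ == "__main__":
    print(transcribe("call.wav"))  # "call.wav" is a placeholder recording
```

The narrowband base model is intended for 8 kHz telephony audio; for microphone or webinar recordings, the broadband model is usually the better starting point.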

The process is iterative; at a high level, you follow these steps (a code sketch of steps 4 and 5 follows the list):

  1. Create a Bluemix account and provision the Speech to Text service.
  2. Run your test audio files through the standard Speech to Text service and store the output.
  3. Gather text data to create a custom language model.
  4. Create the custom language model.
  5. Use the custom language model on your test audio files.
  6. Compare your results. You should see higher accuracy using your custom language model.
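Steps 4 and 5 map onto a handful of REST calls. The sketch below, which reuses the placeholder credentials and endpoint from the earlier example, creates a custom language model on top of a base model, adds a corpus built from the text you gathered, trains the model, and then transcribes a test file with it. The model name, corpus name, and file paths are illustrative.

```python
import time

import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"
AUTH = ("your-service-username", "your-service-password")  # placeholders

def status(custom_id):
    """Return the model's status: pending, ready, training or available."""
    return requests.get(BASE + "/v1/customizations/" + custom_id, auth=AUTH).json()["status"]

# Step 4: create a custom language model on top of a base model.
created = requests.post(
    BASE + "/v1/customizations",
    auth=AUTH,
    json={"name": "Support calls", "base_model_name": "en-US_NarrowbandModel"},
).json()
custom_id = created["customization_id"]

# Add a corpus: a plain-text file of domain sentences, acronyms and product
# names gathered in step 3 ("domain_corpus.txt" is a placeholder).
with open("domain_corpus.txt", "rb") as corpus:
    requests.post(
        BASE + "/v1/customizations/" + custom_id + "/corpora/domain",
        auth=AUTH,
        data=corpus,
    )

# Wait for the corpus to be processed, then train and wait for training to finish.
while status(custom_id) != "ready":
    time.sleep(10)
requests.post(BASE + "/v1/customizations/" + custom_id + "/train", auth=AUTH)
while status(custom_id) != "available":
    time.sleep(10)

# Step 5: transcribe the test audio with the custom model attached.
with open("call.wav", "rb") as audio:
    body = requests.post(
        BASE + "/v1/recognize",
        auth=AUTH,
        headers={"Content-Type": "audio/wav"},
        params={"model": "en-US_NarrowbandModel", "customization_id": custom_id},
        data=audio,
    ).json()
print(body["results"][0]["alternatives"][0]["transcript"])
```

Because the process is iterative, you can keep adding corpora or custom words and retraining until the accuracy comparison in step 6 stops improving.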

By creating your own custom models, you can align the service more closely with your application’s requirements and accommodate specific accents, topics and words. Speech to Text also supports real-time speaker diarization, meaning it can identify and segment speech by speaker, so Watson can process a conversation between two people as it happens. This feature makes transcripts easier to read because the output labels each speaker.
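In practice, you enable diarization by passing the speaker_labels parameter on a recognition request; the service then returns a speaker_labels array of time ranges alongside the usual transcript. The sketch below (again with placeholder credentials and a placeholder recording) stitches words back into per-speaker lines by matching word timestamps against those ranges.

```python
import requests

URL = "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize"
AUTH = ("your-service-username", "your-service-password")  # placeholders

with open("conversation.wav", "rb") as audio:
    body = requests.post(
        URL,
        auth=AUTH,
        headers={"Content-Type": "audio/wav"},
        params={
            "model": "en-US_NarrowbandModel",
            "speaker_labels": "true",  # also turns on per-word timestamps
        },
        data=audio,
    ).json()

# Each speaker label covers a time range; each word carries [word, start, end].
speaker_by_start = {label["from"]: label["speaker"] for label in body["speaker_labels"]}

current, line = None, []
for result in body["results"]:
    for word, start, _end in result["alternatives"][0]["timestamps"]:
        speaker = speaker_by_start.get(start, current)
        if speaker != current and line:
            print("Speaker %s: %s" % (current, " ".join(line)))
            line = []
        current = speaker
        line.append(word)
if line:
    print("Speaker %s: %s" % (current, " ".join(line)))
```

For a two-person call, the output then reads like a script, with one line per speaker turn.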

To learn more about the IBM Speech to Text Service, be sure to check out the Building with Watson webcast.

