Watson Speech-to-Text is paying attention to what people are saying (even when you are not)
After a conference call, have you ever had someone say, “The explanation on that call was great. Did anyone write it down?”
Many conference calls, call center conversations and webinars are recorded for replay, but transcription can help listeners get more from calls. Phone conversations are an often-underutilized source of insights, mostly because the unstructured nature of the data is difficult to analyze.
Transcribing calls can be challenging because they often include acronyms and technical terms. Transcripts can also be hard to follow when nothing indicates which speaker is talking.
As the value of phone interactions is being recognized, the demand for apps that can transcribe speech is increasing. You can use the IBM Watson® Speech to Text service to add speech transcription capabilities to your applications. In a new episode of the Building with Watson webinar series, Bhavik Shah, Senior Offering Manager for IBM Watson, talks with Zach Walchuk about some of the newest features of the Speech to Text service, including language model customization and diarization.
Although speech-to-text technology has existed for many years, when you’re writing applications that involve speech recognition, you still encounter two problems. First, the accuracy depends on the quality of the input audio. Second, the service can only transcribe words that it knows. The Speech to Text service uses the technology behind Watson to determine the most likely results for words and phrases. With the Speech to Text Language Model Customization capability, you can train the service to learn from your input.
The process is iterative. At a high level, you follow these steps:
- Create a Bluemix account and provision the Speech to Text service.
- Run your test audio files through the standard Speech to Text service and store the output.
- Gather text data to create a custom language model.
- Create the custom language model.
- Use the custom language model on your test audio files.
- Compare your results. You should see higher accuracy using your custom language model.
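The comparison in the last step can be made concrete with a metric such as word error rate (WER): the number of word-level edits needed to turn a transcript into a trusted reference, divided by the reference length. The sketch below is one possible way to do this, not a method the service prescribes. The function name and the sample transcripts are invented for illustration, and the normalization (lowercasing and splitting on whitespace) is a simplifying assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return word-level edit distance divided by the reference length.

    Assumption: lowercasing and whitespace splitting is enough
    normalization for a rough comparison.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts of the same audio clip.
reference = "provision the speech to text service on bluemix"
baseline_output = "provision the speech to text service on blue mix"
custom_output = "provision the speech to text service on bluemix"

baseline_wer = word_error_rate(reference, baseline_output)
custom_wer = word_error_rate(reference, custom_output)
```

A lower WER for the custom model's output than for the baseline output suggests that the added vocabulary is helping. Because the process is iterative, the same check can be rerun each time you add text data and retrain.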
By creating your own custom models, you can make them align more closely with your application’s requirements and accommodate specific accents, topics and words.
Speech to Text also supports real-time speaker diarization, which means it can segment speech by speaker, so Watson can process a conversation between two people as it happens. This feature can make your transcripts easier to read because the output includes labels for the speakers.
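To show how speaker labels make a transcript easier to read, here is a minimal sketch that groups recognized words into one line per speaker turn. It assumes the output provides per-word timestamps as [word, start, end] entries and speaker labels as entries with "from", "to" and "speaker" fields. That shape, the function name and the sample data are assumptions for illustration, so check them against the actual response you receive.

```python
def label_transcript(timestamps, speaker_labels):
    """Group words into lines of the form 'Speaker N: words...'.

    Assumption: each word's start time exactly matches the "from"
    value of one speaker label.
    """
    speaker_at = {label["from"]: label["speaker"] for label in speaker_labels}

    turns = []  # list of (speaker, [words]) in spoken order
    current_speaker = None
    for word, start, _end in timestamps:
        speaker = speaker_at.get(start, "unknown")
        if not turns or speaker != current_speaker:
            # A new speaker starts a new line in the transcript.
            turns.append((speaker, [word]))
            current_speaker = speaker
        else:
            turns[-1][1].append(word)

    return [f"Speaker {speaker}: {' '.join(words)}" for speaker, words in turns]


# Hypothetical data for a short two-person exchange.
sample_timestamps = [
    ["did", 0.0, 0.2], ["anyone", 0.2, 0.6], ["write", 0.6, 0.9],
    ["it", 0.9, 1.0], ["down", 1.0, 1.3],
    ["yes", 1.8, 2.1], ["i", 2.1, 2.2], ["did", 2.2, 2.5],
]
sample_labels = [
    {"from": 0.0, "to": 0.2, "speaker": 0},
    {"from": 0.2, "to": 0.6, "speaker": 0},
    {"from": 0.6, "to": 0.9, "speaker": 0},
    {"from": 0.9, "to": 1.0, "speaker": 0},
    {"from": 1.0, "to": 1.3, "speaker": 0},
    {"from": 1.8, "to": 2.1, "speaker": 1},
    {"from": 2.1, "to": 2.2, "speaker": 1},
    {"from": 2.2, "to": 2.5, "speaker": 1},
]

labeled_lines = label_transcript(sample_timestamps, sample_labels)
```

Matching on exact start times keeps the sketch short. A sturdier version might match each word to the label whose time range contains it.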
To learn more about the IBM Watson Speech to Text service, be sure to check out the Building with Watson webcast.
Learn more with the “Building with Watson” series