Freedom and flexibility with Speech-to-Text

Key points:
– IBM Watson’s Speech-to-Text service helps you go deeper than out-of-the-box solutions allow by providing the tooling and functionality to train Watson to learn the language of your business
– New language model customization, customization weighting and acoustic model customization features provide the flexibility you need to create effective solutions for your unique domain needs

Your business is unique. It has its own language and modes of operation that require customizable solutions to help your people leverage their industry expertise and deliver value. When it comes to speech-to-text solutions, an out-of-the-box service isn’t enough. Your business needs the freedom and flexibility to create solutions that account for your unique industry and domain needs. Consider the following examples:

  • A call center that needs to transcribe thousands of recorded conversations between customers and agents to identify and analyze common call patterns and issues
  • A medical service that wants to create a dictation application for doctors to more readily capture and log patient diagnoses and treatment notes
  • A retail company that is looking to extend its sales engagement with customers through an online conversational application

Each of these scenarios requires the expression of words, phrases, terms, product names and domain-specific jargon that an out-of-the-box service might not fully understand. The acoustic environment of each scenario also varies greatly, which hurts transcription accuracy unless those variations are properly accounted for.

A baseline speech recognition service covers everyday vernacular and assumes that your audio files will be crisply recorded without interference. IBM Watson’s Speech-to-Text service goes beyond that baseline by providing the tooling and functionality to train Watson to learn the language of your business. What makes Watson your best choice?

Language model customization

The ability to tailor our base language models to your specific domain terminology is enormously powerful, because no one knows the language of your business better than you. That language is one of your differentiating factors in the market. With a language model customization interface at your fingertips, you can train the Watson Speech-to-Text service to recognize your business language and style more accurately.

Language model customization weighting

Beta release: 10/2/2017

You can customize your language model even more precisely by giving more weight to specific words that may be spoken frequently (such as product names or specific terminology used in your business) relative to words already in the service’s base vocabulary. You can set this weight during training, and you can also set it for each speech recognition request as needed. This setting is optional, but depending on your speech recognition needs, this level of language tuning could greatly improve your application’s accuracy.
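As a rough illustration of how a per-request weight might take precedence over one set at training time, the sketch below resolves the effective weight and builds a dictionary of request parameters. The parameter names `customization_id` and `customization_weight`, the 0.0 to 1.0 range, and both helper functions are assumptions made for illustration; check the service’s API reference before relying on any of them.

```python
from typing import Dict, Optional, Union


def resolve_weight(request_weight: Optional[float],
                   trained_weight: Optional[float]) -> Optional[float]:
    """Pick the weight for one request.

    A weight passed with the request wins over the weight stored at
    training time. None means "let the service use its own default".
    """
    weight = request_weight if request_weight is not None else trained_weight
    # Assumed valid range; confirm against the service documentation.
    if weight is not None and not 0.0 <= weight <= 1.0:
        raise ValueError(f"weight must be between 0.0 and 1.0, got {weight}")
    return weight


def build_params(customization_id: str,
                 request_weight: Optional[float] = None,
                 trained_weight: Optional[float] = None
                 ) -> Dict[str, Union[str, float]]:
    """Build hypothetical query parameters for a recognition request."""
    params: Dict[str, Union[str, float]] = {"customization_id": customization_id}
    weight = resolve_weight(request_weight, trained_weight)
    if weight is not None:
        params["customization_weight"] = weight
    return params
```

For example, `build_params("my-model", request_weight=0.8, trained_weight=0.5)` would carry a weight of 0.8, because the per-request value takes precedence in this sketch.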

Acoustic model customization

Beta release: 10/2/2017

As a complement to our language model customization interface, we now also offer a custom acoustic model interface that attends to the acoustic side of your business, helping you go even further in tailoring the service. Think of it as fine-tuning ‘the ear’ of our service, by training Watson to adapt to your specific acoustic environment (like the ambient noise in your call center) and speaker styles (like voice pitch, volume and pace). You can even create and train an acoustic model by providing just the audio files, without the need for corresponding transcriptions.

In addition to these customization capabilities, read more about all of our speech-to-text tooling features below:

  • Over eight hours of audio can be supported per recognition request
    IBM Watson’s Speech-to-Text service supports eight audio file formats at varying compressions. The maximum audio length that can be supported in a single speech recognition request is approximately 8 hours and 40 minutes.
  • Speaker labels (beta)
    This feature provides a transcription that labels each speaker’s contributions to a multi-participant conversation, available for audio in U.S. English, Spanish and Japanese.
  • Keyword spotting (beta)
    Identify spoken phrases from the audio that match specified keyword strings with a user-defined level of confidence. This feature is especially useful when individual words or topics from the input are more important than the full transcription. For example, it can be used with a customer support system to determine how to route or categorize a customer request.
  • Word alternatives (beta), confidence and timestamps
    Report alternative words that are acoustically similar to the words being transcribed, confidence levels for each of the words, and timestamps for the start and end of each word.
  • Maximum alternatives and interim results
    Return alternative and interim transcription results. The former provides different possible hypotheses; the latter represents interim hypotheses as the transcription progresses. In both cases, the service indicates the final results in which it has the greatest confidence.
  • Profanity filtering
    Censor profanity from U.S. English transcriptions by default. You can use the filtering to sanitize the service’s output.
  • Smart formatting (beta)
    Convert dates, times, numbers, phone numbers and currency values in final transcripts of U.S. English audio into more readable, conventional forms.
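
The customer-support routing example mentioned under keyword spotting can be sketched as follows. This is a minimal sketch under stated assumptions, not the service’s actual response format: the shape of `keyword_hits` (each keyword mapped to a list of matches with a `confidence` value), the `ROUTES` table, and the queue names are all hypothetical.

```python
from typing import Dict, List

# Hypothetical shape of keyword-spotting output:
# keyword -> list of matches, each carrying a confidence score.
KeywordHits = Dict[str, List[Dict[str, float]]]

# Hypothetical mapping from spotted keywords to support queues.
ROUTES: Dict[str, str] = {
    "refund": "billing",
    "password": "account-support",
    "outage": "technical-support",
}


def best_confidence(hits: List[Dict[str, float]]) -> float:
    """Return the highest confidence among a keyword's matches (0.0 if none)."""
    return max((hit["confidence"] for hit in hits), default=0.0)


def route_request(keyword_hits: KeywordHits,
                  threshold: float = 0.5,
                  default_queue: str = "general") -> str:
    """Choose a queue from the most confident keyword at or above the threshold.

    Falls back to default_queue when no known keyword is confident enough.
    On a tie, the keyword seen later in the dictionary wins.
    """
    best_queue, best_score = default_queue, threshold
    for keyword, hits in keyword_hits.items():
        score = best_confidence(hits)
        if keyword in ROUTES and score >= best_score:
            best_queue, best_score = ROUTES[keyword], score
    return best_queue
```

Raising `threshold` makes routing more conservative: fewer calls are routed on a weak match, and more fall through to the default queue.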

Learn how to customize your language models and get started with Watson Speech-to-Text.

Offering Manager, Watson Speech Services
