Watson Developer Cloud

Speech to Text

The Speech to Text service converts the human voice into the written word.

General Availability

Watson Speech to Text can be used anywhere there is a need to bridge the gap between the spoken word and its written form. This easy-to-use service uses machine intelligence to combine information about grammar and language structure with knowledge of the composition of an audio signal to generate an accurate transcription. It uses IBM's speech recognition capabilities to convert speech in multiple languages into text. The transcription of incoming audio is continuously sent back to the client with minimal delay, and it is corrected as more speech is heard. Additionally, the service now includes the ability to detect one or more keywords in the audio stream. The service is accessed via a WebSocket connection or REST API.

Intended Use

The Speech to Text service can be used anywhere voice interactivity is needed. It is well suited to mobile experiences, media file transcription, call center transcription, voice control of embedded systems, and converting speech to text so that audio content becomes searchable. Supported languages include US English, UK English, Japanese, Spanish, Brazilian Portuguese, Modern Standard Arabic, and Mandarin. The service can also detect specific keywords or key phrases in the input stream.

You input

  • Streamed audio with intelligible speech
  • Recorded audio with intelligible speech

Service output

  • Text transcriptions of the audio with recognized words

Try it out

Check out the Speech to Text demo to watch the service in action. You can choose from pre-recorded audio, upload a WAV file, or record on the fly in US English, UK English, Japanese, Spanish, Brazilian Portuguese, Modern Standard Arabic, or Mandarin. The API returns metadata that provides timestamps, confidence scores, and alternative hypotheses. The demo also includes options to help Watson learn and improve.
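As a rough illustration of that metadata, the sketch below walks a recognition response and pulls out the top transcript, its confidence, the word timestamps, and any alternative hypotheses. The JSON field names (`results`, `alternatives`, `transcript`, `confidence`, `timestamps`) and the sample payload are assumptions made for illustration and do not come from this page; check the API reference for the exact schema.

```python
from typing import Any, Dict, List

# Hypothetical sample response. The shape and field names are assumptions,
# not taken from this page; consult the API reference for the real schema.
SAMPLE_RESPONSE: Dict[str, Any] = {
    "results": [
        {
            "final": True,
            "alternatives": [
                {
                    "transcript": "hello world",
                    "confidence": 0.94,
                    # Each entry: [word, start time in seconds, end time in seconds]
                    "timestamps": [["hello", 0.0, 0.42], ["world", 0.42, 0.9]],
                },
                {"transcript": "hello word"},
            ],
        }
    ]
}


def summarize(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one summary per result: best transcript plus its metadata."""
    summaries = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue  # nothing was recognized for this result
        best = alternatives[0]  # assumed: the first alternative is the most likely
        summaries.append(
            {
                "transcript": best.get("transcript", ""),
                "confidence": best.get("confidence"),
                "words": [
                    (word, start, end)
                    for word, start, end in best.get("timestamps", [])
                ],
                "other_hypotheses": [
                    alt.get("transcript", "") for alt in alternatives[1:]
                ],
            }
        )
    return summaries


if __name__ == "__main__":
    for item in summarize(SAMPLE_RESPONSE):
        print(item["transcript"], item["confidence"], item["words"])
```

Treating the first alternative as the preferred one is itself an assumption; an application that needs stricter guarantees could sort the alternatives by confidence where that field is present.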

How it is used


Standard Service


First thousand minutes per month are FREE. Additional minutes are $0.02 per minute.

Includes the ability to use wideband models for all supported languages. Also includes confidence scores per word, time offsets per word, and alternate hypotheses per phrase.

Telephony Add-on


First thousand minutes per month are FREE. Additional minutes are $0.02 per minute, in addition to the cost of using the Standard Service.

Adds the ability to use narrowband models for all supported languages. Narrowband models are required to process any audio that has passed through a telephone line, because telephone lines down-sample audio to 8 kHz.
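To make the pricing above concrete, here is a minimal sketch of a monthly cost estimate. It assumes that usage is billed in whole minutes, that the thousand free minutes apply separately to the Standard Service and to the Telephony Add-on, and that narrowband minutes are billed as Standard usage plus the add-on charge. The function names are hypothetical, and actual billing rules may differ.

```python
FREE_MINUTES = 1000      # first thousand minutes per month are free
CENTS_PER_MINUTE = 2     # $0.02 per additional minute


def billable_cents(minutes: int) -> int:
    """Charge in cents for one meter after the free allotment is used up."""
    return max(0, minutes - FREE_MINUTES) * CENTS_PER_MINUTE


def monthly_cost_cents(total_minutes: int, narrowband_minutes: int = 0) -> int:
    """Estimate a month's cost in cents.

    Assumption: every minute counts toward the Standard Service, and
    narrowband (telephony) minutes additionally count toward the add-on.
    """
    if total_minutes < 0 or narrowband_minutes < 0:
        raise ValueError("minutes must be non-negative")
    if narrowband_minutes > total_minutes:
        raise ValueError("narrowband minutes cannot exceed total minutes")
    standard = billable_cents(total_minutes)
    telephony = billable_cents(narrowband_minutes)
    return standard + telephony


if __name__ == "__main__":
    cents = monthly_cost_cents(total_minutes=3000, narrowband_minutes=2000)
    print(f"${cents / 100:.2f}")
```

Under these assumptions, 3,000 minutes in a month with 2,000 of them from telephone audio would come to $40.00 for the Standard Service plus $20.00 for the add-on. Working in integer cents avoids floating-point rounding in the totals.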

Premium Plan

For customers with stringent information security requirements, in regulated industries, or who handle highly sensitive data, Watson services are available through a Premium plan. Premium plans offer developers and organizations Watson services in a single-tenant, isolated model, including compute-level isolation at the VM and container levels. The Premium plan also includes the same encryption of data in transit and at rest that is offered in standard plans. For more information or to purchase a Premium plan, contact us.

Ready to use?


Getting started is easy! Try out the service on Bluemix now.

Use In Bluemix


Ready to get down to the details? Each Watson service has full documentation that explains how to get started with the service in Bluemix.

View full docs