Watson Developer Cloud

Speech to Text

Convert human voice into written word

General Availability

Watson Speech to Text converts spoken audio into written text. Use Speech to Text to transcribe calls in a contact center to identify what is being discussed, determine when to escalate calls, and understand content from multiple speakers. Use Speech to Text to create voice-controlled applications, and customize the model to improve accuracy for the language and content you care about most, such as product names, sensitive subjects, or names of individuals.

Intended Use

The Speech to Text service can be used anywhere voice interactivity is needed. In addition to transcribing audio in multiple languages, the service can detect specific keywords or key phrases in the input stream. Common uses for the Speech to Text service include:

  • Interactions in mobile experiences
  • Transcribing media files
  • Call center transcriptions
  • Voice control of embedded systems
  • Converting sound to text to make data searchable

You input

  • Streamed audio with intelligible speech
  • Recorded audio with intelligible speech

Service output

  • Text transcriptions of the audio with recognized words

Try it out

Check out the Speech to Text demo: choose from pre-recorded audio, upload a WAV file, or record on the fly in US English, UK English, Japanese, Spanish, Brazilian Portuguese, Modern Standard Arabic, or Mandarin, and watch the service in action. The API returns metadata providing timestamps, confidence scores, and alternative hypotheses. The demo also includes options to help Watson learn and improve.

How it is used


Standard Service


First thousand minutes per month are FREE. Additional minutes are $0.02 per minute.

Includes the ability to use wideband models for all supported languages. Also includes confidence scores per word, time offsets per word, and alternate hypotheses per phrase.
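To illustrate how per-word confidence scores and time offsets might be used, here is a minimal Python sketch. The `Word` and `Phrase` classes and their field names are hypothetical, invented for this illustration; they are not the service's actual response format, which is described in the full documentation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Word:
    """Hypothetical per-word result (not the real API schema)."""
    text: str
    start: float       # time offset where the word begins, in seconds
    end: float         # time offset where the word ends, in seconds
    confidence: float  # assumed to range from 0.0 to 1.0


@dataclass
class Phrase:
    """Hypothetical per-phrase result with alternate hypotheses."""
    words: List[Word]
    alternatives: List[str] = field(default_factory=list)


def transcript(phrase: Phrase) -> str:
    """Join the recognized words into a plain-text transcript."""
    return " ".join(w.text for w in phrase.words)


def low_confidence_words(phrase: Phrase, threshold: float = 0.5) -> List[Word]:
    """Return words whose confidence falls below the threshold."""
    return [w for w in phrase.words if w.confidence < threshold]


# Illustrative data only; these values do not come from the service.
example = Phrase(
    words=[
        Word("please", 0.00, 0.42, 0.97),
        Word("escalate", 0.42, 1.05, 0.41),
        Word("this", 1.05, 1.25, 0.93),
        Word("call", 1.25, 1.60, 0.95),
    ],
    alternatives=["please escalate this call", "please calculate this call"],
)

for w in low_confidence_words(example):
    # Flag uncertain words with their time range so a reviewer can replay them.
    print(f"review '{w.text}' at {w.start:.2f}s to {w.end:.2f}s")
```

An application could use a check like this to route uncertain segments to a human reviewer, using the time offsets to jump to the matching audio.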

Telephony Add-on


First thousand minutes per month are FREE. Additional minutes are $0.02 per minute, in addition to the cost of using the Standard Service.

Adds the ability to use narrowband models for all supported languages. Narrowband models are required to process any audio that has passed through a telephone line, since telephone lines down-sample audio to 8 kHz.
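As a rough guide to the pricing above, the following Python sketch estimates a monthly bill. It rests on two assumptions that the pricing text does not spell out: each plan has its own allowance of 1,000 free minutes, and telephony minutes also count toward Standard Service usage. The function names are illustrative only.

```python
FREE_MINUTES = 1000      # first thousand minutes per month are free
RATE_PER_MINUTE = 0.02   # USD per additional minute, for either plan


def plan_charge(minutes: float) -> float:
    """Charge for one plan: minutes beyond the free allowance at the flat rate."""
    billable = max(0.0, minutes - FREE_MINUTES)
    return round(billable * RATE_PER_MINUTE, 2)


def monthly_cost(total_minutes: float, telephony_minutes: float = 0.0) -> float:
    """Estimate the monthly cost in USD.

    total_minutes: all audio transcribed in the month (Standard Service).
    telephony_minutes: the portion processed with narrowband models.
    Assumption: telephony minutes are billed on both plans, each with
    its own free allowance.
    """
    if telephony_minutes > total_minutes:
        raise ValueError("telephony_minutes cannot exceed total_minutes")
    return round(plan_charge(total_minutes) + plan_charge(telephony_minutes), 2)


print(monthly_cost(1500))         # Standard only: 500 billable minutes
print(monthly_cost(3000, 2000))   # Standard plus Telephony Add-on
```

Under these assumptions, 1,500 wideband minutes cost $10.00, and 3,000 total minutes of which 2,000 are telephone audio cost $40.00 + $20.00 = $60.00.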


Let's talk

Watson Premium plans offer a higher level of security and isolation to help customers with sensitive data requirements.

Click here to find out more

Ready to use?


Getting started is easy! Try out the service on Bluemix now.

Use In Bluemix


Ready to get down to the details? Full documentation on getting started in Bluemix is available for each Watson service.

View full docs


Localized versions of Watson services (Natural Language Classifier, Retrieve and Rank, Speech to Text, Text to Speech) are available in the following places.

Japanese - SoftBank