Watson Speech to Text can be used anywhere there is a need to bridge the gap between the spoken word and its written form. This easy-to-use service uses machine intelligence to combine information about grammar and language structure with knowledge of the composition of an audio signal to generate an accurate transcription. It uses IBM's speech recognition capabilities to convert speech in multiple languages into text. The transcription of incoming audio is continuously sent back to the client with minimal delay, and it is corrected as more speech is heard. The service can also detect one or more keywords in the audio stream. The service is accessed through a WebSocket connection or a REST API.
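As a rough illustration of the REST access and keyword detection described above, the following Python sketch only builds the URL for a recognition request; it does not send any audio. The base URL is a placeholder, and the path and parameter names (`/v1/recognize`, `model`, `keywords`, `keywords_threshold`) are assumptions that should be checked against the current API reference. `build_recognize_url` is a hypothetical helper, not part of any SDK.

```python
from urllib.parse import urlencode

# Placeholder base URL (assumption); use the URL of your own service instance.
BASE_URL = "https://example.invalid/speech-to-text/api"


def build_recognize_url(keywords, keywords_threshold=0.5,
                        model="en-US_BroadbandModel"):
    """Build a recognition URL that asks the service to spot keywords.

    The parameter names are assumptions about the REST interface;
    this function performs no network I/O.
    """
    if not keywords:
        raise ValueError("at least one keyword is required")
    if not 0.0 <= keywords_threshold <= 1.0:
        raise ValueError("keywords_threshold must be between 0 and 1")

    params = {
        "model": model,
        # Keywords are sent as one comma-separated query value.
        "keywords": ",".join(keywords),
        "keywords_threshold": keywords_threshold,
    }
    return f"{BASE_URL}/v1/recognize?{urlencode(params)}"


if __name__ == "__main__":
    print(build_recognize_url(["colorado", "tornado"]))
```

The audio itself would be sent in the request body with a matching content type. For continuous interim results, the WebSocket interface is the better fit.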
The Speech to Text service can be used anywhere voice interactivity is needed. The service is well suited to mobile experiences, transcribing media files, call center transcriptions, voice control of embedded systems, and converting sound to text so that the data becomes searchable. Supported languages include US English, UK English, Japanese, Spanish, Brazilian Portuguese, Modern Standard Arabic, and Mandarin. The service can also detect the presence of specific keywords or key phrases in the input stream.
Input: streamed audio with intelligible speech
Input: recorded audio with intelligible speech
Output: text transcriptions of the audio with recognized words
Check out the Speech to Text demo to watch the service in action. You can choose from pre-recorded audio, upload a WAV file, or record on the fly in US English, UK English, Japanese, Spanish, Brazilian Portuguese, Modern Standard Arabic, or Mandarin. The API returns metadata providing timestamps, confidence scores, and alternative hypotheses. The demo also includes options to help Watson learn and improve.
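To make the returned metadata more concrete, here is a minimal Python sketch that reads timestamps, confidence values, and alternative hypotheses out of a response. The JSON in `SAMPLE_RESPONSE` is invented for illustration, and its field names (`results`, `alternatives`, `timestamps`, `word_confidence`) are assumptions about the response layout rather than a guaranteed schema. `summarize` is a hypothetical helper.

```python
import json

# Invented sample data; the field names are assumptions about the layout.
SAMPLE_RESPONSE = """
{
  "results": [
    {
      "final": true,
      "alternatives": [
        {
          "transcript": "hello world",
          "confidence": 0.94,
          "timestamps": [["hello", 0.0, 0.42], ["world", 0.45, 0.9]],
          "word_confidence": [["hello", 0.97], ["world", 0.91]]
        },
        {"transcript": "hello word"}
      ]
    }
  ]
}
"""


def summarize(response_text):
    """Return one summary dict per recognized phrase."""
    data = json.loads(response_text)
    summary = []
    for result in data.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        best = alternatives[0]  # first alternative is treated as the best one
        words = []
        # Pair each word's time offsets with its confidence by position.
        for (word, start, end), (_, conf) in zip(
            best.get("timestamps", []), best.get("word_confidence", [])
        ):
            words.append(
                {"word": word, "start": start, "end": end, "confidence": conf}
            )
        summary.append(
            {
                "transcript": best.get("transcript", ""),
                "confidence": best.get("confidence"),
                "words": words,
                "other_hypotheses": [
                    alt.get("transcript", "") for alt in alternatives[1:]
                ],
            }
        )
    return summary


if __name__ == "__main__":
    for phrase in summarize(SAMPLE_RESPONSE):
        print(phrase["transcript"], phrase["confidence"])
```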
First thousand minutes per month are FREE. Additional minutes are $0.02 per minute.
Includes the ability to use wideband models for all supported languages. Also includes confidence scores per word, time offsets per word, and alternate hypotheses per phrase.
First thousand minutes per month are FREE. Additional minutes are $0.02 per minute, in addition to the cost of using the Standard Service.
Adds the ability to use narrowband models for all supported languages. Narrowband models are required to process any audio that has passed through a telephone line, since telephone lines down-sample audio to 8 kHz.
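The pricing above can be worked through with a small Python sketch. It assumes that each plan has its own free allowance of 1,000 minutes per month, that narrowband minutes are billed once under the Standard plan and once more under the narrowband add-on, and that usage is counted in whole minutes. The function names are hypothetical, and actual billing rules such as rounding may differ.

```python
FREE_MINUTES = 1000          # free minutes per month, per plan (from the text)
RATE_CENTS_PER_MINUTE = 2    # $0.02 per additional minute (from the text)


def billable_cents(minutes):
    """Cost in cents for one plan after the free allowance is used up."""
    return max(0, minutes - FREE_MINUTES) * RATE_CENTS_PER_MINUTE


def monthly_cost_dollars(total_minutes, narrowband_minutes=0):
    """Estimate the monthly cost in dollars.

    total_minutes: every minute transcribed (billed on the Standard plan).
    narrowband_minutes: the subset that used narrowband models; these are
        assumed to incur the add-on charge on top of the Standard charge.
    """
    if narrowband_minutes > total_minutes:
        raise ValueError("narrowband_minutes cannot exceed total_minutes")
    cents = billable_cents(total_minutes) + billable_cents(narrowband_minutes)
    return cents / 100


if __name__ == "__main__":
    # 3,000 minutes in total, 1,500 of them telephone audio.
    print(monthly_cost_dollars(3000, narrowband_minutes=1500))
```

Under these assumptions, 3,000 minutes of which 1,500 are telephone audio cost 2,000 × $0.02 for the Standard plan plus 500 × $0.02 for the add-on, or $50.00 in total.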
For customers who have strict information security requirements, operate in regulated industries, or handle highly sensitive data, Watson services are available through a Premium plan. This plan offers developers and organizations Watson services in a single-tenant, isolated model, including compute-level isolation at the VM and container levels. The Premium plan also includes the data encryption in transit and at rest that is offered in the standard plans. For more information or to purchase a Premium plan, contact us.