IBM Watson Speech to Text

What is IBM Watson Speech to Text?

IBM Watson® Speech to Text enables fast and accurate speech transcription in multiple languages for use cases such as customer self-service, agent assistance and speech analytics. Get started quickly with our advanced machine learning models out of the box, or customize them for your use case.

IBM Watson Speech to Text is now available as a containerized library for IBM partners to embed AI technology in their commercial applications.

Benefits

More accurate AI

Our best-in-class AI, embedded within Watson Speech to Text, truly understands your customers.

Customizable for your business

Train Watson Speech to Text on your unique domain language and specific audio characteristics.

Protects your data

Enjoy the security of IBM’s world-class data governance practices.

Truly runs anywhere

Built to support global languages and deployable on any cloud — public, private, hybrid, multicloud, or on-premises.

Feature highlights

What sets Watson Speech to Text apart?

Automatic speech recognition

Enable your voice applications using neural technologies for speech recognition powered by IBM Watson.

Model training options

Improve speech recognition accuracy for your use case with language and acoustic training options.

Optimized for customer care

Activate your voice application with speech models tuned for the customer care domain.

Pre-trained speech models

Start transcribing out of the box with pre-trained speech models for multiple languages.

Fine-tuning features

Improve speech recognition accuracy for extracting phrases, words, letters, numbers or lists.

Low latency transcription

Use our models optimized for low latency in real-time speech applications.

Audio diagnostics before transcription

Analyze and correct weak audio signals before transcription begins.

Interim transcription before final results

Improve application response times by using speech transcription as it is generated and throughout the finalization process.
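As a rough illustration of how an application might consume interim results, the sketch below shows the latest interim hypothesis right away and commits text only once a result is marked final. The message shape (a `results` list whose entries carry a `final` flag and an `alternatives` list with a `transcript`) is an assumption to check against the API reference, and `TranscriptBuffer` is a hypothetical helper, not part of any IBM SDK.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptBuffer:
    """Hypothetical helper: tracks committed text plus the newest interim guess."""
    committed: list = field(default_factory=list)
    interim: str = ""

    def handle(self, message: dict) -> None:
        # Assumed message shape:
        # {"results": [{"final": bool, "alternatives": [{"transcript": str}]}]}
        for result in message.get("results", []):
            alternatives = result.get("alternatives") or [{}]
            text = alternatives[0].get("transcript", "").strip()
            if result.get("final"):
                # Final text is stable, so commit it and clear the interim guess.
                if text:
                    self.committed.append(text)
                self.interim = ""
            else:
                # Interim text may still change, so overwrite rather than append.
                self.interim = text

    def display(self) -> str:
        # What a UI could show right now: committed text plus the live guess.
        parts = self.committed + ([self.interim] if self.interim else [])
        return " ".join(parts)
```

Calling `handle` on each incoming message and re-rendering `display()` lets the interface react before the final transcript arrives.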

Smart formatting

Transcribe dates, times, numbers, currency values, email and website addresses in your final transcripts by converting them into conventional forms.

Speaker diarization

Recognize who said what in a multi-participant voice exchange. Speaker diarization is currently optimized for two-way call center conversations, but it can detect up to six different speakers.
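To make "who said what" concrete, here is a minimal sketch that groups transcribed words into speaker turns. It assumes the input comes as word timestamps of the form `[word, start, end]` and speaker labels as dicts with `from` and `speaker` keys; that shape is an assumption to confirm in the API reference, and `speaker_turns` is a hypothetical helper.

```python
def speaker_turns(timestamps, speaker_labels):
    """Group consecutive words by speaker into (speaker, text) turns.

    Assumed inputs (verify against the API reference):
      timestamps:     [[word, start_seconds, end_seconds], ...]
      speaker_labels: [{"from": start_seconds, "speaker": int, ...}, ...]
    """
    # Index speaker ids by word start time, rounded to avoid float noise.
    speaker_at = {round(label["from"], 2): label["speaker"]
                  for label in speaker_labels}

    turns = []
    for word, start, _end in timestamps:
        speaker = speaker_at.get(round(start, 2))  # None if no label matched
        if turns and turns[-1][0] == speaker:
            # Same speaker is still talking: extend the current turn.
            turns[-1][1].append(word)
        else:
            # Speaker changed: start a new turn.
            turns.append((speaker, [word]))
    return [(speaker, " ".join(words)) for speaker, words in turns]
```

Each returned pair holds a speaker id and the words attributed to that speaker, in spoken order.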

Word spotting and filtering

Filter for specific words or inappropriate content by using our keyword spotting and profanity filtering features. (US English only)
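The sketch below shows one way to assemble these options before sending a recognition request. The option names (`keywords`, `keywords_threshold`, `profanity_filter`, `smart_formatting`, `speaker_labels`) follow the service's request parameters as best understood here and should be checked against the API reference; `build_recognize_options` is a hypothetical helper that only builds a dictionary and makes no network call.

```python
def build_recognize_options(keywords=None, keywords_threshold=0.5,
                            profanity_filter=True, smart_formatting=False,
                            speaker_labels=False):
    """Hypothetical helper: assemble recognition options as a plain dict."""
    if not 0.0 <= keywords_threshold <= 1.0:
        raise ValueError("keywords_threshold must be between 0 and 1")

    options = {
        "profanity_filter": profanity_filter,
        "smart_formatting": smart_formatting,
        "speaker_labels": speaker_labels,
    }
    if keywords:
        # Keyword spotting takes a keyword list together with a confidence
        # threshold, so the two are only ever set as a pair.
        options["keywords"] = list(keywords)
        options["keywords_threshold"] = keywords_threshold
    return options
```

For example, `build_recognize_options(keywords=["refund", "cancel"], keywords_threshold=0.6)` yields a dictionary that could be passed on as request parameters.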

Use cases

Customer self-service

Answer common call center queries using a Watson-powered virtual assistant on the phone.

Call analytics

Improve call center performance by mining conversation logs to quickly and accurately identify emerging call patterns, customer complaints, sentiment, non-compliant behavior and more.

Agent assist

Boost agent productivity and success with real-time assistance during calls, using AI-powered document and intranet search. As the agent speaks with a customer, Watson listens to the conversation, transcribes the audio, searches the documentation for relevant content and feeds the answer back to the agent within seconds.

Interactive demo

Experience the difference

Explore the powerful capabilities of advanced AI, neural voices and voice customization in our interactive demo.

Partner with IBM

Accelerate your business growth as an Independent Software Vendor (ISV) by innovating with IBM. Partner with us to deliver enhanced commercial solutions embedded with AI to better address clients’ needs.

Explore ways to accelerate your growth with IBM

Build AI-based solutions faster with IBM embeddable AI

Case study

Call center increases per-call revenue by 20%

Hear how a large call center transformed its operations with AI. (2:31)

Ways to buy

Get started for free or view a demo.

Lite

Free

500 minutes of free speech recognition a month and 38 pre-trained speech models.

Plus

As low as USD 0.01 per minute

Tune your speech models to improve accuracy in recognition as well as transcription. The Plus version includes unlimited minutes per month and 100 concurrent transcriptions.

Premium

Provides large and security-sensitive firms with more capacity and data protection. Premium includes unlimited minutes per month and unlimited concurrent transcriptions.

Deploy Anywhere

Deploy behind your firewall or on any cloud with the flexibility of IBM Cloud Pak for Data. The Deploy Anywhere version includes unlimited minutes per month and unlimited concurrent transcriptions, along with noise detection, speech customization and data isolation.

Resources

API reference

Technical API specifications for all of your development needs.

Download SDKs

The Watson SDK repository in GitHub.

Data privacy and security

See documentation about our enhanced security features that ensure your data is isolated and encrypted end-to-end, while in transit and at rest.

Build custom speech recognition models within minutes

Learn how to quickly create custom speech models with IBM Watson, without knowing how to code.

How to train your own speech “dragon”

Read about Watson Speech to Text requirements, the methodology and some best practices inspired by actual clients.

Replacing my old IVR system with IBM Watson

Guidelines on how to add a new or existing virtual assistant to your brand-new Watson IVR.