IBM Watson Speech to Text: Customer Care
IBM Watson™ Speech to Text: Customer Care provides speech recognition capabilities for your IBM Cloud Private solutions. The service leverages machine learning to combine knowledge of grammar, language structure, and the composition of audio and voice signals to accurately transcribe the human voice. It continually updates and refines its transcription as it receives more speech.
Speech to Text: Customer Care is ideal for clients who need to extract high-quality speech transcripts from call center audio. Clients in industries such as financial services, healthcare, insurance, and telecommunications often need to develop cloud-native applications but are regulated or limited in their adoption of public cloud offerings. The service allows such clients to build on-premises applications for customer care, customer voice, agent assistance, and other solutions.
The service offers three functionally equivalent APIs that accommodate different architectural needs:
- A WebSocket interface for establishing persistent, full-duplex, low-latency connections with the service.
- An HTTP interface for synchronous calls to the service.
- An asynchronous HTTP interface for non-blocking calls to the service.
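As a minimal sketch of what a call through the synchronous HTTP interface might look like, the following Python builds (but does not send) a recognition request. The `/v1/recognize` path and the `model` query parameter are assumptions carried over from the public Watson Speech to Text API. The base URL is a hypothetical placeholder, and authentication is omitted because it depends on your deployment.

```python
import urllib.request
from urllib.parse import urlencode


def build_recognize_request(base_url, audio_bytes,
                            content_type="audio/wav",
                            model="en-US_NarrowbandModel"):
    """Build (without sending) a POST request for synchronous recognition.

    Assumptions: the '/v1/recognize' path and the 'model' parameter follow
    the public Watson Speech to Text API; your deployment may differ.
    """
    query = urlencode({"model": model})
    url = f"{base_url.rstrip('/')}/v1/recognize?{query}"
    return urllib.request.Request(
        url,
        data=audio_bytes,                        # raw audio in the request body
        headers={"Content-Type": content_type},  # tells the service the audio format
        method="POST",
    )


# Hypothetical base URL; replace it with the endpoint of your own deployment.
request = build_recognize_request(
    "https://stt.example.internal/speech-to-text/api", b"\x00\x01")
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send the request. The call blocks until the service responds, which is the behavior that distinguishes this interface from the asynchronous one.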
The service also provides a customization interface that you can use to tune speech recognition for your language and acoustic requirements. You can expand the vocabulary of a model with domain-specific terminology or adapt a model for the acoustic characteristics of your audio. SDKs are also available to simplify your use of the service's interfaces in various programming languages.
Speech to Text: Customer Care transcribes speech in U.S. English, Japanese, or Korean from many audio formats, both compressed and uncompressed. For each language, the service offers broadband (16 kHz) and narrowband (8 kHz) models for audio that is sampled at different rates. A single request can include as much as 100 MB of audio, which you can transmit all at once or as a continuous stream of data.
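The choice between broadband and narrowband models, and the per-request size limit, can be sketched as a small client-side helper. The model identifiers (such as `en-US_NarrowbandModel`) are assumed to follow the naming of the public Watson Speech to Text service. The rule that audio sampled at 16 kHz or more uses the broadband model, and the reading of 100 MB as 100 × 1024 × 1024 bytes, are also assumptions to verify against your installation.

```python
# Assumption: "100 MB" is interpreted as 100 * 1024 * 1024 bytes.
MAX_REQUEST_BYTES = 100 * 1024 * 1024

# Assumed model-name prefixes, following public Watson Speech to Text naming.
LANGUAGE_PREFIXES = {
    "U.S. English": "en-US",
    "Japanese": "ja-JP",
    "Korean": "ko-KR",
}


def choose_model(language, sample_rate_hz):
    """Pick a model name from the language and the audio sampling rate."""
    prefix = LANGUAGE_PREFIXES[language]
    if sample_rate_hz >= 16000:
        return f"{prefix}_BroadbandModel"
    if sample_rate_hz >= 8000:
        return f"{prefix}_NarrowbandModel"
    raise ValueError(f"sampling rate too low for any model: {sample_rate_hz} Hz")


def fits_in_one_request(num_bytes):
    """Return True if the audio is within the assumed 100 MB request limit."""
    return num_bytes <= MAX_REQUEST_BYTES


# Typical telephony audio is sampled at 8 kHz.
print(choose_model("U.S. English", 8000))
```

Call center recordings are commonly sampled at 8 kHz, so narrowband models are the likely default for the use cases described above.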
In addition to basic transcription, the service can augment its response to:
- Identify individual speakers in a multi-participant exchange.
- Spot keywords and phrases so that you can route customer inquiries.
- Provide acoustically similar alternatives for entire transcripts or for individual words.
- Label each word of a transcript with timestamps and confidence levels.
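The features in the list above are typically requested through parameters on the recognition call. The sketch below maps each feature to a query parameter. The parameter names (`speaker_labels`, `keywords`, `keywords_threshold`, `max_alternatives`, `word_alternatives_threshold`, `timestamps`, `word_confidence`) are assumed from the public Watson Speech to Text API, and the helper function itself is hypothetical.

```python
from urllib.parse import urlencode


def build_feature_query(speaker_labels=False, keywords=None,
                        keywords_threshold=0.5, max_alternatives=1,
                        word_alternatives_threshold=None,
                        timestamps=False, word_confidence=False):
    """Encode optional response features as a URL query string.

    Parameter names are assumed from the public Watson Speech to Text API.
    """
    params = {}
    if speaker_labels:                      # identify individual speakers
        params["speaker_labels"] = "true"
    if keywords:                            # spot keywords and phrases
        params["keywords"] = ",".join(keywords)
        params["keywords_threshold"] = str(keywords_threshold)
    if max_alternatives > 1:                # alternative full transcripts
        params["max_alternatives"] = str(max_alternatives)
    if word_alternatives_threshold is not None:  # alternatives per word
        params["word_alternatives_threshold"] = str(word_alternatives_threshold)
    if timestamps:                          # start and end time of each word
        params["timestamps"] = "true"
    if word_confidence:                     # confidence score of each word
        params["word_confidence"] = "true"
    return urlencode(params)


query = build_feature_query(speaker_labels=True,
                            keywords=["refund", "cancel my plan"],
                            timestamps=True, word_confidence=True)
print(query)
```

Only the features that are explicitly enabled appear in the query string, so an unmodified call falls back to the service's defaults.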
The WebSocket interface can return interim results that continuously refine hypotheses as transcription progresses. For U.S. English transcription, the service can also filter profanity; apply smart formatting to dates, times, numbers, and similar values; and insert certain punctuation symbols based on spoken keyword phrases.
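With the WebSocket interface, options such as interim results are usually requested in a JSON text message that opens the recognition request. The following sketch builds such a message. The `action`, `content-type`, `interim_results`, `smart_formatting`, and `profanity_filter` fields are assumed from the public Watson Speech to Text WebSocket protocol, and the code does not open a connection.

```python
import json


def build_start_message(content_type="audio/l16;rate=8000",
                        interim_results=True,
                        smart_formatting=False,
                        profanity_filter=True):
    """Build the JSON text message that would start a recognition request.

    Field names are assumed from the public Watson Speech to Text
    WebSocket protocol; verify them against your deployment.
    """
    return json.dumps({
        "action": "start",
        "content-type": content_type,          # format of the audio that follows
        "interim_results": interim_results,    # stream evolving hypotheses
        "smart_formatting": smart_formatting,  # U.S. English only
        "profanity_filter": profanity_filter,  # U.S. English only
    })


# Assumed message that signals the end of the audio stream.
STOP_MESSAGE = json.dumps({"action": "stop"})

start_message = build_start_message(smart_formatting=True)
print(start_message)
```

In this assumed protocol, a client sends the start message as a text frame, streams the audio as binary frames, and then sends the stop message. While `interim_results` is enabled, the client should expect multiple result messages per utterance, with later ones superseding earlier hypotheses.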
For more information about the service's interfaces and capabilities, see the IBM Watson Speech to Text: Customer Care documentation.
To install and configure the service, use the Helm charts that are provided with the product. To get started, see Installing and configuring the service.