copyright: years: 2017, 2023 lastupdated: "2023-01-05"


Integrating third-party speech services

IBM® Voice Gateway supports speech adapters that integrate third-party speech recognition (speech-to-text) and speech synthesis (text-to-speech) services in place of the IBM® Speech to Text and IBM® Text to Speech services. Each adapter is a separate Docker container that you deploy together with Voice Gateway and that acts as a proxy between Voice Gateway and the third-party speech service.

Voice Gateway provides the following options for integrating third-party speech services:

  • Voice Gateway Speech to Text Adapter: The adapter currently enables the Google Cloud Speech API for speech recognition. Using the Google Cloud Speech API adds French, German, and Italian as languages for self-service agents. Available in Version 1.0.0.5 and later.
  • Custom speech adapters: To use a different speech recognition or speech synthesis service, you can create your own speech adapter. To get started, use the speech adapter samples. Available in Version 1.0.0.5 and later.

Speech to Text Adapter

Deploying the Speech to Text Adapter

The Voice Gateway Speech to Text Adapter is packaged as a separate Docker image that you configure and deploy along with the core SIP Orchestrator and Media Relay images. Before you deploy the Speech to Text Adapter, deploy a basic Voice Gateway instance as described in Getting started with Voice Gateway. Then, learn how to add the Speech to Text Adapter to your deployment in the following pages:

  • Deploying the Speech to Text Adapter on Docker
  • Deploying the Speech to Text Adapter to Kubernetes in IBM Cloud Kubernetes Service

Configuring the Speech to Text Adapter

To set up the Speech to Text Adapter, you can define the following configurations:

  • Deployment configuration, which defines the Speech to Text Adapter container and is specified as Docker environment variables. For more information, see Speech to Text Adapter environment variables.
  • JSON configuration, which you can specify to separately configure multiple tenants within a single Voice Gateway environment. For more information, see Setting up a multi-tenant environment.
  • Dynamic configuration, which enables you to change settings during a call by specifying API actions and state variables in the Watson Assistant dialog node response. For more information, see Programming self-service agents using the Voice Gateway API.
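As a rough illustration of the dynamic configuration option, the following Python sketch assembles a dialog node response that carries a speech configuration action. The `vgwAction` output key, the command name `vgwActSetSTTConfig`, the `languageCode` setting, and the shape of the `text` field are assumptions made for this sketch, and the helper `build_stt_config_response` is hypothetical. Check the Voice Gateway API reference and the Google Cloud Speech configuration page for the exact names that your version supports.

```python
import json


def build_stt_config_response(text, stt_config, command="vgwActSetSTTConfig"):
    """Build the `output` section of a dialog node response with one action.

    Hypothetical helper: the key names and the default command name are
    assumptions for illustration, not a confirmed API contract.
    """
    return {
        "output": {
            # Text the agent speaks to the caller (assumed shape).
            "text": {"values": [text]},
            # Action asking the gateway to change speech recognition settings.
            "vgwAction": {
                "command": command,
                "parameters": {"config": stt_config},
            },
        }
    }


if __name__ == "__main__":
    # Example: switch recognition to French for the rest of the call.
    response = build_stt_config_response(
        "Bonjour, comment puis-je vous aider ?",
        {"languageCode": "fr-FR"},  # assumed Google Cloud Speech setting name
    )
    print(json.dumps(response, indent=2, ensure_ascii=False))
```

The printed JSON approximates what you would place in the dialog node response, adjusted to the fields that your deployment actually accepts.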

Learn more about topics related to configuring the Speech to Text Adapter:

  • Configuring speech recognition for Google Cloud Speech API
  • Configuring SSL and TLS encryption from the Media Relay to the Speech to Text Adapter
  • Dynamically configuring the Speech to Text service or Speech to Text Adapter
  • Configuring multiple service providers