
Copyright: 2018, 2023. Last updated: 2023-01-04


Integrating third-party speech and text services

IBM® Voice Gateway supports speech adapters that integrate third-party speech recognition (speech-to-text) and speech synthesis (text-to-speech) services in place of the IBM® Speech to Text and IBM® Text to Speech services. The adapters are separate Docker containers that you deploy together with Voice Gateway. Each adapter acts as a proxy between Voice Gateway and the third-party speech service.

Voice Gateway provides the following options for integrating third-party speech services:

  • Voice Gateway Speech to Text Adapter: The Speech to Text adapter currently supports the Google Cloud Speech API for speech recognition. Using the Google Cloud Speech API adds French, German, and Italian as languages for self-service agents. Available in Version 1.0.0.5 and later.
  • Voice Gateway Text to Speech Adapter: The Text to Speech adapter currently supports the Google Text to Speech API for synthesizing speech audio from text. By using the Google Text to Speech API, you can choose additional voices for a self-service agent. Available in Version 1.0.0.7a and later.
  • Media Resource Control Protocol Version 2 (MRCPv2): You can use Voice Gateway as an MRCPv2 client to connect to speech-to-text and text-to-speech services that act as MRCPv2 servers, such as Nuance. Available in Version 1.0.0.7 and later. See Configuring services with MRCPv2.
  • Custom speech adapters: To use a different speech recognition or speech synthesis service, you can create your own speech adapter. To get started, use the speech adapter samples. Available in Version 1.0.0.5 and later.
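
The proxy role that an adapter plays can be illustrated with a short conceptual sketch. This is not the Voice Gateway adapter protocol or the code from the speech adapter samples. The names `SpeechRecognizer`, `ByteCountRecognizer`, `SpeechToTextAdapter`, and `handle_audio`, and the shape of the returned result, are hypothetical and chosen only for illustration. The sketch assumes that the gateway hands the adapter chunks of audio and expects a transcript back.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable


class SpeechRecognizer(ABC):
    """Hypothetical interface for a third-party speech-to-text backend."""

    @abstractmethod
    def recognize(self, audio_chunks: Iterable[bytes]) -> str:
        """Return a transcript for the given audio chunks."""


class ByteCountRecognizer(SpeechRecognizer):
    """Stand-in backend for illustration only.

    A real backend would call a third-party service here. This one
    just reports how many bytes of audio it received.
    """

    def recognize(self, audio_chunks: Iterable[bytes]) -> str:
        total = sum(len(chunk) for chunk in audio_chunks)
        return f"<{total} bytes of audio>"


class SpeechToTextAdapter:
    """Hypothetical adapter that sits between a gateway and a backend.

    The gateway side only sees handle_audio(); the backend can be
    swapped without changing the gateway-facing code.
    """

    def __init__(self, backend: SpeechRecognizer) -> None:
        self._backend = backend

    def handle_audio(self, audio_chunks: Iterable[bytes]) -> Dict[str, object]:
        # Forward the audio to the backend, then wrap the transcript
        # in an (assumed) result shape for the gateway.
        transcript = self._backend.recognize(audio_chunks)
        return {"transcript": transcript, "final": True}


if __name__ == "__main__":
    adapter = SpeechToTextAdapter(ByteCountRecognizer())
    # Two 160-byte chunks stand in for streamed call audio.
    print(adapter.handle_audio([b"\x00" * 160, b"\x00" * 160]))
```

The design point is the separation of concerns: the gateway-facing side stays the same, and only the backend class changes when you target a different speech service.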