About IBM Voice Gateway

IBM® Voice Gateway enables direct voice interactions over a telephone with a cognitive self-service agent. It can also transcribe a phone call between a caller and an agent so that the conversation can be processed with analytics for real-time agent feedback. Voice Gateway orchestrates Watson services and integrates them with a public or private telephone network by using the Session Initiation Protocol (SIP).

Ways to use IBM Voice Gateway

With IBM Voice Gateway, you can set up both self-service agents and agent assistants.

The type of implementation that you choose determines how you set up Voice Gateway. Learn more about each of these implementations in the following sections.

Self-service agents

With self-service agents, customers are directed through the voice gateway to interact with Watson services that you train to provide certain responses. You can optionally enable the Watson services to opt out to a call center agent by initiating a call transfer through the API.

The customer call is routed through the voice gateway, which orchestrates Watson services. If configured, the call can be routed to a human agent.

On the back end, a self-service agent is made of the following components, which each fulfill a different role:

Watson service orchestration

The following diagram shows how Voice Gateway orchestrates the various Watson services to enable a self-service agent. Within seconds, utterances flow between the services to result in a natural-sounding conversation with the caller.

Voice Gateway acts as a hub through which the caller and each Watson service communicate.

  1. The caller asks a question.
  2. The question is streamed to the Speech to Text service.
  3. A text utterance is returned.
  4. The text is sent to Watson Assistant as a message request.
  5. A message response is returned.
  6. The response text is sent to the Text to Speech service.
  7. Synthesized audio is returned.
  8. Voice Gateway streams the audio response to the caller.
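
The steps above can be sketched as a single function that passes one caller turn through the hub. This is only an illustration of the data flow, not how Voice Gateway is implemented: the `Services` class, the `handle_caller_turn` function, and the stub services are hypothetical names, and real audio is streamed rather than passed as one block of bytes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Services:
    """Hypothetical stand-ins for the three Watson services."""
    speech_to_text: Callable[[bytes], str]
    assistant: Callable[[str], str]
    text_to_speech: Callable[[str], bytes]


def handle_caller_turn(caller_audio: bytes, services: Services) -> bytes:
    """Run one conversational turn through the hub (steps 2 to 8)."""
    utterance = services.speech_to_text(caller_audio)    # steps 2 and 3
    reply_text = services.assistant(utterance)           # steps 4 and 5
    reply_audio = services.text_to_speech(reply_text)    # steps 6 and 7
    return reply_audio                                   # step 8


# Stub services so that the sketch runs without any network access.
stubs = Services(
    speech_to_text=lambda audio: audio.decode("utf-8"),
    assistant=lambda text: f"You asked: {text}",
    text_to_speech=lambda text: text.encode("utf-8"),
)

print(handle_caller_turn(b"What are your hours?", stubs))
```

Each service is passed in as a plain callable so that the ordering of the calls, which is the point of the example, stays visible.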

Conversation flow through a service orchestration engine

For self-service agents, you can optionally add a service orchestration engine (SOE) to your environment, which enables you to add your own layer of customization to the communication between Voice Gateway and the Watson Assistant service. Voice Gateway and Watson Assistant communicate through the Watson Assistant REST API, sending request data using only the MessageRequest method and receiving a corresponding JSON response. The service orchestration engine acts as a proxy for Watson Assistant, intercepting message requests and responses and modifying them by using third-party APIs.

Message requests and responses between Voice Gateway and Watson Assistant flow through a service orchestration engine, which modifies them.
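
The proxy role of a service orchestration engine can be sketched as a wrapper around the Watson Assistant call. The example is a minimal illustration under stated assumptions: the message shapes are simplified, `fake_assistant` replaces the real REST call, and the `account_status` context variable and `{balance}` placeholder are hypothetical stand-ins for data that a third-party API might supply.

```python
import copy
from typing import Any, Callable, Dict

Message = Dict[str, Any]
Handler = Callable[[Message], Message]


def make_soe_proxy(assistant: Handler, before: Handler, after: Handler) -> Handler:
    """Wrap an Assistant call so that requests and responses can be modified."""
    def proxy(request: Message) -> Message:
        modified_request = before(copy.deepcopy(request))
        response = assistant(modified_request)
        return after(copy.deepcopy(response))
    return proxy


def add_account_status(request: Message) -> Message:
    # Hypothetical enrichment: a real SOE might call a third-party API here.
    request.setdefault("context", {})["account_status"] = "active"
    return request


def fill_balance(response: Message) -> Message:
    # Hypothetical post-processing: replace a placeholder in the reply text.
    output = response.setdefault("output", {})
    output["text"] = [t.replace("{balance}", "$42.00") for t in output.get("text", [])]
    return response


def fake_assistant(request: Message) -> Message:
    # Stand-in for the Watson Assistant message call; no network access.
    status = request["context"]["account_status"]
    return {
        "context": request["context"],
        "output": {"text": [f"Your account is {status}. Your balance is {{balance}}."]},
    }


soe = make_soe_proxy(fake_assistant, add_account_status, fill_balance)
reply = soe({"input": {"text": "What is my balance?"}})
print(reply["output"]["text"][0])
```

Because the proxy accepts and returns the same message shape as the service it wraps, Voice Gateway would not need to know whether it is talking to Watson Assistant directly or to the engine.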

For production deployments of Voice Gateway, you might want to incorporate a service orchestration engine for the following reasons:

To learn more about how to implement a service orchestration engine, see Connecting through a service orchestration engine.

Features for self-service agents

Agent assistants

The voice gateway provides the ability to transcribe caller and callee (e.g. contact-center agent) audio from an active phone call in real time using the SIPREC protocol. This capability requires a session border controller (SBC) that supports the ability to fork media out to the voice gateway, which is acting as a SIPREC Session Recording Server (SRS).

For agent assistants, the voice gateway forks the call to Watson services, which transcribe the conversation to provide feedback to a human agent.

Features for agent assistants


IBM Voice Gateway is one of several components in the overall architecture of self-service agents and agent assistants. The architecture and technologies that are used differ depending on your implementation. For self-service agents, callers can either connect directly to the voice gateway through a SIP trunk or indirectly through a session border controller (SBC).

Voice Gateway architecture

Voice Gateway is composed of two separate microservices, the SIP Orchestrator and the Media Relay. These microservices are delivered in the form of two separate Docker images.

The following diagram shows at a high level how these two microservices combine to provide the full functionality of IBM Voice Gateway:

The separate microservices in the voice gateway, the SIP Orchestrator and the Media Relay, communicate using APIs.

Connecting to services using an MRCP server

In addition to using IBM® Speech to Text, IBM® Text to Speech, or the IBM® Voice Gateway Speech to Text Adapter, Voice Gateway also supports Media Resource Control Protocol Version 2 (MRCPv2) connections. You can use a mixture of third-party speech recognition and voice synthesizing services that are coordinated by Voice Gateway. See Configuring services with MRCPv2.

Self-service agent architecture when using a SIP trunk

When connecting to a self-service agent through a SIP trunk, you must configure your SIP trunk to forward INVITE requests to the voice gateway based on its IP address and SIP port.

Calls flow through a SIP trunk to the voice gateway, which communicates with Watson services through the API.
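
To make the forwarding target concrete, the following sketch builds the request line of an INVITE that a SIP trunk might send to the voice gateway. The host `192.0.2.10` is a documentation-only address, the user part `watson` is hypothetical, and port 5060 is the conventional SIP port, used here as an assumed default. Your trunk provider's configuration screens, not code, are where these values are normally entered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GatewayTarget:
    """Address details that the SIP trunk needs (illustrative values only)."""
    host: str
    port: int = 5060          # conventional SIP port, assumed default
    transport: str = "udp"


def invite_request_line(target: GatewayTarget, user: str) -> str:
    """Build the first line of a SIP INVITE that is addressed to the gateway."""
    uri = f"sip:{user}@{target.host}:{target.port};transport={target.transport}"
    return f"INVITE {uri} SIP/2.0"


gateway = GatewayTarget(host="192.0.2.10")
print(invite_request_line(gateway, "watson"))
```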

SIP trunks can be used to quickly set up and test the voice gateway by calling your Watson services from the public telephone network. In this case you can simply deploy the voice gateway to a public cloud Docker container service, such as IBM® Cloud Kubernetes Service. On-premises enterprise integration typically requires that you configure a session border controller (SBC), which is discussed in the next section.

Self-service agent architecture when using an SBC

Session border controllers are typically used in cases when you want to enable customers to be transferred to live contact center agents. In a self-service agent where communications flow through a session border controller (SBC), you need to configure the SBC to forward calls to the voice gateway based on its IP address and SIP port. Note that to enable call transfers, the SBC must stay in the call path so that it can handle SIP REFER messages:

Calls flow to an SBC and then to the voice gateway, which communicates with Watson services through the API.

Agent assistant architecture when conferencing calls through an MCU

For agent assistants, media from the call between a customer and a human agent must be shared with Voice Gateway so that it can transcribe the call. One method of routing call media to Voice Gateway is to conference it into the ongoing call. Typically, this conferencing requires a multipoint control unit (MCU) or a participant in the call that can act as an MCU. Voice Gateway sends call audio for speech-to-text processing and then sends returned transcriptions to a configured reporting REST server.

The call is conferenced with the agent and Voice Gateway through a multipoint control unit. Voice Gateway listens in on the call, sends call audio for speech-to-text processing and then sends returned transcriptions to a REST server or other analytics gateway.

Agent assistant architecture when forking calls through an SBC

Another option for agent assistants is to fork calls from a session border controller (SBC) to Voice Gateway, which acts as a SIPREC Session Recording Server (SRS). Voice Gateway sends call audio for speech-to-text processing and then sends returned transcriptions to a REST server or other analytics gateway that supports REST APIs.

The call goes to an SBC, which forks the call to the voice gateway as it forwards the call to the human agent. Voice Gateway sends call audio for speech-to-text processing and then sends returned transcriptions to a REST server or other analytics gateway.
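
In both agent assistant architectures, the final step is to send returned transcriptions to a REST server. The following sketch shows what building such a request could look like. The payload fields (`callId`, `speaker`, `transcript`) and the URL are hypothetical and are not the reporting format that Voice Gateway actually emits; the example only prepares the request and does not send it.

```python
import json
import urllib.request


def build_transcription_report(report_url: str, call_id: str,
                               speaker: str, transcript: str) -> urllib.request.Request:
    """Prepare an HTTP POST that carries one transcribed utterance."""
    payload = {"callId": call_id, "speaker": speaker, "transcript": transcript}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        report_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_transcription_report(
    "https://analytics.example.com/transcripts",   # hypothetical endpoint
    "call-001",
    "caller",
    "I would like to check my order status.",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the prepared request to `urllib.request.urlopen` would send it, which is left out here so that the sketch runs without a server.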

Supported languages

Voice Gateway supports the following languages with Watson speech services:

The IBM® Voice Gateway Speech to Text Adapter and IBM® Voice Gateway Text to Speech Adapter enable you to use additional languages for self-service agents through the Google Cloud Speech API and Google Cloud Text-to-Speech API. For more information, see Integrating third-party speech services. By using the speech to text adapter and text to speech adapter, you can extend your Voice Gateway deployment to support languages that include the following:

For a language to be supported, it must be supported by all services you integrate with Voice Gateway, including the third-party speech services and the IBM Watson™ Assistant service. For more information, see Supported languages for the Watson Assistant service.

You can enable support for additional languages by creating custom speech adapters, which you can use to integrate third-party speech recognition (speech-to-text) and speech synthesis (text-to-speech) services. The speech adapter samples can help you get started with creating the speech adapters.
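
The rule that a language must be supported by every integrated service amounts to a set intersection. The following sketch illustrates that rule only. The language codes in each set are made-up examples and do not reflect the actual language support of any service.

```python
from typing import Iterable, Set


def languages_supported_end_to_end(*service_languages: Iterable[str]) -> Set[str]:
    """Return the languages that every integrated service supports."""
    sets = [set(languages) for languages in service_languages]
    if not sets:
        return set()
    return set.intersection(*sets)


# Illustrative sets only, not real support lists.
speech_to_text = {"en-US", "es-ES", "ja-JP"}
assistant = {"en-US", "es-ES", "de-DE"}
text_to_speech = {"en-US", "es-ES", "ja-JP", "de-DE"}

usable = languages_supported_end_to_end(speech_to_text, assistant, text_to_speech)
print(sorted(usable))
```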

Note: IBM Voice Gateway does not provide licenses to any external services, including the Watson services or third-party speech services.

Supported protocols

System requirements

To deploy Voice Gateway in production environments, the following minimum software and hardware levels are required.

Table 1. Supported platforms and operating systems

Platform        Operating system
Linux® 64-bit   Red Hat Enterprise Linux (RHEL) 7.5 and 7.6
                Ubuntu 16.04 LTS

Because IBM Voice Gateway is distributed as a set of Docker images, you can also deploy Voice Gateway on other platforms that support Docker and Kubernetes. For example, you can deploy Voice Gateway on 64-bit Windows environments using Docker for Windows and Docker Machine.

Table 2. Deployment environment requirements

Environment                                      Minimum version
Docker Community Edition or Enterprise Edition   Version 1.13 or later
                                                 Note: Swarm mode isn't supported
Kubernetes                                       Version 1.7.3 or later
IBM Cloud Kubernetes Service                     N/A - Cloud-based service

Table 3. Virtualized hardware requirements

Hardware               Minimum requirements
Virtual machine RAM    8 gigabytes (GB)
Virtual CPUs (vCPUs)   2 vCPU with x86-64 architecture at 2.4 GHz clock speed
                       Note: Varies based on expected number of concurrent calls and other factors
Storage                50 gigabytes (GB)
                       Note: Call recording and log storage settings significantly affect storage requirements

The exact virtualized hardware that is needed to reach your required level of performance varies greatly depending on several factors, including the expected number of concurrent calls, product configuration, and Watson Assistant dialog. If you need help planning your Voice Gateway environment, contact the product team as described in Getting help.
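
As a planning aid, the minimums from the tables above can be encoded in a small check. This is a sketch, not an official sizing tool: the `Host` class and `unmet_requirements` function are hypothetical, version strings are assumed to be purely numeric and dot-separated, and meeting the minimums does not guarantee adequate performance for your call volume.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Minimums taken from the tables in this section.
MIN_RAM_GB = 8
MIN_VCPUS = 2
MIN_STORAGE_GB = 50
MIN_DOCKER = (1, 13)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a numeric dotted version such as '1.13.1' into (1, 13, 1)."""
    return tuple(int(part) for part in text.split("."))


@dataclass
class Host:
    ram_gb: float
    vcpus: int
    storage_gb: float
    docker_version: str


def unmet_requirements(host: Host) -> List[str]:
    """List every minimum that the host does not satisfy."""
    problems = []
    if host.ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {host.ram_gb} GB is below {MIN_RAM_GB} GB")
    if host.vcpus < MIN_VCPUS:
        problems.append(f"vCPUs: {host.vcpus} is below {MIN_VCPUS}")
    if host.storage_gb < MIN_STORAGE_GB:
        problems.append(f"Storage: {host.storage_gb} GB is below {MIN_STORAGE_GB} GB")
    if parse_version(host.docker_version) < MIN_DOCKER:
        problems.append(f"Docker: {host.docker_version} is older than 1.13")
    return problems


candidate = Host(ram_gb=4, vcpus=2, storage_gb=100, docker_version="1.12.6")
for problem in unmet_requirements(candidate):
    print(problem)
```

Comparing versions as tuples of integers avoids the mistake of comparing them as strings, where "1.9" would incorrectly sort after "1.13".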