
copyright: years: 2017, 2023 lastupdated: "2023-01-05"


Configuring barge-in for calls

Barge-in enables callers to IBM® Voice Gateway to interrupt Watson™ during a call to a self-service agent. When barge-in occurs, the audio from the Watson Text to Speech service stops playing, and Voice Gateway waits for an utterance from the caller to be sent to Watson Assistant.

About barge-in

  • Speech barge-in occurs when the Media Relay receives an initial hypothesis or a final utterance from the Speech to Text service or a response from Watson Assistant while audio from the Text to Speech service is playing. In the period between the initial hypothesis and the final utterance, the Media Relay discards or cancels any new or active audio playback.

    Final utterances from the Speech to Text service include a confidence score, which indicates how confident the service is that the transcription matches what the caller said. A hypothesis that isn't final does not include a confidence score. Typically, a hypothesis isn't final because it is part of a longer utterance that the caller is still speaking, so it results in barge-in immediately, without any confidence score being considered.

  • DTMF barge-in occurs when the caller provides dual-tone multi-frequency (DTMF) input by pressing a key while audio from the Text to Speech service is playing. See Disabling DTMF barge-in or Collecting dual-tone multi-frequency (DTMF) responses.

Resuming audio playback after barge-in

By default, Voice Gateway stops any audio playback when barge-in occurs. You can configure Voice Gateway to resume audio playback after barge-in if the confidence score of the final utterance is under the confidence score threshold. Enabling this setting can reduce the impact of unintended barge-ins from outside noise and unintelligible speech, which results in a more natural call flow. Resuming audio playback after barge-in is available in Version 1.0.0.5 and later.

If this setting is enabled, when the Media Relay receives hypothesis messages from the Speech to Text service, it pauses the audio that is playing back. When the Media Relay receives the final utterance, it evaluates the confidence score against the confidence score threshold. If the score is higher than the threshold, the Media Relay cancels audio playback. If the score is lower than the threshold, the Media Relay resumes playing the audio.

  • Single-tenant environment: In the Media Relay configuration, set the BARGE_IN_RESUME environment variable to true.

    For example, on Docker:

    - BARGE_IN_RESUME=true
    
  • Multi-tenant JSON configuration: In the JSON configuration file, in the stt object for each tenant, set the bargeInResume property to true.

    "stt": {
      "credentials": {
        ...
      },
      "config": {
        ...
      },
      "bargeInResume": true
    }
    
  • Dynamic configuration: You can dynamically configure this setting and other Speech to Text configuration during a call by setting the vgwActSetSTTConfig API action within the Watson Assistant dialog. For more information, see Dynamically configuring Watson speech services.
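
As a rough sketch of the dynamic option, a dialog node might return an action like the following. The placement of bargeInResume directly under parameters is an assumption that mirrors the stt object in the tenant configuration shown above; see Dynamically configuring Watson speech services for the exact parameter layout.

```json
{
  "output": {
    "vgwAction": {
      "command": "vgwActSetSTTConfig",
      "parameters": {
        "bargeInResume": true
      }
    }
  }
}
```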

Disabling speech barge-in by disabling speech-to-text processing

You can completely disable speech barge-in by disabling speech-to-text processing while Voice Gateway plays back audio. When speech-to-text processing is disabled, callers can't barge in by speaking, but they can still barge in by using DTMF tones unless DTMF barge-in is also disabled.

You can disable speech barge-in for all calls by setting it in the Voice Gateway configuration, or dynamically disable and enable barge-in during a call by using API action tags. This configuration is available in Version 1.0.0.4c and later.

  • Single-tenant environment: In the SIP Orchestrator configuration, set the DISABLE_STT_DURING_PLAYBACK environment variable to true.

    For example, on Docker:

    - DISABLE_STT_DURING_PLAYBACK=true
    
  • Using JSON configuration or in a multi-tenant environment: In the JSON configuration file, for each tenant where you want to disable speech barge-in, set the disableSTTDuringPlayback property to true.

      "tenants": [
        {
          "tenantURI": "2345556789",
          "description": "Voice Gateway Demo US",
          "whitelistFromUri" : "8765554321",
          "disableSTTDuringPlayback" : "true",
          "conversation": {
            "url": "https://api.us-south.assistant.watson.cloud.ibm.com/instances/{instance_id}",
            "workspaceID": "a23de67h-e527-40d5-a867-5c0ce9e72d0d",
            "apikey":"PqxxxxrbuJ2ZxxxjZfyYe5oNRdY6AWdTa_xxxxEHVwCq"
          }
          ...
        }
      ]
    
  • Dynamic configuration: You can disable and enable speech barge-in during a call by setting the vgwActDisableSTTDuringPlayback and vgwActEnableSTTDuringPlayback action tags in the JSON definition of a Watson Assistant dialog node. The state is preserved for the duration of the call, so you can set the vgwActDisableSTTDuringPlayback action tag early in the Watson Assistant dialog to disable speech barge-in. You can later choose to enable it by setting the vgwActEnableSTTDuringPlayback action tag.

    If speech-to-text processing is disabled explicitly by using the vgwActPauseSTT action, then Voice Gateway does not resume speech-to-text processing after the playback transaction completes until it's reenabled by the vgwActUnPauseSTT action. For more information about using action tags, see API for self-service agents.

    {
      "output": {
        "vgwAction": {
          "command": "vgwActDisableSTTDuringPlayback"
        }
      }
    }
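
    To turn speech-to-text processing during playback back on later in the call, a later dialog node could presumably return the matching enable tag in the same shape. This is a sketch that assumes the enable tag, like the disable tag above, takes no parameters:

```json
{
  "output": {
    "vgwAction": {
      "command": "vgwActEnableSTTDuringPlayback"
    }
  }
}
```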
    

Disabling speech barge-in from initial hypotheses or final utterances

If you want to reduce the amount of speech barge-in but still allow some speech-to-text processing, you can disable speech barge-in by setting the vgwActDisableSpeechBargeIn action tag in the JSON definition of a dialog node. The vgwActDisableSpeechBargeIn action disables barge-in from initial hypotheses and final utterances from the Speech to Text service, which mostly affects early rapid barge-in. However, barge-in can still occur if Voice Gateway receives a response back from Watson Assistant that is synthesized into audio.

{
  "output": {
    "vgwAction": {
      "command": "vgwActDisableSpeechBargeIn"
    }
  }
}

After the action tag is set during a conversation session, the setting stays in effect for all subsequent transactions. To re-enable barge-in, set the vgwActEnableSpeechBargeIn action tag within the dialog.
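
For illustration, re-enabling speech barge-in would likely look like the disable example with the command swapped. This sketch assumes that vgwActEnableSpeechBargeIn takes no parameters:

```json
{
  "output": {
    "vgwAction": {
      "command": "vgwActEnableSpeechBargeIn"
    }
  }
}
```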

Disabling DTMF barge-in

You can disable DTMF barge-in by setting the vgwActDisableDTMFBargeIn action tag in the JSON node definition. To re-enable barge-in, set the vgwActEnableDTMFBargeIn action tag within the dialog.
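
A minimal sketch of a node that disables DTMF barge-in, assuming the tag is passed as a single vgwAction command with no parameters, in the same shape as the speech barge-in actions:

```json
{
  "output": {
    "vgwAction": {
      "command": "vgwActDisableDTMFBargeIn"
    }
  }
}
```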

By using action sequences, you can disable or enable both speech and DTMF barge-in at the same time. For example, the following action sequence disables all speech barge-in and DTMF barge-in and plays the text as audio.

{
  "output": {
    "vgwActionSequence": [
      {
        "command": "vgwActDisableSTTDuringPlayback"
      },
      {
        "command": "vgwActDisableDTMFBargeIn"
      },
      {
        "command": "vgwActPlayText",
        "parameters": {
          "text": [
            "Welcome to our service. Please listen carefully."
          ]
        }
      }
    ]
  }
}
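
Because the disabled state persists for the rest of the call, a later node usually needs to restore barge-in. The following sketch reverses the sequence above by using the matching enable tags. The prompt text is a made-up example, and the ordering (enable first, then play) is an assumption about the intended behavior:

```json
{
  "output": {
    "vgwActionSequence": [
      {
        "command": "vgwActEnableSTTDuringPlayback"
      },
      {
        "command": "vgwActEnableDTMFBargeIn"
      },
      {
        "command": "vgwActPlayText",
        "parameters": {
          "text": [
            "How can I help you today?"
          ]
        }
      }
    ]
  }
}
```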