copyright: years: 2017, 2023 lastupdated: "2023-01-05"
Configuring barge-in for calls
Barge-in enables callers to IBM® Voice Gateway to interrupt Watson™ during a call to a self-service agent. When barge-in occurs, the audio from the Watson Text to Speech service stops playing, and Voice Gateway waits for an utterance from the caller to be sent to Watson Assistant.
About barge-in
- Speech barge-in occurs when the Media Relay receives an initial hypothesis or a final utterance from the Speech to Text service or a response from Watson Assistant while audio from the Text to Speech service is playing. In the period between the initial hypothesis and the final utterance, the Media Relay discards or cancels any new or active audio playback.
  Final utterances from the Speech to Text service include a confidence score, which indicates how confident the service is that the transcription matches what the caller said. A hypothesis that isn't final does not include a confidence score. Typically, a hypothesis is not final because it is part of a longer utterance that is still in progress; such a hypothesis results in barge-in immediately, regardless of confidence score.
- DTMF barge-in occurs when the caller provides dual-tone multi-frequency (DTMF) input by pressing a key while audio from the Text to Speech service is playing. See Disabling DTMF barge-in or Collecting dual-tone multi-frequency (DTMF) responses.
Resuming audio playback after barge-in
By default, Voice Gateway stops any audio playback when barge-in occurs. You can configure Voice Gateway to resume audio playback after barge-in if the confidence score of the final utterance is under the confidence score threshold. Enabling this setting can reduce the impact of unintended barge-ins from outside noise and unintelligible speech, which results in a more natural call flow. Resuming audio playback after barge-in is available in Version 1.0.0.5 and later.
If this setting is enabled, when the Media Relay receives hypothesis messages from the Speech to Text service, it pauses the audio that is playing back. When the Media Relay receives the final utterance, it evaluates the confidence score against the confidence score threshold. If the score is higher than the threshold, the Media Relay cancels audio playback. If the score is lower than the threshold, the Media Relay resumes playing the audio.
- Single-tenant environment: In the Media Relay configuration, set the BARGE_IN_RESUME environment variable to true. For example, on Docker:
    - BARGE_IN_RESUME=true
- Multi-tenant JSON configuration: In the JSON configuration file, in the stt object for each tenant, set the bargeInResume property to true.
    "stt": {
      "credentials": { ... },
      "config": { ... },
      "bargeInResume": true
    }
- Dynamic configuration: You can dynamically configure this setting and other Speech to Text configuration during a call by setting the vgwActSetSTTConfig API action within the Watson Assistant dialog. For more information, see Dynamically configuring Watson speech services.
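As a rough illustration of the dynamic option, the following dialog node output sets bargeInResume through the vgwActSetSTTConfig action. Placing bargeInResume directly under parameters is an assumption that mirrors the stt object in the multi-tenant configuration; this page does not show the action's parameter layout, so confirm it in Dynamically configuring Watson speech services before relying on it.

```json
{
  "output": {
    "vgwAction": {
      "command": "vgwActSetSTTConfig",
      "parameters": {
        "bargeInResume": true
      }
    }
  }
}
```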
Disabling speech barge-in by disabling speech-to-text processing
You can completely disable speech barge-in by disabling speech-to-text processing while Voice Gateway plays back audio. When speech-to-text processing is disabled, callers can't barge in by speaking, but they can still barge in by using DTMF tones unless DTMF barge-in is also disabled.
You can disable speech barge-in for all calls in the Voice Gateway configuration, or disable and enable it dynamically during a call by using API action tags. This configuration is available in Version 1.0.0.4c and later.
- Single-tenant environment: In the SIP Orchestrator configuration, set the DISABLE_STT_DURING_PLAYBACK environment variable to true. For example, on Docker:
    - DISABLE_STT_DURING_PLAYBACK=true
- Using JSON configuration or in a multi-tenant environment: In the JSON configuration file, for each tenant where you want to disable speech barge-in, set the disableSTTDuringPlayback property to true.
    "tenants": [
      {
        "tenantURI": "2345556789",
        "description": "Voice Gateway Demo US",
        "whitelistFromUri": "8765554321",
        "disableSTTDuringPlayback": "true",
        "conversation": {
          "url": "https://api.us-south.assistant.watson.cloud.ibm.com/instances/{instance_id}",
          "workspaceID": "a23de67h-e527-40d5-a867-5c0ce9e72d0d",
          "apikey": "PqxxxxrbuJ2ZxxxjZfyYe5oNRdY6AWdTa_xxxxEHVwCq"
        }
        ...
      }
    ]
- Dynamic configuration: You can disable and enable speech barge-in during a call by setting the vgwActDisableSTTDuringPlayback and vgwActEnableSTTDuringPlayback action tags in the JSON definition of a Watson Assistant dialog node. The state is preserved for the duration of the call, so you can set the vgwActDisableSTTDuringPlayback action tag early in the Watson Assistant dialog to disable speech barge-in. You can later choose to enable it by setting the vgwActEnableSTTDuringPlayback action tag.
  If speech-to-text processing is disabled explicitly by using the vgwActPauseSTT action, then Voice Gateway does not resume speech-to-text processing after the playback transaction completes until it's reenabled by the vgwActUnPauseSTT action. For more information about using action tags, see API for self-service agents.
    {
      "output": {
        "vgwAction": {
          "command": "vgwActDisableSTTDuringPlayback"
        }
      }
    }
Disabling speech barge-in from initial hypotheses or final utterances
If you want to reduce the amount of speech barge-in but still allow some speech-to-text processing, you can disable speech barge-in by setting the vgwActDisableSpeechBargeIn action tag in the JSON definition of a dialog node. The vgwActDisableSpeechBargeIn action disables barge-in from initial hypotheses and final utterances from the Speech to Text service, which mostly affects early rapid barge-in. However, barge-in can still occur if Voice Gateway receives a response back from Watson Assistant that is synthesized into audio.
{
  "output": {
    "vgwAction": {
      "command": "vgwActDisableSpeechBargeIn"
    }
  }
}
After the action tag is set during a conversation session, the same setting is used in all subsequent transactions. To re-enable barge-in, set the vgwActEnableSpeechBargeIn action tag within the dialog.
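For symmetry with the disable example, a node that turns speech barge-in back on might look like the following. This is a sketch that assumes vgwActEnableSpeechBargeIn uses the same vgwAction form and needs no parameters, as the disable action does.

```json
{
  "output": {
    "vgwAction": {
      "command": "vgwActEnableSpeechBargeIn"
    }
  }
}
```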
Disabling DTMF barge-in
You can disable DTMF barge-in by setting the vgwActDisableDTMFBargeIn action tag in the JSON node definition. To re-enable barge-in, set the vgwActEnableDTMFBargeIn action tag within the dialog.
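A minimal sketch of a node that disables DTMF barge-in on its own, assuming vgwActDisableDTMFBargeIn follows the same parameterless vgwAction form as the speech barge-in actions:

```json
{
  "output": {
    "vgwAction": {
      "command": "vgwActDisableDTMFBargeIn"
    }
  }
}
```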
By using action sequences, you can disable or enable both speech and DTMF barge-in at the same time. For example, the following action sequence disables all speech barge-in and DTMF barge-in and plays the text as audio.
{
  "output": {
    "vgwActionSequence": [
      {
        "command": "vgwActDisableSTTDuringPlayback"
      },
      {
        "command": "vgwActDisableDTMFBargeIn"
      },
      {
        "command": "vgwActPlayText",
        "parameters": {
          "text": [
            "Welcome to our service. Please listen carefully."
          ]
        }
      }
    ]
  }
}
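To restore normal behavior later in the call, a matching sequence could reverse both settings. This sketch assumes that the enable actions can be combined in a vgwActionSequence in the same way as the disable actions; the prompt text is illustrative only.

```json
{
  "output": {
    "vgwActionSequence": [
      {
        "command": "vgwActEnableSTTDuringPlayback"
      },
      {
        "command": "vgwActEnableDTMFBargeIn"
      },
      {
        "command": "vgwActPlayText",
        "parameters": {
          "text": [
            "You can now interrupt at any time."
          ]
        }
      }
    ]
  }
}
```

Because the barge-in state is preserved for the duration of the call, a sequence like this is only needed on the node where callers should regain the ability to interrupt.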