copyright: years: 2018, 2023 lastupdated: "2023-01-07"
Improving speech-to-text processing accuracy
Noise from the caller audio, such as background noise and echoes from Text to Speech playback, can affect the accuracy of the speech-to-text processing and cause unwanted barge-ins. You can configure IBM® Voice Gateway to account for noisy environments and echoes.
Configuring the Speech to Text confidence score threshold
Each utterance that the Speech to Text service processes is assigned a confidence score, which indicates how confident the service is that the identified text matches the audio input. You can configure the Media Relay to discard and ignore Speech to Text utterances with a confidence score that is under a certain threshold.
By default, the confidence score threshold is 0, which means that all responses are used. Recommended values are between 0 and 1.
- Single-tenant environment: In the Media Relay configuration, define the threshold value on the WATSON_STT_CONFIDENCE_SCORE_THRESHOLD environment variable.

      - WATSON_STT_CONFIDENCE_SCORE_THRESHOLD=0.2
- Multi-tenant JSON configuration: In the multi-tenant JSON configuration file, you can define distinct confidence thresholds for each tenant. For each tenant, define the confidenceScoreThreshold property in the stt object.

      ...
      "stt": {
        "credentials": {
          "url": "https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/{instance_id}",
          "apikey": "{apikey}",
          "tokenServiceProviderUrl": "https://iam.cloud.ibm.com/identity/token"
        },
        "config": {
          "model": "en-US_NarrowbandModel",
          "profanity_filter": true,
          "smart_formatting": true
        },
        "confidenceScoreThreshold": 0.2
      }
      ...
- Dynamic configuration: You can dynamically change the confidence score threshold during a call by using the vgwActSetSTTConfig action tag from the Voice Gateway API to set the confidenceScoreThreshold parameter. For more information, see Dynamically configuring Watson services.

      {
        "output": {
          "vgwAction": {
            "command": "vgwActSetSTTConfig",
            "parameters": {
              "confidenceScoreThreshold": 0.7
            }
          }
        }
      }
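The effect of the threshold can be illustrated with a minimal Python sketch. This is not the Media Relay implementation: the `Utterance` class and the `filter_utterances` function are hypothetical names used only for illustration. The sketch assumes that an utterance is discarded only when its score is strictly under the threshold, which matches the default of 0 keeping every response.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    """Hypothetical stand-in for one Speech to Text result."""
    text: str
    confidence: float  # confidence score, expected to be between 0 and 1


def filter_utterances(utterances: List[Utterance], threshold: float = 0.0) -> List[Utterance]:
    """Return the utterances whose score is not under the threshold.

    Assumption: scores equal to the threshold are kept, so a threshold
    of 0 (the documented default) keeps every utterance.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold should be between 0 and 1")
    return [u for u in utterances if u.confidence >= threshold]


if __name__ == "__main__":
    results = [
        Utterance("yes", 0.9),
        Utterance("uh", 0.1),
        Utterance("no", 0.2),
    ]
    kept = filter_utterances(results, threshold=0.2)
    print([u.text for u in kept])  # prints ['yes', 'no']
```

A higher threshold discards more low-confidence utterances, which can reduce unwanted barge-ins from noise but can also drop valid caller input, so the value typically needs tuning per deployment.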
Suppressing echoes from Text to Speech playback
During a call, audio from the Text to Speech service that Voice Gateway plays back might be audible through the caller's telephone line. This echoed audio can be interpreted as audio from the caller and processed by the Speech to Text service, resulting in barge-ins or invalid responses from Watson Assistant. To reduce these occurrences, you can enable echo suppression, which ignores any utterances that occur immediately after Text to Speech audio begins to play. Echo suppression is supported in Version 1.0.0.4c and later.
You can enable echo suppression for all calls by specifying it in the Voice Gateway configuration, or you can enable or disable it dynamically during a call.
- Single-tenant environment: In the Media Relay configuration, set the ECHO_SUPPRESSION environment variable to true.

  For example, on Docker:

      - ECHO_SUPPRESSION=true
- Multi-tenant JSON configuration: In the multi-tenant JSON configuration file, set the echoSuppression property to true under the stt object.

      "stt": {
        "credentials": {
          "url": "https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/{instance_id}",
          "apikey": "{apikey}",
          "tokenServiceProviderUrl": "https://iam.cloud.ibm.com/identity/token"
        },
        "config": {
          "model": "en-US_NarrowbandModel",
          "profanity_filter": true,
          "smart_formatting": true
        },
        "confidenceScoreThreshold": 0.2,
        "echoSuppression": true
      }
- Dynamic configuration: You can dynamically configure echo suppression during a call by using the vgwActSetSTTConfig action tag from the Voice Gateway API to set the echoSuppression parameter. For more information, see Dynamically configuring Watson services.

      {
        "output": {
          "vgwAction": {
            "command": "vgwActSetSTTConfig",
            "parameters": {
              "echoSuppression": true
            }
          }
        }
      }
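If the dynamic configuration payload is generated by application code rather than written by hand, a small helper can build it. The following Python sketch is only an illustration: the `build_stt_config_action` function is a hypothetical name, not part of the Voice Gateway API. It produces the same JSON shape as the vgwActSetSTTConfig examples in this topic. Sending both parameters in a single action is an assumption, and the range check on the threshold is a sanity check based on the recommended values, not a documented constraint.

```python
import json
from typing import Optional


def build_stt_config_action(
    confidence_score_threshold: Optional[float] = None,
    echo_suppression: Optional[bool] = None,
) -> dict:
    """Build a vgwActSetSTTConfig action as a Python dict.

    Hypothetical helper: only the parameters that are passed are included.
    Combining both parameters in one action is an assumption.
    """
    parameters = {}
    if confidence_score_threshold is not None:
        # Sanity check based on the recommended range of 0 to 1.
        if not 0.0 <= confidence_score_threshold <= 1.0:
            raise ValueError("confidenceScoreThreshold should be between 0 and 1")
        parameters["confidenceScoreThreshold"] = confidence_score_threshold
    if echo_suppression is not None:
        parameters["echoSuppression"] = bool(echo_suppression)
    if not parameters:
        raise ValueError("specify at least one parameter")
    return {
        "output": {
            "vgwAction": {
                "command": "vgwActSetSTTConfig",
                "parameters": parameters,
            }
        }
    }


if __name__ == "__main__":
    # Enable echo suppression for the rest of the call.
    print(json.dumps(build_stt_config_action(echo_suppression=True), indent=2))
```

Building the payload from a single function keeps the command name and parameter spelling in one place, which helps avoid typos such as a misspelled echoSuppression key that might otherwise go unnoticed.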