Improving speech-to-text processing accuracy

Noise from the caller audio, such as background noise and echoes from Text to Speech playback, can affect the accuracy of the speech-to-text processing and cause unwanted barge-ins. You can configure IBM® Voice Gateway to account for noisy environments and echoes.

Configuring the Speech to Text confidence score threshold

Each utterance that the Speech to Text service processes is assigned a confidence score, which indicates how confident the service is that the identified text matches the audio input. You can configure the Media Relay to discard Speech to Text utterances whose confidence score is below a specified threshold.

By default, the confidence score threshold is 0, which means that all responses are used. Recommended values are between 0 and 1.
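
The filtering rule described above can be sketched in a few lines of Python. This is only an illustration of the comparison, not Voice Gateway's actual implementation; the function name `accept_utterance` and its arguments are hypothetical.

```python
from typing import Optional


def accept_utterance(transcript: str, confidence: float,
                     threshold: float = 0.0) -> Optional[str]:
    """Return the transcript if it passes the threshold, otherwise None.

    Hypothetical helper: it mirrors the documented rule that utterances
    with a confidence score under the threshold are discarded.
    """
    if confidence < threshold:
        return None  # discarded: the score is under the threshold
    return transcript


# With the default threshold of 0, every utterance is kept.
kept = accept_utterance("transfer my balance", 0.05)

# With a threshold of 0.2, a low-confidence utterance is dropped.
dropped = accept_utterance("transfer my balance", 0.15, threshold=0.2)
```

In this sketch `kept` holds the transcript and `dropped` is `None`. A score exactly equal to the threshold is kept, because only scores under the threshold are discarded.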

Single-tenant environment: In the Media Relay configuration, set the WATSON_STT_CONFIDENCE_SCORE_THRESHOLD environment variable to the threshold value.
```
- WATSON_STT_CONFIDENCE_SCORE_THRESHOLD=0.2
```

Multi-tenant JSON configuration: In the multi-tenant JSON configuration file, you can define distinct confidence thresholds for each tenant. For each tenant, define the confidenceScoreThreshold property in the stt object.

```
...
"stt": {
  "credentials": {
    "url": "https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/{instance_id}",
    "apikey": "{apikey}",
    "tokenServiceProviderUrl": "https://iam.cloud.ibm.com/identity/token"
  },
  "config": {
    "model": "en-US_NarrowbandModel",
    "profanity_filter": true,
    "smart_formatting": true
  },
  "confidenceScoreThreshold": 0.2
}
...
```

Dynamic configuration: You can dynamically change the confidence score threshold during a call by using the vgwActSetSTTConfig action tag from the Voice Gateway API to set the confidenceScoreThreshold parameter. For more information, see Dynamically configuring Watson services.
```
{
  "output": {
    "vgwAction": {
      "command": "vgwActSetSTTConfig",
      "parameters": {
        "confidenceScoreThreshold": 0.7
      }
    }
  }
}
```

Suppressing echoes from Text to Speech playback

During a call, audio from the Text to Speech service that Voice Gateway plays back might be audible through the caller's telephone line. This echoed audio can be interpreted as audio from the caller and processed by the Speech to Text service, resulting in barge-ins or invalid responses from Watson Assistant. To reduce these occurrences, you can enable echo suppression, which ignores any utterances that occur immediately after Text to Speech audio begins to play. Echo suppression is supported in Version 1.0.0.4c and later.
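
As a rough mental model, echo suppression can be pictured as a time-window check on when an utterance starts relative to playback. The sketch below is an assumption made for illustration: the function name, the timestamps, and the 0.5-second window are all hypothetical, and this page does not state how Voice Gateway decides what counts as "immediately after".

```python
def is_probable_echo(utterance_start: float, playback_start: float,
                     window_seconds: float = 0.5) -> bool:
    """Return True if an utterance begins just after playback starts.

    Hypothetical helper: the window length is an assumed value, not a
    documented Voice Gateway setting. Times are in seconds.
    """
    elapsed = utterance_start - playback_start
    # Only utterances that start inside the window are treated as echo.
    return 0.0 <= elapsed < window_seconds


# Playback starts at t=10.0 s; an utterance at t=10.2 s falls in the window.
echo = is_probable_echo(10.2, 10.0)

# An utterance one second after playback starts is treated as caller speech.
caller = is_probable_echo(11.0, 10.0)
```

In this model, an utterance flagged as a probable echo would be ignored instead of being passed on as caller input.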

You can enable echo suppression for all calls by specifying it in the Voice Gateway configuration, or you can enable or disable it dynamically during a call.

Single-tenant environment: In the Media Relay configuration, set the ECHO_SUPPRESSION environment variable to true.

For example, on Docker:
```
- ECHO_SUPPRESSION=true
```

Multi-tenant JSON configuration: In the multi-tenant JSON configuration file, set the echoSuppression property to true under the stt object.

```
"stt": {
  "credentials": {
    "url": "https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/{instance_id}",
    "apikey": "{apikey}",
    "tokenServiceProviderUrl": "https://iam.cloud.ibm.com/identity/token"
  },
  "config": {
    "model": "en-US_NarrowbandModel",
    "profanity_filter": true,
    "smart_formatting": true
  },
  "confidenceScoreThreshold": 0.2,
  "echoSuppression": true
}
```

Dynamic configuration: You can dynamically configure echo suppression during a call by using the vgwActSetSTTConfig action tag from the Voice Gateway API to set the echoSuppression parameter. For more information, see Dynamically configuring Watson services.
```
{
  "output": {
    "vgwAction": {
      "command": "vgwActSetSTTConfig",
      "parameters": {
        "echoSuppression": true
      }
    }
  }
}
```
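
If you generate these responses programmatically, a small helper can build the payload shown in the dynamic configuration examples. The following Python sketch is one way to do that; the function name `build_stt_config_action` is hypothetical, and how the resulting JSON is returned from your Watson Assistant dialog is outside the scope of the sketch.

```python
import json
from typing import Any, Dict


def build_stt_config_action(**parameters: Any) -> Dict[str, Any]:
    """Wrap the given parameters in a vgwActSetSTTConfig action payload.

    Hypothetical helper: it only reproduces the JSON structure shown in
    the examples on this page and does not validate parameter names.
    """
    return {
        "output": {
            "vgwAction": {
                "command": "vgwActSetSTTConfig",
                "parameters": dict(parameters),
            }
        }
    }


# Payload that enables echo suppression for the rest of the call.
echo_payload = build_stt_config_action(echoSuppression=True)

# Payload that raises the confidence score threshold.
threshold_payload = build_stt_config_action(confidenceScoreThreshold=0.7)

# json.dumps converts Python True to JSON true.
echo_json = json.dumps(echo_payload, indent=2)
threshold_json = json.dumps(threshold_payload, indent=2)
```

Each call produces one action with one parameter, matching the examples above. Whether both parameters can be combined in a single action is not stated on this page.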