Copyright years: 2017, 2023. Last updated: 2023-01-06.
Dynamically configuring the Speech to Text service or Speech to Text Adapter
By using the IBM® Voice Gateway API, you can dynamically configure the IBM® Speech to Text service or the Speech to Text Adapter during a call. To change the configuration, define the vgwActSetSTTConfig action in the output of a node response in your Watson Assistant dialog tree. For more information about using the API, see Defining action tags and state variables.
By default, Watson Speech to Text service instances that are not part of a Premium plan log requests and their results to improve the service for future users. To prevent IBM from using your data in this way, see Data collection in the Watson Speech to Text API reference.
Note: Changing the configuration for the Speech to Text service causes the connection from the voice gateway to the Speech to Text service to disconnect and reconnect, which might cause the voice gateway to miss part of an utterance. Typically, the connection is reestablished while audio is streamed to the caller from the Watson Assistant response, which avoids missing any part of an utterance unless the caller barges in quickly.
See the following sections for examples of defining the vgwActSetSTTConfig action:
Speech to Text service
This example shows configuration that you can add to the node response in your Watson Assistant dialog tree. The settings are transparently passed as JSON properties to the Speech to Text service.
```json
{
  "output": {
    "vgwAction": {
      "command": "vgwActSetSTTConfig",
      "parameters": {
        "credentials": {
          "url": "https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/{instance_id}",
          "apikey": "{apikey}",
          "tokenServiceProviderUrl": "https://iam.cloud.ibm.com/identity/token"
        },
        "config": {
          "x-watson-learning-opt-out": true,
          "model": "en-US_NarrowbandModel",
          "profanity_filter": true,
          "smart_formatting": true,
          "customization_id": "81d3630-ba58-11e7-aa4b-41bcd3f6f24d",
          "acoustic_customization_id": "e4766090-ba51-11e7-be33-99bd3ac8fa93"
        },
        "confidenceScoreThreshold": 0.7,
        "echoSuppression": true,
        "bargeInResume": true,
        "connectionTimeout": 30,
        "requestTimeout": 15
      }
    }
  }
}
```
| JSON property | Description |
|---|---|
| credentials | Credentials for the IBM® Speech to Text service. If not defined, the default credentials from the Media Relay configuration are used. You can also reduce call latency by configuring the tokenAuthEnabled credential to enable token authentication in Version 1.0.0.5a and later. See Enabling user name and password based token authentication for Watson services. |
| config | Parameters for the Watson Speech to Text service when narrowband audio is used. For a full list of parameters, see the WebSockets API reference for the Watson Speech to Text service. |
| broadbandConfig | Parameters for the Watson Speech to Text service when broadband audio is enabled. Required only when bandPreference is set to broadband. At minimum, the language model must be defined on the model property. For a full list of parameters, see the WebSockets API reference for the Watson Speech to Text service. Version 1.0.0.4 and later. |
| bandPreference | Defines which audio band to prefer when negotiating audio codecs in the session. Set to broadband to use broadband audio when possible. The default value is narrowband. Version 1.0.0.4 and later. |
| confidenceScoreThreshold | Confidence threshold for messages from the Speech to Text service. Messages with a confidence score under the threshold are not forwarded to Watson Assistant. The default value of 0 means that all responses are used. The recommended values are between 0 and 1. |
| echoSuppression | Indicates whether to suppress results from Speech to Text that might occur from an echo of Text to Speech synthesis. Version 1.0.0.4c and later. |
| bargeInResume | Set to true to resume playing back audio after barge-in if the confidence score of the final utterance is lower than the threshold that is specified by the confidenceScoreThreshold property. Version 1.0.0.5 and later. |
| connectionTimeout | Time in seconds that Voice Gateway waits to establish a socket connection with the Watson Speech to Text service. If the time is exceeded, Voice Gateway reattempts to connect. If the service still can't be reached, the call fails. Version 1.0.0.5 and later. |
| requestTimeout | Time in seconds that Voice Gateway waits to establish a speech recognition session with the Watson Speech to Text service. If the time is exceeded, Voice Gateway reattempts to connect. If the service still can't be reached, the call fails. Version 1.0.0.5 and later. |
| updateMethod | Optional. Specifies the update strategy to use when setting the speech configuration. Possible values are replace, replaceOnce, merge, and mergeOnce. See Using updateMethod. Version 1.0.0.7 and later. |
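The confidenceScoreThreshold behavior can be sketched as follows. This is an illustrative model, not Voice Gateway source code: the function name is hypothetical, and the message shape follows the Watson Speech to Text WebSocket results format.

```python
# Hypothetical sketch of confidenceScoreThreshold filtering: final
# transcripts whose top confidence falls below the threshold are
# dropped instead of being forwarded to Watson Assistant.

def passes_threshold(stt_message: dict, threshold: float) -> bool:
    """Return True if the top alternative meets the configured threshold."""
    if threshold <= 0:          # default of 0 means every response is used
        return True
    results = stt_message.get("results", [])
    if not results:
        return False
    top = results[0]["alternatives"][0]
    return top.get("confidence", 0.0) >= threshold

message = {
    "results": [{
        "final": True,
        "alternatives": [{"transcript": "yes please", "confidence": 0.62}]
    }]
}

print(passes_threshold(message, 0.7))   # 0.62 < 0.7, so the result is dropped
print(passes_threshold(message, 0.0))   # threshold 0 forwards everything
```

With the example configuration above (threshold 0.7), a final result at confidence 0.62 would never reach the dialog.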
The parameters that you can set under the config and broadbandConfig JSON properties reflect the parameters that are made available by the Speech to Text WebSocket interface. The WebSocket API sends two types of parameters: query parameters, which are sent when Voice Gateway connects to the service, and message parameters, which are sent as JSON after the connection is established. For example, model and customization_id are query parameters, and smart_formatting is a WebSocket message parameter. For a full list of parameters, see the WebSockets API reference for the Watson Speech to Text service.
You can define the following query parameters for the Media Relay's connection to the Speech to Text service. Any other parameter that you define under config or broadbandConfig is passed through on the WebSocket message request.

- model
- customization_id
- acoustic_customization_id
- version (Version 1.0.0.4c and later)
- x-watson-learning-opt-out
Note: The following parameters from the Speech to Text service can't be modified because they have fixed values that are used by the Media Relay.

- action
- content-type
- interim_results
- continuous
- inactivity_timeout
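The query/message split and the fixed-parameter rule above can be sketched as follows. This is an illustrative model under stated assumptions, not the Media Relay implementation; the function name is hypothetical, and the two sets come directly from the lists above.

```python
# Illustrative sketch of how a config object could be split into WebSocket
# query parameters and message parameters, ignoring the parameters whose
# values are fixed by the Media Relay.

QUERY_PARAMS = {"model", "customization_id", "acoustic_customization_id",
                "version", "x-watson-learning-opt-out"}
FIXED_PARAMS = {"action", "content-type", "interim_results",
                "continuous", "inactivity_timeout"}

def split_config(config: dict):
    """Partition config into (query, message) parameter dicts."""
    query, message = {}, {}
    for key, value in config.items():
        if key in FIXED_PARAMS:
            continue                    # fixed values can't be overridden
        (query if key in QUERY_PARAMS else message)[key] = value
    return query, message

query, message = split_config({
    "model": "en-US_NarrowbandModel",
    "smart_formatting": True,
    "inactivity_timeout": 5,            # ignored: fixed by the Media Relay
})
print(query)    # {'model': 'en-US_NarrowbandModel'}
print(message)  # {'smart_formatting': True}
```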
Example: Setting the Speech to Text language model to Spanish (es-ES_NarrowbandModel)
In this example, the language model is switched to Spanish and smart formatting is enabled. Because the credentials property isn't defined, the Media Relay uses the credentials defined through the Media Relay configuration (WATSON_STT_URL, WATSON_STT_USERNAME, and WATSON_STT_PASSWORD).
```json
{
  "output": {
    "vgwAction": {
      "command": "vgwActSetSTTConfig",
      "parameters": {
        "config": {
          "model": "es-ES_NarrowbandModel",
          "smart_formatting": true
        }
      }
    }
  }
}
```
Using updateMethod
You can use the updateMethod property in dynamic configuration to define how changes to the configuration are applied, by either replacing the configuration or merging in new configuration properties, and to specify whether these changes persist for the duration of the call or for only one conversation turn.
| Value | Description |
|---|---|
| replace | Replaces the configuration for the duration of the call. |
| replaceOnce | Replaces the configuration once, so the configuration is used for only the following conversation turn. Then, it reverts to the previous configuration. |
| merge | Merges the configuration with the existing configuration for the duration of the call. |
| mergeOnce | Merges the configuration for one turn of the conversation, and then reverts to the previous configuration. |
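The four strategies can be modeled with a short sketch. The real Voice Gateway implementation isn't public, so this is only an interpretation of the table above: replace swaps the whole configuration, merge overlays root-level fields, and the ...Once variants keep a copy of the previous configuration to restore after the next conversation turn.

```python
# Hedged sketch of the four updateMethod strategies; the function name and
# return shape are hypothetical.
import copy

def apply_update(current: dict, update: dict, method: str):
    """Return (active_config, config_to_restore_after_next_turn_or_None)."""
    if method == "replace":
        return update, None
    if method == "replaceOnce":
        return update, copy.deepcopy(current)
    merged = {**current, **update}      # root-level merge only
    if method == "merge":
        return merged, None
    if method == "mergeOnce":
        return merged, copy.deepcopy(current)
    raise ValueError(f"unknown updateMethod: {method}")

base = {"config": {"model": "en-US_NarrowbandModel"}, "bargeInResume": True}
active, restore = apply_update(
    base, {"config": {"model": "es-ES_NarrowbandModel"}}, "mergeOnce")
print(active["config"]["model"])   # es-ES_NarrowbandModel for one turn
print(restore == base)             # True: previous config is restored later
```

Note that the merge is at the root level only: the update's config object replaces the existing config object wholly, which matches the behavior described under Updating fields that are not root level.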
Example: Using the mergeOnce method to update the Speech to Text recognizeBody property

The following example shows the grammar for the mergeOnce method in the Watson Assistant dialog. By setting updateMethod to mergeOnce, Watson Assistant uses the vgwActSetSTTConfig action tag to append the recognizeBody property to the STT config in the Voice Gateway JSON configuration file. These properties are used by Voice Gateway until the next turn event.
```json
{
  "output": {
    "text": "I can speak Spanish now",
    "vgwAction": {
      "command": "vgwActSetSTTConfig",
      "parameters": {
        "updateMethod": "mergeOnce",
        "config": {
          "recognizeBody": {
            "contentType": "application/srgs",
            "body": "#ABNF 1.0 ISO-8859-1;\nlanguage en-US;\nmode voice;\nroot $pattern;\n$pattern = $alphanum <1-> ;\n$alphanum = $digit | $letter;\n$digit = zero | one | two | three | four | five | six | seven | eight | nine;\n$letter = \"A.\" | \"B.\" | \"C.\" | \"D.\" | \"E.\" | \"F.\" | \"G.\" | \"H.\" | \"I.\" | \"J.\" | \"K.\" | \"L.\" | \"M.\" | \"N.\" | \"O.\" | \"P.\" | \"Q.\" | \"R.\" | \"S.\" | \"T.\" | \"U.\" | \"V.\" | \"W.\" | \"X.\" | \"Y.\" | \"Z.\" ;\n"
          }
        }
      }
    }
  }
}
```
Updating fields that are not root level

When you configure the service dynamically from Watson Assistant, only the root-level fields, such as config or bargeInResume, are updated. If they are omitted from the action, the original configuration settings persist. You can use the merge and mergeOnce values of the updateMethod property to merge config fields with the existing configuration.
Voice Gateway Speech to Text Adapter
The following example for Cloud Speech API shows configuration that you can add to the node response in your Watson Assistant dialog tree. The settings are transparently passed as JSON properties to the Cloud Speech API.
```json
{
  "output": {
    "vgwAction": {
      "command": "vgwActSetSTTConfig",
      "parameters": {
        "config": {
          "languageCode": "es-ES",
          "profanityFilter": true,
          "maxAlternatives": 2,
          "speechContexts": [{
            "phrases": ["Si", "Por supuesto", "Claro", "Si por favor"]
          }]
        },
        "thirdPartyCredentials": {
          "type": "service_account",
          "project_id": "my_google_project",
          "private_key_id": "d2f36f96cb0c58309a5eba101cef4af0663d9465",
          "private_key": "-----BEGIN PRIVATE ... \n-----END PRIVATE KEY-----\n",
          "client_email": "developer1@my_google_project.iam.gserviceaccount.com",
          "client_id": "100033083330209022330835",
          "auth_uri": "https://accounts.google.com/o/oauth2/auth",
          "token_uri": "https://accounts.google.com/o/oauth2/token",
          "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
          "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/developer1@my_google_project.iam.gserviceaccount.com"
        }
      }
    }
  }
}
```
The JSON properties define the values that will change. If a property isn't defined, the existing value is used.
| JSON property | Description |
|---|---|
| config | Parameters for the Google Cloud Speech API RecognitionConfig request. For a full list of parameters, see the RecognitionConfig API documentation. |
| thirdPartyCredentials | Contents of a Google Cloud project service account JSON file. If this property is omitted, the credentials that are specified on the GOOGLE_APPLICATION_CREDENTIALS environment variable are used. |
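The credential fallback described in the table can be sketched as follows. This is an assumption-level illustration, not the Speech to Text Adapter source: the function name is hypothetical, and only the fallback rule (inline thirdPartyCredentials first, otherwise the file named by GOOGLE_APPLICATION_CREDENTIALS) comes from the table above.

```python
# Illustrative sketch of resolving Google credentials for the adapter:
# prefer thirdPartyCredentials from the action parameters, otherwise
# load the service account file named by GOOGLE_APPLICATION_CREDENTIALS.
import json
import os

def resolve_google_credentials(parameters: dict) -> dict:
    """Return the service account credentials to use for this call."""
    creds = parameters.get("thirdPartyCredentials")
    if creds is not None:
        return creds                      # inline credentials win
    path = os.environ["GOOGLE_APPLICATION_CREDENTIALS"]
    with open(path) as f:
        return json.load(f)               # fall back to the JSON key file
```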
Note: The following fields for RecognitionConfig in the Cloud Speech API can't be modified because they have fixed values that are used by the Speech to Text Adapter.

- encoding
- sample_rate_hertz
Deprecated: Configuring the Speech to Text service by defining state variables
In Version 1.0.0.2, configuring the Watson speech services by defining state variables was deprecated in favor of the action tags that are described in the previous sections.

Important: Although the state variables continue to function, you can't define both the deprecated state variables and the action tags within the same node. Your Watson Assistant dialog can contain a mixture of action tags and deprecated state variables, but the JSON definition for each node can contain only one or the other.
```json
{
  "context": {
    "vgwSTTConfigSettings": {
      "credentials": {
        "url": "https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/{instance_id}",
        "apikey": "{apikey}",
        "tokenServiceProviderUrl": "https://iam.cloud.ibm.com/identity/token"
      },
      "config": {
        "x-watson-learning-opt-out": true,
        "model": "en-US_NarrowbandModel",
        "profanity_filter": true,
        "smart_formatting": true
      },
      "confidenceScoreThreshold": 0.7
    }
  }
}
```
| JSON property | Description |
|---|---|
| credentials | Credentials for the Watson Speech to Text service. If not defined, the default credentials from the Media Relay configuration are used. |
| config | Parameters for the Watson Speech to Text service. See the WebSockets API reference for the Watson Speech to Text service. |
| confidenceScoreThreshold | Confidence threshold for messages from the Speech to Text service. Messages with a confidence score under the threshold are not forwarded to Watson Assistant. The default value of 0 means that all responses are used. The recommended values are between 0 and 1. |