Handling Watson Assistant response types

Copyright 2020. Last updated: 2020-11-17.



The Voice Gateway supports the following IBM Watson™ Assistant response types: text, image, suggestion, option, and search.

Each response type is specified by using a different set of JSON properties. The properties included in a response vary depending on the response type.

Note: The image, option, and suggestion response types were introduced in Voice Gateway 1.0.6.

Response type text

The text response type specifies an ordinary text response, which the Voice Gateway plays to the user over the voice channel.

{
  "output": {
    "generic": [
      {
        "response_type": "text",
        "text": "OK, you want to fly to Boston next Monday."
      }
    ]
  }
}

Response type image

The image response type specifies an image to send to the user.

{
  "output": {
    "generic":[
      {
        "response_type": "image",
        "source": "http://example.com/image.jpg"
      }
    ]
  }
}

The image response type can be used when SMS integration is enabled.

In the following example, the text message is sent to the user over the voice channel, while the image is sent over the SMS channel.

{
  "output": {
    "generic": [
      {
        "response_type": "text",
        "text": "This is a test message."
      },
      {
        "response_type": "image",
        "source": "<url>"
      }
    ]
  }
}

The multitype response is equivalent to the following action sequence:

{
  "output": {
    "vgwActionSequence": [
      {
        "command": "vgwActPlayText",
        "parameters": {
          "text": [
            "This is a test message."
          ]
        }
      },
      {
        "command": "vgwActSendSMS",
        "parameters": {
          "mediaURL": "http://example.com/image.jpg"
        }
      }
    ]
  }
}

If SMS integration is not enabled, the image response type is ignored in the Voice Gateway.
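The mapping between the multitype response and its equivalent action sequence can be sketched as follows. This is an illustrative helper, not part of Voice Gateway; the function name `to_action_sequence` is an assumption, while the command and parameter names come from the example above:

```python
# Sketch: convert a Watson Assistant generic response array into the
# equivalent Voice Gateway action sequence (illustrative only).

def to_action_sequence(generic):
    """Map generic response items to vgwActionSequence commands."""
    actions = []
    for item in generic:
        if item.get("response_type") == "text":
            # Text is played to the user over the voice channel.
            actions.append({
                "command": "vgwActPlayText",
                "parameters": {"text": [item["text"]]},
            })
        elif item.get("response_type") == "image":
            # Images are forwarded over the SMS channel when SMS
            # integration is enabled; otherwise they are ignored.
            actions.append({
                "command": "vgwActSendSMS",
                "parameters": {"mediaURL": item["source"]},
            })
    return {"output": {"vgwActionSequence": actions}}

generic = [
    {"response_type": "text", "text": "This is a test message."},
    {"response_type": "image", "source": "http://example.com/image.jpg"},
]
seq = to_action_sequence(generic)["output"]["vgwActionSequence"]
```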

Response type suggestion

The suggestion response type is used by the disambiguation feature to suggest possible matches when it isn’t clear what the user wants to do.

When disambiguation is enabled, your assistant asks the user for help when more than one dialog node can respond to the user's input. Instead of guessing which node to process, your assistant lists the top possible nodes, and asks the user to pick the right one.

For more information, see Disambiguation.

Possible matching dialog nodes are listed using a suggestion response:

{
  "output": {
    "generic": [
      {
        "response_type": "suggestion",
        "title": "Please choose one of the following options:",
        "suggestions": [
          {
            "label": "I'd like to order a drink.",
            "value": {
              "intents": [
                {
                  "intent": "order_drink",
                  "confidence": 0.7330395221710206
                }
              ],
              "entities": [],
              "input": {
                "suggestion_id": "576aba3c-85b9-411a-8032-28af2ba95b13",
                "text": "I want to place an order"
              }
            },
            "output": {
              "text": [
                "I'll get you a drink."
              ],
              "generic": [
                {
                  "response_type": "text",
                  "text": "I'll get you a drink."
                }
              ],
              "nodes_visited_details": [
                {
                  "dialog_node": "node_1_1547675028546",
                  "title": "order drink",
                  "user_label": "I'd like to order a drink.",
                  "conditions": "#order_drink"
                }
              ]
            },
            "source_dialog_node": "root"
          },
          {
            "label": "I need a drink refill.",
            "value": {
              "intents": [
                {
                  "intent": "refill_drink",
                  "confidence": 0.2529746770858765
                }
              ],
              "entities": [],
              "input": {
                "suggestion_id": "6583b547-53ff-4e7b-97c6-4d062270abcd",
                "text": "I need a drink refill"
              }
            },
            "output": {
              "text": [
                "I'll get you a refill."
              ],
              "generic": [
                {
                  "response_type": "text",
                  "text": "I'll get you a refill."
                }
              ],
              "nodes_visited_details": [
                {
                  "dialog_node": "node_2_1547675097178",
                  "title": "refill drink",
                  "user_label": "I need a drink refill.",
                  "conditions": "#refill_drink"
                }
              ]
            },
            "source_dialog_node": "root"
          }
        ]
      }
    ]
  }
}

The list of suggestions is introduced using the text specified by the title attribute of the suggestion response.

Each suggestion includes a label that can be played to the user and a value that specifies the input to send to the assistant if the user chooses the corresponding suggestion. By default, Voice Gateway adds a number to the text specified in a label. Suggestions are numbered sequentially and played to the user in the order in which they appear in the list. The user can use dual-tone multifrequency (DTMF) signals, which are the tones that are generated by touching telephone keys, or say the respective number to choose one of the available options.

You can configure the text that is prepended to each label by using the vgwActSetDisambiguationConfig action. In the prefixText attribute, use %s as a placeholder for the number that corresponds to the suggestion; it is replaced with the actual number at run time.

For example, suppose that label is configured as follows:

"label": "I'd like to order a drink."

By default, the Voice Gateway plays the following to the user:

1. I'd like to order a drink.

If prefixText is set to "Press or say %s for.", then Voice Gateway plays the following to the user:

Press or say 1 for. I'd like to order a drink.
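The prefix behavior described above can be sketched as follows. This is a hypothetical helper, not Voice Gateway code; only the %s substitution and the default numbering come from the documentation:

```python
def format_suggestions(labels, prefix_text=None):
    """Build the spoken prompt lines for a list of suggestion labels.

    By default each label is prefixed with its sequential number;
    if prefix_text is given, %s is replaced with that number.
    """
    lines = []
    for i, label in enumerate(labels, start=1):
        if prefix_text is None:
            lines.append(f"{i}. {label}")
        else:
            lines.append(f"{prefix_text.replace('%s', str(i))} {label}")
    return lines

labels = ["I'd like to order a drink.", "I need a drink refill."]
default_prompt = format_suggestions(labels)
custom_prompt = format_suggestions(labels, "Press or say %s for.")
```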

The following code snippet shows a more complete example:

{
  "output": {
    "vgwAction": {
      "command": "vgwActSetDisambiguationConfig",
      "parameters": {
        "prefixText": "Press or say %s for.",
        "matchWord": ["one","two", "three","four","five"],
        "disableSpeech" : true
      }
    }
  }
}

Use the matchWord attribute to match Speech to Text utterances to one of the suggestions in the list. For languages other than English, specify the match words in the preferred language on the first conversation turn. One match word can be configured per suggestion. For example, if there are three suggestions in the list and the matchWord attribute is set to ["one","two", "three"], the word one matches the first suggestion in the list, the word two matches the second suggestion in the list, and so on.

Either voice or DTMF can be used to choose a suggestion. You can use the disableSpeech attribute to disable voice input and accept only DTMF for choosing a suggestion.

Note: If a configuration is specified using the vgwActSetDisambiguationConfig action, the same settings are used each time disambiguation is triggered. Specify this action in a root node.
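The selection logic, matching a DTMF digit or a configured match word to a position in the suggestion list, can be sketched like this. The function name `match_choice` is illustrative, not a Voice Gateway API:

```python
def match_choice(utterance, match_words, num_suggestions):
    """Return the zero-based index of the chosen suggestion, or None.

    A DTMF digit or a configured match word selects the suggestion
    at the corresponding position in the list.
    """
    utterance = utterance.strip().lower()
    if utterance.isdigit():  # DTMF key press
        n = int(utterance)
        return n - 1 if 1 <= n <= num_suggestions else None
    if utterance in match_words:  # spoken match word
        i = match_words.index(utterance)
        return i if i < num_suggestions else None
    return None

match_words = ["one", "two", "three", "four", "five"]
```

For example, with three suggestions in the list, both the spoken word "two" and the DTMF digit 2 select the second suggestion.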

Response type option

The option response type allows the user to select from a list of options, and then sends input to the assistant based on the selected option:

{
  "output": {
    "generic": [
      {
        "response_type": "option",
        "title": "Available options",
        "description": "Please select one of the following options:",
        "preference": "button",
        "options": [
          {
            "label": "Option 1",
            "value": {
              "input": {
                "text": "option 1"
              }
            }
          },
          {
            "label": "Option 2",
            "value": {
              "input": {
                "text": "option 2"
              }
            }
          }
        ]
      }
    ]
  }
}

The Voice Gateway uses the same numbering approach for the option response as it uses for the suggestion response. As with a suggestion response, the text specified by the title attribute is played first, followed by the text specified by the label attributes. You can configure the prefix text and match words for option responses by using the vgwActSetOptionsConfig action:

{
  "output": {
    "vgwAction": {
      "command": "vgwActSetOptionsConfig",
      "parameters": {
        "prefixText": "Press or say %s for.",
        "matchWord": ["one","two", "three","four","five"],
        "disableSpeech" : true
      }
    }
  }
}

When the vgwActSetOptionsConfig action is specified, the same settings are used for all option responses.
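When the user selects an option, the value.input object of that option is what gets sent back to the assistant. A minimal sketch of that extraction, with an illustrative function name:

```python
def input_for_choice(option_response, index):
    """Return the input payload to send back to the assistant for
    the option at the given zero-based index."""
    return option_response["options"][index]["value"]["input"]

# The option response from the example above, trimmed to the
# fields this sketch uses.
option_response = {
    "response_type": "option",
    "title": "Available options",
    "options": [
        {"label": "Option 1", "value": {"input": {"text": "option 1"}}},
        {"label": "Option 2", "value": {"input": {"text": "option 2"}}},
    ],
}
```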

Response type search

The search response type returns a list of search results from a search skill. The response includes an introductory header and an array of search results:

{
  "output": {
    "generic": [
      {
        "response_type": "search",
        "header": "I found the following information that might be helpful.",
        "results": [
          {
            "title": "About",
            "body": "IBM Watson Assistant is a cognitive bot that you can customize for your business needs, and deploy across multiple channels to bring help to your customers where and when they need it.",
            "url": "https://cloud.ibm.com/docs/assistant?topic=assistant-index",
            "id": "6682eca3c5b3778ccb730b799a8063f3",
            "result_metadata": {
              "confidence": 0.08401551980328191,
              "score": 0.73975396
            },
            "highlight": {
              "Shortdesc": [
                "IBM <em>Watson</em> <em>Assistant</em> is a cognitive bot that you can customize for your business needs, and deploy across multiple channels to bring help to your customers where and when they need it."
              ],
              "url": [
                "https://cloud.ibm.com/docs/<em>assistant</em>?topic=<em>assistant</em>-index"
              ],
              "body": [
                "IBM <em>Watson</em> <em>Assistant</em> is a cognitive bot that you can customize for your business needs, and deploy across multiple channels to bring help to your customers where and when they need it."
              ]
            }
          }
        ]
      }
    ]
  },
  "context": {
    "global": {
      "system": {
        "turn_count": 1
      },
      "session_id": "58e1b04e-f4bb-469a-9e4c-dffe1d4ebf23"
    }
  }
}

For a search response, Voice Gateway plays the introductory text specified by the header property, followed by the body text from the search result with the highest confidence score. Only the text of the body field is played to the user. Make sure the search skill configuration maps the body field to the text you want to be included in the response played to the user.

If the results array contains more than one search result, only the result with the highest confidence score is used.

For the above example, Voice Gateway plays the following to the user:

I found the following information that might be helpful. IBM Watson Assistant is a cognitive bot that you can customize for your business needs, and deploy across multiple channels to bring help to your customers where and when they need it.

Important: The entire response played to the user (the header text and the body text of the first result) must be no more than 5000 characters in length. If this text is longer than 5000 characters, no response is played.
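The playback rules for a search response, using only the highest-confidence result and enforcing the 5000-character limit, can be sketched as follows. This is an illustrative model of the documented behavior, not Voice Gateway code:

```python
MAX_RESPONSE_LEN = 5000  # limit for the played header + body text

def search_playback_text(search_response):
    """Return the text played for a search response, or None if there
    are no results or the 5000-character limit is exceeded."""
    results = search_response.get("results", [])
    if not results:
        return None
    # Only the result with the highest confidence score is used.
    top = max(results, key=lambda r: r["result_metadata"]["confidence"])
    text = f"{search_response['header']} {top['body']}"
    return text if len(text) <= MAX_RESPONSE_LEN else None

# Hypothetical response with two results; the higher-confidence
# body is the one that is played.
search_response = {
    "header": "I found the following information that might be helpful.",
    "results": [
        {"body": "Low-confidence result.",
         "result_metadata": {"confidence": 0.02}},
        {"body": "High-confidence result.",
         "result_metadata": {"confidence": 0.08}},
    ],
}
played = search_playback_text(search_response)
```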