
copyright: years: 2017, 2023 lastupdated: "2023-01-07"


Reporting transcription events from Voice Gateway

For both self-service agents and agent assistants, IBM® Voice Gateway provides the ability to access the transcription text from the call in real time. Transcribing calls can be valuable in many scenarios, including:

  • Monitoring conversations as they occur
  • Logging conversations for later analysis, such as to fine-tune your Watson services for a better customer experience

You can configure Voice Gateway to generate transcription messages as REST events and publish them to a configured REST server. This method is available in Version 1.0.0.2 and later. In Version 1.0.0.6 and later, you can also configure Voice Gateway to publish transcription events directly to IBM Cloudant. Each transcription event contains the utterance text, confidence score, and session information. For more information about transcription event contents and format, see Reporting events.

Configuring Voice Gateway to publish transcription events

  1. Set up a Splunk HTTP Event Collector (HEC) or a REST server that can store the events, for example in a NoSQL database.

  2. Configure your REST server URL and authorization credentials and enable transcription events in Voice Gateway.

    • Single-tenant environment: Set the following environment variables in your configuration.
    REPORTING_URL=http://myresteventserver.ibm.com/
    REPORTING_USERNAME=myRestAdmin
    REPORTING_PASSWORD=myRestTokenOrPassword
    REPORTING_TRANSCRIPTION_EVENT_INDEX=transcription
    REPORTING_TRANSCRIPTION_EVENT_SOURCE_TYPE=vgwSessionID
    

    • Multi-tenant JSON configuration: In the multi-tenant JSON configuration file, for each tenant where you want to enable transcription reporting, configure a reporting object that contains the following properties. You can configure indexes for other types of reporting events within the same object.
    ...
    "reporting": {
      "url": "http://myresteventserver.ibm.com/",
      "username": "myRestAdmin",
      "password": "myRestTokenOrPassword",
      "transcriptionEventIndex": "transcription",
      "transcriptionEventSourceType": "vgwSessionID"
    }
    ...
    

    The transcription event index value is included as the index value in all transcription events so that the REST server that consumes them can differentiate the type of event. The index field is required for Splunk HEC, but it's also useful for anyone who is building their own REST server to handle these events.

After you redeploy Voice Gateway, it will publish event records to the configured REST server every time an utterance is detected.
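As a rough illustration of the receiving side, the following Python sketch accepts posted events and keeps those whose index matches the configured transcription index. It assumes that events arrive as HTTP POST requests with a JSON body in the format shown in the transcription event examples. The names parse_transcription_event, EventHandler, RECEIVED_EVENTS, and serve, and the port 8080, are hypothetical and not part of Voice Gateway. The sketch does not check credentials, which a real server should do.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory store; a real server would write to a database.
RECEIVED_EVENTS = []


def parse_transcription_event(body: bytes, expected_index: str = "transcription"):
    """Return the decoded record if its index matches, otherwise None."""
    record = json.loads(body.decode("utf-8"))
    if record.get("index") != expected_index:
        return None
    return record


class EventHandler(BaseHTTPRequestHandler):
    """Minimal handler that stores matching events (no authentication)."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        record = parse_transcription_event(self.rfile.read(length))
        if record is not None:
            RECEIVED_EVENTS.append(record)
        # Acknowledge the request whether or not the index matched.
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080):
    """Run the receiver until interrupted; the port is an arbitrary choice."""
    HTTPServer(("", port), EventHandler).serve_forever()
```

Calling serve() starts the receiver. The index check is what lets a single endpoint separate transcription events from other reporting event types.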

Configuring Voice Gateway to publish transcription events in an IBM Cloudant database

Before configuring Voice Gateway to publish to IBM Cloudant, set up an IBM® Cloudant® for IBM Cloud service instance. When you generate your service credentials from the IBM Cloudant dashboard, you can choose either IAM or a combination of API key and username and password.

Set the following environment variables in your configuration to publish transcription events in an IBM Cloudant database. You can use both IBM Cloudant and another reporting service simultaneously by adding the IBM Cloudant configuration variables to your configuration.

The following examples show the configuration for publishing transcription events to IBM Cloudant by using username and password authentication. You can configure authentication for IBM Cloudant either with an API key, by adding the REPORTING_TRANSCRIPTION_CLOUDANT_APIKEY variable, or with a username and password, by adding REPORTING_TRANSCRIPTION_CLOUDANT_USERNAME and REPORTING_TRANSCRIPTION_CLOUDANT_PASSWORD. For the API key option, see Using an API key for authentication.

  • Single-tenant environment: Set the following environment variables in your configuration.

    REPORTING_TRANSCRIPTION_CLOUDANT_USERNAME=myCloudantUsername
    REPORTING_TRANSCRIPTION_CLOUDANT_PASSWORD=myCloudantPassword
    REPORTING_TRANSCRIPTION_CLOUDANT_DB_NAME=myCloudantDB
    REPORTING_TRANSCRIPTION_CLOUDANT_EVENT_INDEX=transcription
    

    If you use a non-default account configuration for IBM Cloudant, you might need to include additional environment variables. See REPORTING_TRANSCRIPTION_CLOUDANT_URL and REPORTING_TRANSCRIPTION_CLOUDANT_ACCOUNT in Reporting event configuration.

  • Multi-tenant JSON configuration: In the multi-tenant JSON configuration file, for each tenant where you want to enable transcription events, configure a reporting object that contains the following properties. If you use a non-default account configuration for IBM Cloudant, you might need to include additional properties. See transcriptionCloudant in Properties for the reporting object.

    ...
    "reporting": {
      "transcriptionCloudant": {
        "username": "myCloudantUsername",
        "password": "myCloudantPassword",
        "dbName": "myCloudantDB",
        "eventIndex": "transcription"
      }
    }
    ...
    

Using an API key for authentication

As an alternative to using a user name and password combination for your IBM Cloudant credentials, you can use an API key for authentication. Find or generate your service credentials for IBM Cloudant, then copy the URL and API key values into your Voice Gateway reporting event configuration. See IBM Cloudant: API keys for more information about API keys.

  • Single-tenant environment: Set the following environment variables in your configuration (https://www.ibm.com/docs/en/voice-gateway?topic=reference-configuration-environment-variables).

    REPORTING_TRANSCRIPTION_CLOUDANT_URL=http://mytranscriptioncloudant.ibm.com/
    REPORTING_TRANSCRIPTION_CLOUDANT_APIKEY=a1b2c3d3fgh1jk1mn0
    REPORTING_TRANSCRIPTION_CLOUDANT_DB_NAME=myCloudantDB
    REPORTING_TRANSCRIPTION_CLOUDANT_EVENT_INDEX=transcription

  • Multi-tenant JSON configuration: You must configure the url and apikey properties instead of username and password.

    ...
    "reporting": {
      "transcriptionCloudant": {
        "url": "http://mytranscriptioncloudant.ibm.com/",
        "apikey": "a1b2c3d3fgh1jk1mn0",
        "dbName": "myCloudantDB",
        "eventIndex": "transcription"
      }
    }
    ...

Configuring Voice Gateway to publish transcription events in a CouchDB database

To set up CouchDB, clone the repository and follow the instructions on GitHub. If you use CouchDB, you can't use an API key to authenticate.

Set the following environment variables in your configuration to publish transcription events in a CouchDB database. You can use both CouchDB and another reporting service simultaneously by adding the CouchDB configuration variables to your configuration.

The following examples show the configuration for publishing transcription events to CouchDB by using username and password authentication.

  • Single-tenant environment: Set the following environment variables in your configuration.
    REPORTING_TRANSCRIPTION_CLOUDANT_URL="http://<svc-couchdb>:5984/"
    REPORTING_TRANSCRIPTION_CLOUDANT_USERNAME=myCouchDBUsername
    REPORTING_TRANSCRIPTION_CLOUDANT_PASSWORD=myCouchDBPassword
    REPORTING_TRANSCRIPTION_CLOUDANT_DB_NAME=myCouchDB
    REPORTING_TRANSCRIPTION_CLOUDANT_EVENT_INDEX=transcription

  • Multi-tenant JSON configuration: In the multi-tenant JSON configuration file, for each tenant where you want to enable transcription events, configure a reporting object that contains the following properties. If you use a non-default account configuration, you might need to include additional properties. See transcriptionCloudant in Properties for the reporting object.

    ...
    "reporting": {
      "transcriptionCloudant": {
        "url": "http://<svc-couchdb>:5984/",
        "username": "myCouchDBUsername",
        "password": "myCouchDBPassword",
        "dbName": "myCouchDB",
        "eventIndex": "transcription"
      }
    }
    ...
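
The authentication rules described for IBM Cloudant and CouchDB can be checked before deployment with a small helper. The following Python sketch is an illustration only: the function name cloudant_auth_mode and its couchdb flag are hypothetical, and treating dbName as mandatory is an assumption based on the fact that every sample configuration includes it.

```python
def cloudant_auth_mode(config: dict, couchdb: bool = False) -> str:
    """Return "apikey" or "password" for a transcriptionCloudant object.

    Raises ValueError when the object cannot be used as configured.
    """
    # Assumption: dbName is always needed, as in every sample configuration.
    if "dbName" not in config:
        raise ValueError("dbName is required")
    if "apikey" in config:
        # The CouchDB section states that API keys cannot be used there.
        if couchdb:
            raise ValueError("CouchDB does not support API key authentication")
        # API key authentication is configured together with the url property.
        if "url" not in config:
            raise ValueError("apikey authentication also needs url")
        return "apikey"
    if "username" in config and "password" in config:
        return "password"
    raise ValueError("provide apikey and url, or username and password")
```

For example, cloudant_auth_mode({"url": "http://<svc-couchdb>:5984/", "username": "u", "password": "p", "dbName": "myCouchDB"}, couchdb=True) returns "password", while the same call with an apikey property raises ValueError.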

Suppressing transcription text in reporting events

You can programmatically enable or suppress transcription text in reporting events by using the action tags, vgwActEnableTranscriptionReport and vgwActDisableTranscriptionReport. With these action tags, you can configure Watson Assistant to temporarily enable or suppress text inclusion in reporting events to prevent collection of any personal or sensitive information from callers.

In the following example, the action sequence first disables transcription reporting for the callee only, then disables it for both the caller and the callee (an empty targets list applies to both). Finally, it re-enables reporting for the callee only.

"vgwActionSequence": [
    {
      "command": "vgwActDisableTranscriptionReport",
      "parameters": {
        "targets": [
          "callee"   // disable just the callee
        ]
      }
    },
    {
      "command": "vgwActDisableTranscriptionReport",
      "parameters": {
        "targets": []  // disable both callee and caller
      }
    },
    {
      "command": "vgwActEnableTranscriptionReport",
      "parameters": {
        "targets": [
          "callee"    // enable just the callee
        ]
      }
    }]
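
If the action sequence is assembled in code, for example in a service that prepares Watson Assistant responses, a small builder keeps the command names and target lists consistent. The following Python sketch reproduces the sequence above. The helper name transcription_report_action is hypothetical, and where the resulting object is placed in the Watson Assistant response is not covered here.

```python
import json


def transcription_report_action(enable: bool, targets=()):
    """Build one action that enables or disables transcription reporting.

    An empty targets list applies the action to both caller and callee.
    """
    command = (
        "vgwActEnableTranscriptionReport"
        if enable
        else "vgwActDisableTranscriptionReport"
    )
    return {"command": command, "parameters": {"targets": list(targets)}}


# Same order as the sample: callee off, both off, callee back on.
sequence = [
    transcription_report_action(False, ["callee"]),
    transcription_report_action(False),
    transcription_report_action(True, ["callee"]),
]

print(json.dumps({"vgwActionSequence": sequence}, indent=2))
```

Because the builder emits plain JSON, the output has no inline comments, unlike the annotated sample.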

Transcription event format

All Voice Gateway reporting events are based on the Splunk HTTP Event Collector JSON format.

Important: Transcription events include text transcriptions and other information that could potentially contain Protected Health Information (PHI), personally identifiable information (PII), or PCI Data Security Standard (PCI DSS) data. Therefore, it's critical that the consuming server properly stores these events to prevent exposure of personal information.

In the following example, the transcription event JSON object shows that the utterance was sent to Watson Assistant in a SIPREC session.

{
  "time": 1558628246.008,
  "host": "9.42.89.143",
  "source": "sip:18883141589@114.589.797.1",
  "sourcetype": "sipURI",
  "index": "transcription",
  "event": {
    "transcription": "Good morning",
    "globalSessionID": "0dzMpLgyJ5",
    "sipCallID": "0dzMpLgyJ5",
    "sipRecCallID": "1232673994_41860201@115.110.121.1",
    "sipFromURI": "sip:18883141589@114.589.797.1",
    "sipToURI": "sip:18882718281@182.845.045.2",
    "conversationID": "a23de67h-d2da-4fc7-8a04-a868d760671e",
    "customSIPInviteHeaders": {
      "Custom-Header1": "123",
      "Custom-Header2": "456"
    },
    "destination": "a23de67h-d2da-4fc7-8a04-a868d760671e",
    "destinationType": "conversationID",
    "sttConfig": {
      "confidenceScoreThreshold": 0.7,
      "credentials": {
        "password": "***",
        "username": "3b36c01c-6dfb-4cb6-9a7f-ea4fafd1b2f1"
      },
      "config": {
        "smart_formatting": true,
        "model": "en-US_NarrowbandModel",
        "profanity_filter": true
      }
    },
    "confidence": 0.94,
    "dtmf": false,
    "sttResponse": {
      "result_index": 0,
      "results": [
        {
          "final": true,
          "alternatives": [
            {
              "transcript": "Good morning",
              "confidence": 0.94
            }
          ]
        }
      ]
    },
    "speechRecognitionLatency": 800,
    "disabled": false
  }
}
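
A consumer of this event might want the top Speech to Text hypothesis and a check against the configured threshold. The following Python sketch shows one way to read those values from the event object. The function names best_alternative and meets_threshold are hypothetical, and comparing confidence with sttConfig.confidenceScoreThreshold is an assumption about how a consumer might use the two fields, not documented Voice Gateway behavior.

```python
from typing import Optional


def best_alternative(event: dict) -> Optional[dict]:
    """Return the highest-confidence alternative among final results, if any."""
    results = event.get("sttResponse", {}).get("results", [])
    candidates = [
        alternative
        for result in results
        if result.get("final")
        for alternative in result.get("alternatives", [])
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda alternative: alternative.get("confidence", 0.0))


def meets_threshold(event: dict) -> bool:
    """Compare the event confidence with the threshold recorded in sttConfig."""
    threshold = event.get("sttConfig", {}).get("confidenceScoreThreshold", 0.0)
    return event.get("confidence", 0.0) >= threshold
```

Both functions take the inner event object, not the whole record, and fall back to neutral values when a key is absent, because several keys are present only in some sessions.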

In the following example, the transcription event JSON object shows that the audio URL was played to the caller in a self-service session.

{
  "time": 1558626870.906,
  "host": "192.168.0.3",
  "source": "e165e846-a2f6-493a-94e5-544647af81a8",
  "sourcetype": "conversationID",
  "index": "trx",
  "event": {
    "audioURL": "https://www.example.com/acm/8k16bitpcm.wav",
    "destination": "sip:alice@192.168.0.4",
    "destinationType": "sipURI",
    "sipCallID": "ujtu6PMqLs",
    "globalSessionID": "ujtu6PMqLs",
    "sipToURI": "sip:watson-conversation@192.168.0.3",
    "sipFromURI": "sip:alice@192.168.0.4",
    "assistantID": "bf5a9423-b0d6-437b-bcc4-01f2bc647ca4",
    "conversationLatency": 538,
    "intents": [
      {
        "intent": "One",
        "confidence": 1
      }
    ],
    "ttsConfig": {
      "credentials": {
        "password": "***",
        "username": "e48b9a83-98e2-4499-8db0-e09b2d8f2629"
      }
    },
    "disabled": false
  }
}

Event metadata

All reporting events begin with the following metadata:

Table 1. Keys for event metadata

Key | Description
--- | ---
time | The time of the event. The default format is epoch time, in seconds with a fractional millisecond part (for example, 1558628246.008).
host | Host name of the Voice Gateway instance that generated the event.
source | Represents the Voice Gateway tenant that generated the event. Typically, this value is set to the phone number that was called.
sourcetype | The source type. By default, defined as e164 for a telephone number, conversationID for a Watson Assistant workspace, sipURI for a SIP URI, or sms for SMS messages. Use the REPORTING_TRANSCRIPTION_EVENT_SOURCE_TYPE environment variable for a single-tenant deployment or transcriptionEventSourceType for a multi-tenant deployment to configure the source type. For more information, see Properties for the reporting object.
index | The index of the event as defined on the reporting event index configuration environment variables.
event | A CDR event, a Watson Assistant turn event, or a transcription event.
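
The metadata keys above are enough to timestamp and label an event before looking at its payload. The following Python sketch converts the time key to a timezone-aware value and builds a short origin label. The function names event_timestamp and describe_origin are hypothetical, and the sketch assumes that time uses the default epoch format.

```python
from datetime import datetime, timezone


def event_timestamp(record: dict) -> datetime:
    """Convert the epoch-seconds "time" key to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(float(record["time"]), tz=timezone.utc)


def describe_origin(record: dict) -> str:
    """Combine sourcetype and source into one label for logs or dashboards."""
    return f'{record.get("sourcetype", "unknown")}:{record.get("source", "")}'


record = {
    "time": 1558628246.008,
    "sourcetype": "sipURI",
    "source": "sip:18883141589@114.589.797.1",
    "index": "transcription",
}
print(event_timestamp(record).isoformat())
print(describe_origin(record))
```

Using an explicit UTC timezone avoids depending on the local time zone of the server that stores the events.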

Transcription event object

The JSON object for each transcription event contains the following keys:

Table 2. Keys for defining transcriptions

Key | Description
--- | ---
transcription | The text from the transcribed utterance.
sipCallID | The SIP Call ID, pulled from the SIP INVITE request that is related to the call.
sipRecCallID | The SIPREC Call ID, pulled from the SIPREC metadata. This key has a value only when the session is a SIPREC session.
globalSessionID | The value of this key depends on how Voice Gateway is configured. By default, the value is identical to the sipCallID value. If the CUSTOM_SIP_SESSION_HEADER environment variable or customSIPSessionHeader JSON property is configured, it maps to that ID. If a session is a SIPREC session, the field is identical to the gcid field in the SIPREC metadata. If the session is a SIPREC session and the CUSTOM_SIPREC_SESSION_FIELD environment variable or customSIPRECSessionField JSON property is configured, it maps to that field.
conversationID | The Watson Assistant ID. This key has a value only if the utterance is sent to Watson Assistant.
customSIPInviteHeader | The value of the custom SIP INVITE header. For this value to be set, the header field must be defined on the CUSTOM_SIP_INVITE_HEADER environment variable, and the INVITE request must contain the specified header. Version 1.0.0.5 and later.
customSIPInviteHeaders | The values of the custom SIP INVITE headers. For these values to be set, the header fields must be defined on the CUSTOM_SIP_INVITE_HEADERS environment variable, and the INVITE request must contain the specified headers. Version 1.0.1 and later.
destination | The destination that the transcription is sent to.
destinationType | The destination type, defined as e164 for a telephone number, conversationID for a Watson Assistant workspace, or sipURI for a SIP URI.
confidence | The confidence score for the utterance from the Speech to Text service. Version 1.0.0.3 and later.
sipFromURI | SIP URI from the initial SIP INVITE From field. In a SIPREC session, a caller identifier is extracted from the SIPREC metadata. Version 1.0.0.6a and later.
sipToURI | SIP URI from the initial SIP INVITE To field. In a SIPREC session, a caller identifier is extracted from the SIPREC metadata. Version 1.0.0.6a and later.
speechRecognitionLatency | The amount of time in milliseconds between when silence is detected in the caller speech and when a final result from Speech to Text is received. For this value to be set, STT_LATENCY_TRACKING must be set to true. If the Media Relay fails to detect latency, this value is omitted from the transcription event. Version 1.0.0.8 and later.
disabled | Indicates whether the text from the transcription event is excluded from the reporting event. When set to true, text is excluded from the reporting event. Set to false by default. Version 1.0.0.8 and later.
audioURL | The audio URL that was played to the user. Version 1.0.2 and later.
mediaURLs | A list of the media URLs that were sent or received over the SMS channel. Version 1.0.2 and later.
sttConfig | The Speech to Text service configuration that was used for this transaction. Version 1.0.2 and later.
bargeinOccurred | Indicates whether barge-in occurred during the playback transaction. Version 1.0.2 and later.
dtmf | Indicates whether the input from the user is DTMF. Version 1.0.2 and later.
sttResponse | The final response from the Speech to Text service in JSON format, including the transcript and confidence score for the top hypothesis and any alternatives. Version 1.0.2 and later.
workspaceID | Watson Assistant workspace ID. Version 1.0.2 and later.
assistantID | Watson Assistant ID. This key has a value only when using the Watson Assistant v2 API. Version 1.0.2 and later.
conversationLatency | The amount of time in milliseconds between when a turn was sent to the conversation service and when a response was received from the conversation service. Version 1.0.2 and later.
intents | An array of intents that were recognized in the user input, sorted in descending order of confidence. Version 1.0.2 and later.
ttsConfig | The Text to Speech service configuration that was used for the playback transaction. Version 1.0.2 and later.
ttsLatency | The amount of time in milliseconds between when the text was sent to the Text to Speech service and when the first audio chunk was received. Version 1.0.2 and later.
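
Because many of these keys are optional or version dependent, a consumer should read them defensively. The following Python sketch reduces a transcription record to a few fields and drops the text when the disabled flag is set. The Utterance class and the summarize function are hypothetical names, and accepting both a boolean and a string for disabled is a precaution chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Utterance:
    session_id: str
    text: Optional[str]
    confidence: Optional[float]
    latency_ms: Optional[int]


def summarize(record: dict) -> Utterance:
    """Extract a compact summary from one transcription record."""
    event = record["event"]
    # Assumption: tolerate both boolean and string encodings of the flag.
    disabled = str(event.get("disabled", False)).lower() == "true"
    return Utterance(
        session_id=event["globalSessionID"],
        # Never keep text from an event whose reporting text is disabled.
        text=None if disabled else event.get("transcription"),
        confidence=event.get("confidence"),
        latency_ms=event.get("speechRecognitionLatency"),
    )
```

Keys such as speechRecognitionLatency come back as None when they are omitted, which matches the table's note that the value is left out when latency cannot be detected.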

SMS Gateway transcription events generated by Voice Gateway

When an SMS message arrives and transcription events are enabled on Voice Gateway, Voice Gateway generates a transcription event indicating the source where the transcription came from.

In the following example, the sourcetype value, sms, indicates that the transcription arrived through the SMS channel. The source value provides the telephone number that sent the SMS message. The destinationType value indicates that the message is sent to Watson Assistant.

{
  "time": 1531231769,
  "host": "9.16.25.144",
  "source": "+14445556666",
  "sourcetype": "sms",
  "index": "transcription",
  "event": {
    "transcription": "Hello.",
    "sipCallID": "AbcdEF2G~",
    "globalSessionID": "AbcdEF2G~",
    "sipToURI": "sip:watson-conversation@9.95.87.495",
    "sipFromURI": "sip:14448675309@9.40.540.440",
    "destination": "6abcde93-5393-53f9-g430-907hi2j7k936",
    "destinationType": "conversationID"
  }
}

When Voice Gateway sends a message to SMS Gateway, Voice Gateway generates a transcription event indicating that an SMS message was sent.

In the following example, the destinationType value, sms, indicates that a transcription is sent to the caller in an SMS message. The destination value provides the telephone number that Voice Gateway sends the SMS message to.

{
  "time": 1531231769,
  "host": "9.16.25.144",
  "source": "6abcde93-5393-53f9-g430-907hi2j7k936",
  "sourcetype": "conversation",
  "index": "transcription",
  "event": {
    "transcription": "Hello.",
    "sipCallID": "AbcdEF2G~",
    "globalSessionID": "AbcdEF2G~",
    "sipToURI": "sip:watson-conversation@9.95.87.495",
    "sipFromURI": "sip:14445556666@9.40.540.440",
    "destination": "+14445556666",
    "destinationType": "sms"
  }
}
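
The two SMS examples differ only in where the sms value appears, so a consumer can tell inbound from outbound messages with a simple check. The following Python sketch illustrates that rule. The function name sms_direction and the labels it returns are hypothetical.

```python
def sms_direction(record: dict) -> str:
    """Classify a transcription record by its SMS direction.

    Returns "inbound" when the message came from the SMS channel,
    "outbound" when it was sent to the caller by SMS, and "not-sms" otherwise.
    """
    if record.get("sourcetype") == "sms":
        return "inbound"
    if record.get("event", {}).get("destinationType") == "sms":
        return "outbound"
    return "not-sms"


inbound = {"sourcetype": "sms", "event": {"destinationType": "conversationID"}}
outbound = {"sourcetype": "conversation", "event": {"destinationType": "sms"}}
print(sms_direction(inbound), sms_direction(outbound))
```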