IBM Streams 4.2.1

Operator WatsonS2T

The WatsonS2T operator ingests audio data and outputs transcriptions of speech in the form of utterances. An utterance is a group of transcribed words meant to approximate a sentence. Audio data must be 16-bit little-endian, 8 kHz, mono. It can be provided either as a .wav file or as RAW uncompressed PCM audio. The following sample ffmpeg command converts a .wav file to the correct format:

$ ffmpeg -i wrongFormat.wav -ac 1 -ar 8000 correctFormat.wav
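
Before passing a file to the operator, it can help to confirm that the file already meets the format requirement. The following is a minimal sketch that uses Python's standard `wave` module. The function name `is_supported_wav` is illustrative and is not part of the toolkit. The `wave` module parses uncompressed PCM WAV files only, and PCM data in a WAV container is stored little-endian, so checking channel count, sample width, and frame rate is assumed to be sufficient here.

```python
import wave


def is_supported_wav(path):
    """Return True if the file is a 16-bit, 8 kHz, mono PCM WAV file."""
    try:
        with wave.open(path, "rb") as wav:
            return (
                wav.getnchannels() == 1         # mono
                and wav.getsampwidth() == 2     # 2 bytes per sample = 16-bit
                and wav.getframerate() == 8000  # 8 kHz sampling rate
            )
    except (wave.Error, EOFError):
        # Not a WAV file that the wave module can parse.
        return False
```

A file for which this check returns False is a candidate for the ffmpeg conversion shown above.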

Requirements:
  • Intel RHEL6 or RHEL7 hosts.
  • The following RPM packages must be installed on your system: atlas and atlas-devel.
  • The Watson model and model configuration files must be present on the host where the operator runs.
Warning:
  • Multiple copies of this operator cannot be fused into the same PE. To prevent fusion, use the partitionExlocation or partitionIsolation placement configuration.

See samples for examples.

Summary

Ports
This operator has 1 input port and 2 output ports.
Windowing
This operator does not accept any windowing configurations.
Parameters
This operator supports 3 parameters.

Required: watsonConfigFile, watsonModelFile

Optional: resetOnIdChange

Metrics
This operator does not report any metrics.

Properties

Implementation
C++
Threading
Always - Operator always provides a single threaded execution context.

Input Ports

Ports (0)
Attributes on this input port:
  • speech (required, rstring/blob) - For .wav file input, the value is the absolute path of a .wav file as an rstring. For RAW audio, the value is the audio data as a blob.
  • id (optional, rstring) - Identifier for the audio data. Multiple audio files can be associated with the same id. When the resetOnIdChange parameter is set to true, the Speech2Text buffer is reset when the id changes.

All extra attributes will be forwarded if matching output attributes are found.
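
RAW uncompressed PCM audio is generally understood to be the sample data without a WAV header. As an illustration only, the sketch below uses Python's standard `wave` module to read the bare PCM bytes out of a conforming .wav file. Both function names are hypothetical, and how those bytes are delivered to the operator as an SPL blob is outside the scope of the sketch.

```python
import wave


def wav_to_raw_pcm(path):
    """Return the headerless PCM sample bytes of a 16-bit, 8 kHz, mono WAV file."""
    with wave.open(path, "rb") as wav:
        fmt = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
        if fmt != (1, 2, 8000):
            raise ValueError("expected 16-bit, 8 kHz, mono audio")
        # readframes returns the raw sample bytes, without the WAV header.
        return wav.readframes(wav.getnframes())


def pcm_duration_seconds(pcm):
    """Duration of 16-bit mono 8 kHz PCM: 2 bytes per sample, 8000 samples per second."""
    return len(pcm) / (2 * 8000)
```

Computing the duration from the byte count is a quick sanity check that the payload has the expected sample width and rate.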

Output Ports

Assignments
This operator allows any SPL expression of the correct type to be assigned to output attributes. Attributes not assigned in the output clause will be automatically assigned from the attributes of the input ports that have the same name and type. If there is no such input attribute, an error is reported at compile-time.
Output Functions
Speech2TextFunctions
<any T> T AsIs(T)

The default function for output attributes. By default, this function assigns the output attribute to the value of the input attribute with the same name.

rstring getUtteranceText()

Returns the transcription of audio in the form of a single utterance.

float64 getUtteranceStartTime()

Returns the time of the first audio sample in the current utterance. Time restarts at 0 whenever the Speech2Text buffer is reset (that is, for every incoming tuple or, when resetOnIdChange is true, on change of the id attribute).

float64 getUtteranceEndTime()

Returns the time of the last audio sample in the current utterance. Time restarts at 0 whenever the Speech2Text buffer is reset (that is, for every incoming tuple or, when resetOnIdChange is true, on change of the id attribute).

int32 getUtteranceNumber()

Returns the number of the current utterance since the Speech2Text buffer was last reset. Numbering starts at 1.

DiagnosticFunctions
<any T> T AsIs(T)

The default function for output attributes. By default, this function assigns the output attribute to the value of the input attribute with the same name.

rstring getDiagnosticsMessage()

Returns the diagnostic messages, which include error messages and speech statistics.

Ports (0)

An output tuple is created for every utterance observed in the incoming audio data. An utterance is a group of transcribed words meant to approximate a sentence. There is therefore a one-to-many relationship between incoming and outgoing tuples (for example, a single .wav file may result in 30 output utterances). Four output functions are available, but output attributes can also be assigned with any SPL expression that evaluates to the proper type.

Ports (1)

Diagnostics and error port.

Parameters

Required: watsonConfigFile, watsonModelFile

Optional: resetOnIdChange

resetOnIdChange

If set to true, the Speech2Text buffer is cleared when the incoming id attribute changes. If set to false (the default), the Speech2Text buffer is reset on every incoming tuple. Set this parameter to true to maintain the Speech2Text buffer across multiple incoming .wav files or blobs of audio that share an id. In general, make sure the Speech2Text buffer is reset for every new conversation.
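
The reset rule described above can be modeled in a few lines of Python. This is a toy model for reasoning about the parameter, not the operator's implementation. The function name `plan_resets` is hypothetical, and the model assumes that the first incoming tuple always starts from an empty buffer.

```python
def plan_resets(ids, reset_on_id_change=False):
    """Return, for each incoming tuple, whether the buffer is reset before it."""
    resets = []
    previous = None
    for index, current in enumerate(ids):
        if index == 0:
            reset = True                  # assumption: first tuple starts from an empty buffer
        elif reset_on_id_change:
            reset = current != previous   # reset only when the id changes
        else:
            reset = True                  # default: reset on every incoming tuple
        resets.append(reset)
        previous = current
    return resets
```

For three tuples with ids `["call1", "call1", "call2"]`, the model returns `[True, True, True]` with the default setting and `[True, False, True]` with `reset_on_id_change=True`, so the two tuples of `call1` share one buffer only in the second case.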

watsonConfigFile

Absolute file location of the Watson language configuration file.

watsonModelFile

Absolute file location of the Watson language/acoustic model.

Libraries

Watson Speech2Text low-level API.
  • Library Name: rapid
  • Library Path: ../../impl/lib/
  • Include Path: ../../impl/include

Watson Speech2Text high-level API.
  • Library Name: watsons2t
  • Library Path: ../../impl/lib/
  • Include Path: ../../impl/include