Using a device controller
Typically, you use the local speaker and microphone of the audio device to send and receive audio. However, you might have a smart speaker in your environment that you want to use for external processing, or you might have additional device controls, for example, for volume or display. Watson Assistant Solutions gives you the option to use your own smart speaker and microphone with the audio client. The smart speaker acts as a device controller for the audio client.
Socket interfaces
The audio client provides socket interfaces that let a device controller control the audio client and exchange audio with it. There are two socket interfaces:
- Command socket interface: The device controller sends text-based commands on the command socket interface to the audio client. The audio client sends status messages and commands to the device controller. The port is set using the cmdSocketPort property in the configure.properties file.
- Audio socket interface: Audio is sent from the device controller to the client and from the client to the controller on an audio socket interface. The port is set using the audioSocketPort property in the configure.properties file.
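As a minimal sketch of how a device controller might look up the two ports, the following Python snippet parses the text of configure.properties. It assumes that the file uses the usual key=value properties syntax. The function name read_socket_ports and the port numbers in the sample are hypothetical placeholders, not values from the product.

```python
def read_socket_ports(properties_text):
    """Return (cmdSocketPort, audioSocketPort) parsed from properties text.

    Assumes simple key=value lines, as in a Java-style properties file.
    """
    values = {}
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines ('#' or '!').
        if not line or line[0] in "#!":
            continue
        key, separator, value = line.partition("=")
        if separator:
            values[key.strip()] = value.strip()
    return int(values["cmdSocketPort"]), int(values["audioSocketPort"])


# Hypothetical file content; the port numbers are placeholders.
sample = """
# socket interfaces for the device controller
cmdSocketPort=10010
audioSocketPort=10011
"""
command_port, audio_port = read_socket_ports(sample)
print(command_port, audio_port)
```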
For example, the device controller wants to send audio to the audio client on the audio socket interface, but the client is not ready to process the audio. The device controller sends an RAS command (a trigger for the client to read from the audio socket) on the command socket interface. The audio client responds with a micWakeUpNotAllowed status message. The device controller waits for a micWakeUpAllowed status message and then sends data to the audio client on the audio socket interface.
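The waiting step in this example can be sketched in Python as follows. The function wait_for_wake_allowed is a hypothetical helper. It assumes that the status messages have already been read from the command socket and decoded into one string per message.

```python
def wait_for_wake_allowed(status_messages):
    """Consume status messages until the client reports micWakeUpAllowed.

    status_messages: any iterable of decoded messages from the command socket.
    Returns the messages that were seen before permission was granted.
    Raises ConnectionError if the messages run out first.
    """
    seen = []
    for message in status_messages:
        message = message.strip()
        if message == "micWakeUpAllowed":
            return seen
        seen.append(message)
    raise ConnectionError("command socket closed before micWakeUpAllowed")


# The client first refuses the trigger, then allows it.
skipped = wait_for_wake_allowed(["micWakeUpNotAllowed", "micWakeUpAllowed"])
print(skipped)
```

After the function returns, the controller can start to write audio to the audio socket interface.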
Command socket interface
You can test the command socket interface using telnet. Complete these steps:
- Establish a telnet connection to the audio client on the command port.
- Wait for the client to respond with OK.
- Send a command and terminate the command with a carriage return.
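The same three steps can also be scripted. The following Python sketch is an illustration, not the product's client library. The function names exchange and open_and_send are hypothetical, and the code assumes that the client ends each response with a line feed, which the text above does not state.

```python
import socket


def exchange(sock, command):
    """Wait for the OK greeting, send one command, and return the reply."""
    # Assumption: each response from the client ends with a line feed.
    with sock.makefile("r", encoding="ascii", newline=None) as reader:
        greeting = reader.readline().strip()
        if greeting != "OK":
            raise RuntimeError(f"expected OK greeting, got {greeting!r}")
        # Commands are terminated with a carriage return.
        sock.sendall(command.encode("ascii") + b"\r")
        return reader.readline().strip()


def open_and_send(host, port, command, timeout=5.0):
    """Connect to the command port and run one command."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return exchange(sock, command)
```

For example, open_and_send("localhost", 10010, "OS") would ask the client to send output to the speaker, where the host and port are placeholders for your own configuration.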
Table 1 displays the commands from the device controller to the client.
Command | Description |
---|---|
OS | Send output to the speaker. |
OAS | Send output to the audio socket. |
RM | Read the microphone (trigger). |
RAS | Read the audio socket (trigger). |
EXIT | Disconnect. |
Table 2 displays the responses from the audio client to the device controller.
Response | Description |
---|---|
OK | Command was received and is acknowledged. |
? | An unknown command was received. |
DONE | The client was told to disconnect. |
micWakeUpNotAllowed | The client will not respond to the wake up command trigger. |
micWakeUpAllowed | The client will respond to the wake up command trigger. |
micOn | The client is expecting audio. |
micOff | The client is not expecting audio. |
Table 3 displays the status messages from the audio client to the device controller to show the status of the connection to the audio gateway.
Status message | Description |
---|---|
serverConnected | The client is connected to the server but is not yet ready to start. |
serverConnecting | The client is attempting to connect to the server. |
serverConnectionReady | The client is connected and is ready to start. |
serverNotConnected | The client is not connected to the server. |
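A device controller might track these status messages to decide when it can start an interaction. The following Python sketch shows one way to do that. The helper names are hypothetical, and the code assumes that the most recent status message reflects the current connection state.

```python
SERVER_STATUSES = (
    "serverNotConnected",
    "serverConnecting",
    "serverConnected",
    "serverConnectionReady",
)


def latest_server_status(messages):
    """Return the most recent server status in messages, or None."""
    latest = None
    for message in messages:
        if message in SERVER_STATUSES:
            latest = message
    return latest


def ready_to_start(messages):
    """True only if the last server status is serverConnectionReady."""
    return latest_server_status(messages) == "serverConnectionReady"


# Other responses, such as OK, are ignored by the tracker.
history = ["serverConnecting", "OK", "serverConnected", "serverConnectionReady"]
print(ready_to_start(history))
```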
Audio socket interface
The audio socket interface sends and receives audio data streams in binary format. Currently, the format is fixed as follows:
- Input: audio/l16 (PCM, SampleRate=16,000, Channels=1, Bits=16)
- Output: audio/l16 (PCM, SampleRate=16,000, Channels=1, Bits=16)
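With this format, one second of audio takes 16,000 samples × 1 channel × 2 bytes = 32,000 bytes. The following Python sketch works through that arithmetic and builds a short test tone in the same layout. The byte order is not stated above, so the little-endian packing here is an assumption to verify against your client. The function names are hypothetical.

```python
import math
import struct

SAMPLE_RATE = 16_000   # samples per second
CHANNELS = 1           # mono
BYTES_PER_SAMPLE = 2   # 16-bit samples


def bytes_for(seconds):
    """Number of bytes needed for the given duration of audio."""
    return round(seconds * SAMPLE_RATE) * CHANNELS * BYTES_PER_SAMPLE


def sine_pcm(frequency_hz, seconds, amplitude=0.5):
    """Return a sine tone as signed 16-bit mono PCM bytes.

    Assumption: little-endian byte order ('<' in the struct format).
    """
    count = round(seconds * SAMPLE_RATE)
    samples = [
        int(amplitude * 32767 * math.sin(2 * math.pi * frequency_hz * n / SAMPLE_RATE))
        for n in range(count)
    ]
    return struct.pack(f"<{count}h", *samples)


tone = sine_pcm(440, 0.1)
print(bytes_for(1.0), len(tone))
```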
If the controller sends an OAS command to divert audio output to the audio socket, the controller must be ready to receive the audio data and process it (that is, play it). The client sends no command to indicate that audio data is about to be sent.
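Because nothing announces the start or end of the output audio, a controller needs its own rule for when the output is complete. The following Python sketch treats a quiet period on the audio socket as the end of the output and wraps the received data in a WAV container for playback. The idle timeout, the function names, and the little-endian sample order that the WAV container implies are all assumptions.

```python
import io
import socket
import wave


def receive_output_audio(audio_sock, idle_timeout=1.0):
    """Read raw PCM from the audio socket until it goes quiet or closes."""
    audio_sock.settimeout(idle_timeout)
    pcm = bytearray()
    while True:
        try:
            chunk = audio_sock.recv(4096)
        except socket.timeout:
            break  # no audio for idle_timeout seconds; treat as end of output
        if not chunk:
            break  # the peer closed the connection
        pcm.extend(chunk)
    return bytes(pcm)


def wrap_as_wav(pcm):
    """Wrap 16 kHz, mono, 16-bit PCM bytes in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(16000)
        wav.writeframes(pcm)
    return buffer.getvalue()
```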
When the controller sends an RAS command to trigger the reading of audio data from the audio socket, the client responds with a micOn response and starts to read data from the audio socket. The audio data is sent to the Watson server for transcription. Once a transcription is returned with an acceptable confidence level, the client sends a micClose response. Any further data that is received on the audio socket is discarded.
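The streaming part of this exchange can be sketched in Python as follows, starting from the point where the controller has already received micOn. The function stream_audio and its callback parameters are hypothetical. The text above names micClose as the closing response while Table 2 lists micOff, so this sketch stops on either one.

```python
# The text names micClose; Table 2 lists micOff. Accept both as a precaution.
STOP_RESPONSES = {"micClose", "micOff"}


def stream_audio(chunks, send_chunk, next_response):
    """Send audio chunks until the client reports that it stopped listening.

    chunks: iterable of PCM byte strings to send.
    send_chunk: callable that writes bytes to the audio socket.
    next_response: callable that returns the next pending response from the
        command socket, or None if nothing is pending.
    Returns the number of chunks that were sent.
    """
    sent = 0
    for chunk in chunks:
        if next_response() in STOP_RESPONSES:
            break  # the client discards any further audio
        send_chunk(chunk)
        sent += 1
    return sent


# Simulated run: the client closes the microphone before the third chunk.
pending = iter([None, None, "micClose"])
written = []
count = stream_audio([b"a", b"b", b"c", b"d"], written.append, lambda: next(pending, None))
print(count)
```

Checking for a response before each chunk keeps the controller from sending audio that the client would discard.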