PCAP format processing

Processing voice data from the network involves extracting speech from network packets and extracting call metadata by using IPC.

Networktap job

This job sniffs network packets from the defined network interface and transfers them to the downstream job. The Standalone job connects to the network interface card by using the PacketLiveSource operator from the IBM Streams Network Toolkit. This operator puts the network interface into promiscuous mode so that all network packets are gathered. The packets are then forwarded to the downstream PCAP Adaptor job by using the TCPSink operator from the IBM Streams Standard Toolkit.

Figure 1. Networktap Streams job
Diagram showing the Networktap Streams job

PCAP Adaptor job

The PCAP Adaptor job parses PCAP data from a network port. The raw packet data is also exported to the IPC job. Packets are filtered based on IP addresses, subnets, or login names. The filtered RTP packets are processed, and all of the audio packet data that is collected for a given call is exported to the RouteSpeech job. Certain call attributes, such as the callid, channel_id, and the source and destination ports, are exported to the CorrelateCallMetadata job.
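The filtering and per-call collection steps can be sketched as follows. This is an illustrative Python sketch, not the SPL implementation: the subnet values, the packet field names, and the use of (SSRC, destination port) as the call key are assumptions.

```python
import ipaddress

# Assumed monitored subnets; the real job is configured with IP addresses,
# subnets, or login names.
MONITORED_SUBNETS = [ipaddress.ip_network("10.20.0.0/16"),
                     ipaddress.ip_network("192.168.5.0/24")]

def is_monitored(src_ip: str, dst_ip: str) -> bool:
    """Return True if either endpoint of the packet is in a monitored subnet."""
    return any(ipaddress.ip_address(src_ip) in net or
               ipaddress.ip_address(dst_ip) in net
               for net in MONITORED_SUBNETS)

def group_rtp_by_call(packets):
    """Collect RTP payloads per call, keyed here by (SSRC, destination port)."""
    calls = {}
    for pkt in packets:
        if not is_monitored(pkt["src_ip"], pkt["dst_ip"]):
            continue  # drop packets outside the monitored address ranges
        key = (pkt["ssrc"], pkt["dst_port"])
        calls.setdefault(key, []).append(pkt["payload"])
    return calls
```

Grouping by a stable per-stream key lets all of the audio data for one call be exported to RouteSpeech as a unit.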

Figure 2. PCAP Adaptor job
Diagram showing the PCAP Adaptor job

IPC job

The IPC metadata extraction consists of two SPL jobs: IPC and CorrelateCallMetadata. The IPC job receives raw socket data from the PCAP Adaptor job. It identifies the SIP INVITE messages of new user logins to their turret devices. It then parses the XML data packets to fetch the device ID and session ID that correspond to the handsets and stores them in an internal monitored list. This is done to avoid monitoring audio data from speaker ports. After the SIP ACK messages are received, the job verifies that the device ID from the ACK is present in the monitored list. It then emits the device ID and the destination port.
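The INVITE/ACK tracking described above can be sketched as a small state machine. The names (device_id, session_id, dst_port) and the is_handset flag are illustrative stand-ins for values that the real job parses out of the SIP and XML packets.

```python
class HandsetMonitor:
    """Tracks handset devices seen in SIP INVITEs and emits on matching ACKs."""

    def __init__(self):
        self.monitored = {}  # device_id -> session_id

    def on_invite(self, device_id, session_id, is_handset):
        # Only handsets are added to the monitored list; speaker ports are
        # skipped so that their audio is never captured.
        if is_handset:
            self.monitored[device_id] = session_id

    def on_ack(self, device_id, dst_port):
        # Emit (device_id, dst_port) only if an INVITE registered this device.
        if device_id in self.monitored:
            return (device_id, dst_port)
        return None
```

The monitored list is the gate: an ACK for a device that never appeared in a handset INVITE produces no output tuple.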

Figure 3. IPC Streams job
Diagram showing the IPC Streams job

CorrelateCallMetadata job

The CorrelateCallMetadata job uses Bluewave’s (BW) LogonSession API to prepare a list of all of the users who are logged on to the voice network. From the LogonSession response XML, certain attributes about the users, such as their IP address, loginname, userid, zoneid, zonename, firstname, lastname, and emailid, are extracted and cached. Subsequently, for users who log in, their corresponding device IDs are sent by the IPC job. For each incoming device ID, the LogonSession details are fetched from the BW API and the user list in the cache is updated.
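Building the user cache from the response might look like the following sketch. The XML element and attribute names used here are assumptions for illustration; the actual BW LogonSession schema may differ.

```python
import xml.etree.ElementTree as ET

def build_user_cache(xml_text):
    """Extract the cached user attributes from a LogonSession response,
    keyed by device ID (assumed schema)."""
    cache = {}
    root = ET.fromstring(xml_text)
    for user in root.iter("user"):
        # Pull the attributes that the job caches for later correlation.
        info = {k: user.get(k) for k in
                ("ip", "loginname", "userid", "zoneid", "zonename",
                 "firstname", "lastname", "emailid")}
        cache[user.get("deviceid")] = info
    return cache
```

When the IPC job later emits a device ID, the cache entry for that device is refreshed from the BW API.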

Any access to BW needs an authentication token. After an authentication token is created, it is refreshed at regular intervals. Also at regular intervals, a call is made to get the communication history for the last 5 seconds. This history is compared with the call records that are extracted from the RTP packets, based on the loginname and the call start and end times. If the call timings of the communication history record and of the RTP packets are within a tolerable deviation, that communication history record is assigned as the metadata record for the call that is identified in the RTP packets. The identified metadata record is then exported to the RouteSpeech job.

Figure 4. CorrelateCallMetadata Streams job
Diagram showing CorrelateCallMetadata Streams job

RouteSpeech job

The RouteSpeech job receives audio packets and metadata as tuples. In an organization’s voice network, calls emanate from different departments, and each department can have vocabulary that is specific to its business. As a result, calls must be routed through specific Speech to Text (S2T) language models based on the source of the call. For example, calls from the Foreign Exchange department might be routed through an S2T language model that was developed specifically for Foreign Exchange, whereas calls from the Equity team are routed through an S2T language model that was developed for Equity. This routing improves speech recognition accuracy. Based on the department of the loginname that is associated with the call, the raw speech files are created in a directory that is assigned to a route. Metadata tuples are updated with the partyid and exported to the ProcessMetadata job.
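The department-to-route mapping amounts to a lookup table. The department names, directory paths, and fallback route below are hypothetical examples.

```python
import os

# Hypothetical routing table: department -> directory that is watched by a
# department-specific S2T language model.
ROUTES = {
    "ForeignExchange": "/audio/routes/fx",
    "Equity": "/audio/routes/equity",
}
DEFAULT_ROUTE = "/audio/routes/general"  # assumed fallback route

def speech_file_path(department, callid):
    """Place the raw speech file for a call in its route's directory."""
    base = ROUTES.get(department, DEFAULT_ROUTE)
    return os.path.join(base, f"{callid}.raw")
```

Writing the raw file into a route-specific directory is what steers the call toward the matching S2T model in the next job.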

Figure 5. RouteSpeech Streams job
Diagram showing RouteSpeech Streams job

PCAPSpeech job

This job contains the SpeechToText operators for processing audio data from the raw speech files that are created in the RouteSpeech job. After the S2T conversion is complete, the job checks whether metadata for the call is available. If metadata is available, it is correlated with the converted text from the call, and a CommData object is created and published to Kafka. Also, if an export URL is configured, the voice artifacts (the metadata, utterances, and the audio binary) are sent to the export service. The default export service persists the artifacts to the HDFS file system. If the metadata is not available, the audio binary and utterances are persisted to HDFS.
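The post-transcription decision can be sketched as below. CommData is represented as a plain dict, and the publish/persist callables stand in for the Kafka producer and the export or HDFS sink; all of these stand-ins are assumptions. The ProcessMetadata job applies the mirror-image check (metadata in hand, transcript possibly missing).

```python
def dispatch_call(transcript, audio, metadata, export_url, publish, persist):
    """Route a transcribed call: publish correlated CommData when metadata is
    available, otherwise persist the partial artifacts."""
    if metadata is not None:
        commdata = {"metadata": metadata, "utterances": transcript}
        publish(commdata)  # publish the CommData object to Kafka
        if export_url:
            # Send all voice artifacts to the configured export service.
            persist({"metadata": metadata, "utterances": transcript,
                     "audio": audio})
    else:
        # No metadata yet: persist the audio binary and utterances (HDFS).
        persist({"utterances": transcript, "audio": audio})
```

Persisting the unmatched half of the data is what lets a later job correlate it once the missing piece arrives.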

Figure 6. PCAPSpeech Streams job
Diagram showing the PCAPSpeech Streams job

ProcessMetadata job

The ProcessMetadata job consumes the metadata tuples that are sent by the RouteSpeech job. It checks whether a transcript for the call is available. If a transcript is available, it is correlated with the metadata, and a CommData object is created and published to Kafka. Also, if an export URL is configured, the voice artifacts (the metadata, utterances, and the audio binary) are sent to the export service. The default export service persists the artifacts to the HDFS file system. If a transcript for the call is not available, the metadata is persisted to HDFS.

Figure 7. ProcessMetadata Streams job
Diagram showing the ProcessMetadata Streams job