IBM Voice Surveillance Analytics

The IBM Voice Surveillance Analytics solution helps identify risk indicators and generate alerts associated with voice communication. Voice data files can be ingested directly into the voice surveillance system in WAV format. The system can also read voice data from the voice network in PCAP format. The captured voice data is processed with the IBM Watson Speech to Text toolkit to generate transcripts in plain text format. Each transcript is then evaluated against a set of features to calculate risk indicators. The inference engine analyzes these risk indicators to detect alarming conditions and generates alerts when needed.
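
As a rough illustration of the last two steps, the following Python sketch scores a transcript against a small phrase lexicon and raises an alert when the combined score reaches a threshold. The phrase list, weights, threshold, and all class and function names are hypothetical and are not part of the product. The actual features and inference rules are defined by the solution.

```python
from dataclasses import dataclass

# Hypothetical feature lexicon (phrase -> weight); not taken from the product.
RISK_PHRASES = {
    "delete this": 0.6,
    "off the record": 0.5,
    "guarantee": 0.3,
}

# Assumed cut-off for the illustrative inference step.
ALERT_THRESHOLD = 0.8


@dataclass
class RiskIndicator:
    name: str
    score: float


def extract_risk_indicators(transcript: str) -> list[RiskIndicator]:
    """Evaluate a plain-text transcript against each phrase feature."""
    text = transcript.lower()
    return [
        RiskIndicator(name=phrase, score=weight)
        for phrase, weight in RISK_PHRASES.items()
        if phrase in text
    ]


def infer_alert(indicators: list[RiskIndicator]) -> bool:
    """Toy inference: alert when the combined score reaches the threshold."""
    return sum(i.score for i in indicators) >= ALERT_THRESHOLD
```

Calling `extract_risk_indicators` on a transcript and passing the result to `infer_alert` mirrors the order described above: indicators are calculated first, and the alert decision is made separately on the indicators.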

The following diagram shows the data flows for IBM Voice Surveillance Analytics.

Diagram showing the data flow for voice surveillance

In IBM Voice Surveillance Analytics, voice data can be fed in either through network packets or through the voice data ingestion services.

Flow 1: PCAP format processing obtains the voice data directly from voice network packets and fetches the metadata from Bluewave APIs.

Flow 2: The Voice Ingestion Service and WAVAdaptor processing work in conjunction to process the voice data. If the audio file or metadata format differs from the supported format, an adaptor must be built to invoke the voice data ingestion service.
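
A custom adaptor of this kind mostly maps source-specific metadata onto the shape that the ingestion service accepts. The Python sketch below shows the idea only: the input keys (`call_ref`, `caller`, `callee`), the output field names, and the function name are all assumptions for illustration and do not reflect the actual ingestion service contract. The audio properties are read from the WAV header with the standard `wave` module.

```python
import io
import wave


def build_ingestion_request(wav_bytes: bytes, source_metadata: dict) -> dict:
    """Map a source-specific metadata record onto a hypothetical common shape."""
    # Read the audio properties from the WAV header.
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        frame_rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / frame_rate
    return {
        # Field names on both sides are assumptions for illustration.
        "callId": source_metadata["call_ref"],
        "participants": [source_metadata["caller"], source_metadata["callee"]],
        "audio": {
            "sampleRateHz": frame_rate,
            "channels": channels,
            "durationSeconds": duration,
        },
    }
```

The resulting dictionary would then be sent to the voice data ingestion service in whatever request format that service defines.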

  • The Speech to Text operators convert the voice data into transcripts with speaker diarisation.
  • The voice artifacts can optionally be exported through the voice data service export interface, which stores the metadata, voice transcripts, and audio files in HDFS for both flows.
  • After the Speech to Text transcription is complete, a communication object is published to the downstream analysis pipeline.
  • The generated voice transcript can then be associated with notes, annotations, tags, and evidence from the interface.
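
To make the notion of a communication object more concrete, the following Python sketch packages a diarised transcript as a JSON message. The field names (`id`, `channel`, `participants`, `transcript`) and the `Utterance` type are hypothetical. The real communication object schema is defined by the product and may differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Utterance:
    speaker: str  # diarisation label, for example "S1"
    start: float  # seconds from the start of the call
    text: str


def to_communication_object(call_id: str, utterances: list[Utterance]) -> str:
    """Serialise a diarised transcript into a hypothetical JSON message."""
    # Keep utterances in time order so downstream analysis sees the dialogue
    # as it was spoken.
    ordered = sorted(utterances, key=lambda u: u.start)
    return json.dumps({
        "id": call_id,
        "channel": "voice",
        "participants": sorted({u.speaker for u in ordered}),
        "transcript": [asdict(u) for u in ordered],
    })
```

A downstream consumer can parse the message with `json.loads` and read the speaker-labelled utterances from the `transcript` list.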