IBM Streams 4.2

Architecture

This chapter gives a high-level overview of the architecture of the application framework and helps you understand its components and their interactions.

The architectural goal of the Telecommunications Event Data Analytics application framework is to provide a customizable and scalable file processing framework with a simple but robust data consistency and recovery mechanism.

The following diagram shows the two main applications and the internal and external interfaces of the solution.

Figure: Architecture high-level overview

Applications

The ITE application is responsible for processing input files, including parsing, validating, transforming, enriching, and correlating records. The Lookup Manager application is responsible for loading and updating the enrichment data in memory and distributing it across hosts. A solution can contain one or more ITE applications and, optionally, one Lookup Manager application if enrichment from external data sources such as Customer Relationship Management (CRM) systems is required.

External interfaces

The solution has three external interfaces: input data, output files, and data sources.

Input data are the files to be processed by the ITE applications. They contain the input records and can be in any format. The toolkit provides parsers for the most common Call Detail Record (CDR) formats: ASN.1, CSV, and fixed-length binary. To process input files in these formats, you only need to configure the existing parsers for the concrete data layout. You can also create your own parser and plug it into the ITE application.

The results of the processing are stored in output files. Usually, one output file is created per input file. Writing the results to files lets other components pick them up for further processing. For example, you can use the DBLoader toolkit from GitHub to store the results in a database, or the HDFS toolkit to store them in BigInsights.

The Data Sources are files or databases that hold the enrichment data. The internal lookup repositories are initialized and updated from these external sources.
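
The input interface described above combines two ideas: configuring an existing parser for a concrete record layout, and plugging in a custom parser. The following Python sketch illustrates both in a generic way. It does not show the toolkit's actual parser API; the registry, the function names, the field names, and the key-value format are all hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical registry: format name -> parser function.
PARSERS = {}


def register_parser(fmt, parser):
    """Plug a parser into the registry under a format name."""
    PARSERS[fmt] = parser


def make_csv_parser(field_names, delimiter=","):
    """Configure a generic CSV parser for one concrete record layout."""
    def parse(text):
        rows = csv.reader(io.StringIO(text), delimiter=delimiter)
        # Skip empty lines and map each row onto the configured field names.
        return [dict(zip(field_names, row)) for row in rows if row]
    return parse


def parse_key_value(text):
    """Custom parser for a made-up 'key=value;key=value' line format."""
    records = []
    for line in text.splitlines():
        if line.strip():
            records.append(dict(pair.split("=", 1) for pair in line.split(";")))
    return records


# An existing parser configured for a layout, and a custom parser plugged in.
register_parser("csv", make_csv_parser(["caller", "callee", "duration"]))
register_parser("kv", parse_key_value)


def parse_file_content(fmt, text):
    """Dispatch the file content to the parser registered for its format."""
    return PARSERS[fmt](text)
```

In this sketch, supporting a new data layout only requires a new call to `make_csv_parser`, while a new format requires one new function and one call to `register_parser`.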

Internal interfaces

The Control Path is the interface between the Lookup Manager application and the ITE applications. The Lookup Manager application uses this interface to stop file processing in the ITE applications while the lookup repositories are being updated, and to resume file processing once the enrichment data is loaded. This ensures that the lookup data stays consistent while each input file is processed. The interface is based on files that reside in a directory accessible to all ITE applications and the Lookup Manager application.
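
A file-based stop and resume signal of this kind can be sketched as follows. This is a minimal Python illustration of the principle, not the toolkit's actual control file protocol: the marker file name, the class, and the function names are assumptions. The key point is that the processing side checks the signal only between input files, so each file sees one consistent state of the lookup data.

```python
import tempfile
from pathlib import Path

# Hypothetical marker file name; the real control files may look different.
PAUSE_MARKER = "lookup_update_in_progress"


class ControlPath:
    """Pause/resume signal based on a file in a shared directory."""

    def __init__(self, control_dir):
        self.marker = Path(control_dir) / PAUSE_MARKER

    def request_pause(self):
        # Lookup Manager side: a repository update is about to start.
        self.marker.touch()

    def request_resume(self):
        # Lookup Manager side: the enrichment data is loaded again.
        self.marker.unlink(missing_ok=True)

    def may_start_next_file(self):
        # Processing side: checked between input files only.
        return not self.marker.exists()


def process_files(control, files, handler):
    """Process files in order and stop at the first file boundary
    where a pause has been requested. Returns the processed files."""
    done = []
    for name in files:
        if not control.may_start_next_file():
            break
        handler(name)
        done.append(name)
    return done
```

A real deployment would wait and retry instead of returning when a pause is requested; the early return only keeps the sketch short.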

The enrichment data is stored in Shared Memory so it can be accessed by all processing elements on the same host. For deployment scenarios with multiple hosts, the Lookup Manager application takes care of creating and updating the shared memory on all hosts.
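
The idea of one writer publishing enrichment data that several readers on the same host can attach to is illustrated below with Python's standard `multiprocessing.shared_memory` module. This is only a conceptual sketch: the toolkit's own shared memory mechanism is not described here, and the segment layout (a 4-byte length prefix followed by JSON), the function names, and the sample data are assumptions.

```python
import json
from multiprocessing import shared_memory


def publish_lookup(name, table):
    """Writer side: store a lookup table in a named shared memory segment.

    The caller owns the returned segment and must close() and unlink() it.
    """
    payload = json.dumps(table).encode("utf-8")
    # The segment may be rounded up to a page size, so the payload length
    # is stored in a 4-byte prefix.
    shm = shared_memory.SharedMemory(name=name, create=True, size=4 + len(payload))
    shm.buf[:4] = len(payload).to_bytes(4, "little")
    shm.buf[4:4 + len(payload)] = payload
    return shm


def read_lookup(name):
    """Reader side: attach to the segment by name and decode the table."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        length = int.from_bytes(bytes(shm.buf[:4]), "little")
        return json.loads(bytes(shm.buf[4:4 + length]).decode("utf-8"))
    finally:
        shm.close()
```

Any process on the same host that knows the segment name can call `read_lookup`. Processes on other hosts cannot, which is why a multi-host deployment needs a component that creates and updates the segment on every host.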

Detailed information about the applications and some of their interactions is provided in the following chapters.

Lookup Manager application
ITE application