Collecting entity and transaction statistics
The MDM statistics collection feature enables you to collect statistics about entities and transactions in the InfoSphere® MDM system.
The MDM statistics collection feature uses Apache Kafka Streams and Kafka Connect applications to compute and process statistics data. Kafka server capabilities allow InfoSphere MDM to store statistics data with automatic persistence management and to process the stream online in a time-based window. This matters because the amount of collected statistics data can be extremely large in a production system, and the volume keeps growing with every transaction that is run. Storing all of it in the MDM database alone would quickly overwhelm the database and degrade performance. To avoid affecting InfoSphere MDM system performance, the statistics are collected in real time, aggregated hourly, and only the aggregated results are stored in the statistics tables.
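As a rough illustration of why hourly aggregation keeps storage bounded, the following Python sketch buckets raw events into fixed one-hour windows and counts them, so any number of events for the same key in the same hour collapses into a single row. It is a plain in-memory stand-in, not the Kafka Streams API, and the event shape (a timestamp plus a key such as a transaction name) is an assumption made for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def window_start(ts: datetime) -> datetime:
    """Return the start of the fixed one-hour window that contains ts."""
    return ts.replace(minute=0, second=0, microsecond=0)

def aggregate_hourly(events):
    """Collapse raw (timestamp, key) events into one count per (window start, key)."""
    counts = Counter()
    for ts, key in events:
        counts[(window_start(ts), key)] += 1
    return counts

# Hypothetical workload: one "addPerson" event per second for two hours.
base = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
events = [(base + timedelta(seconds=i), "addPerson") for i in range(7200)]
rows = aggregate_hourly(events)

print(f"{len(events)} raw events -> {len(rows)} aggregated rows")
```

The raw volume grows with the transaction rate, while the number of aggregated rows grows only with the number of distinct keys per hour.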
Data flow and architecture
In simple terms, the MDM statistics collection feature works as follows:
- The statistics collection facilities collect entity and transaction statistics data in real time when users run transactions.
- The collected data is published into Kafka input topics at the end of each transaction.
- The statistics streams application, a long-running Kafka client application built on Kafka Streams, consumes the data and aggregates it over fixed hourly windows.
- The computed results are recorded in Kafka output topics.
- The statistics connector, a long-running Kafka client application built on Kafka Connect, consumes the results and stores them in two statistics tables: EntityStatistics and TransactionStatistics.
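The data flow above can be simulated end to end with in-memory stand-ins. In this Python sketch, plain lists play the role of the Kafka input and output topics, `statistics_streams` plays the statistics streams application (hourly counts per key), and `statistics_connector` routes each result into the EntityStatistics or TransactionStatistics table. The function names, the record fields (`kind`, `name`, `ts`), and the routing rule are all hypothetical; the real message format and table schemas are not described in this section.

```python
from collections import Counter
from datetime import datetime

# Hypothetical messages on the input topic, published at the end of each transaction.
input_topic = [
    {"kind": "transaction", "name": "addPerson", "ts": datetime(2024, 1, 1, 9, 5)},
    {"kind": "entity", "name": "Person", "ts": datetime(2024, 1, 1, 9, 5)},
    {"kind": "transaction", "name": "addPerson", "ts": datetime(2024, 1, 1, 9, 40)},
    {"kind": "entity", "name": "Person", "ts": datetime(2024, 1, 1, 9, 40)},
    {"kind": "transaction", "name": "addPerson", "ts": datetime(2024, 1, 1, 10, 2)},
]

def hour_of(ts: datetime) -> datetime:
    """Start of the fixed hourly window that contains ts."""
    return ts.replace(minute=0, second=0, microsecond=0)

def statistics_streams(records):
    """Stand-in for the streams application: count records per (kind, name, hour)."""
    counts = Counter((r["kind"], r["name"], hour_of(r["ts"])) for r in records)
    # One result message per window, in a stable order, as if written to an output topic.
    return [
        {"kind": kind, "name": name, "hour": hour, "count": n}
        for (kind, name, hour), n in sorted(counts.items())
    ]

def statistics_connector(results, tables):
    """Stand-in for the connector: store each result in one of the two tables."""
    for row in results:
        table = "EntityStatistics" if row["kind"] == "entity" else "TransactionStatistics"
        tables[table].append(row)

output_topic = statistics_streams(input_topic)
tables = {"EntityStatistics": [], "TransactionStatistics": []}
statistics_connector(output_topic, tables)

for table_name, stored in tables.items():
    print(table_name, [(r["name"], r["hour"].hour, r["count"]) for r in stored])
```

In the real feature, windowing and the database writes are handled by Kafka Streams and Kafka Connect; the sketch only mirrors the order of the steps and the split into two tables.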
The MDM statistics collection feature uses Kafka producer components together with its own statistics components on the InfoSphere MDM operational server to collect data. It also includes a statistics streams application and a statistics connector application, both of which run in the Kafka environment.
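On the operational server side, the key behavior is that statistics are gathered while a transaction runs but published only when it ends. The Python sketch below models that with a context manager; `TransactionStatisticsCollector`, its methods, and the injected `publish` callable are hypothetical names. In a real deployment, `publish` would hand the batch to a Kafka producer client, which this sketch deliberately does not use.

```python
from datetime import datetime, timezone
from typing import Callable

class TransactionStatisticsCollector:
    """Hypothetical per-transaction collector (not the MDM API).

    Records are buffered while the transaction runs and passed to `publish`
    exactly once, when the transaction ends.
    """

    def __init__(self, transaction_name: str, publish: Callable[[list], None]):
        self.transaction_name = transaction_name
        self.publish = publish
        self.records = []

    def __enter__(self):
        self.started = datetime.now(timezone.utc)
        return self

    def record_entity(self, entity_type: str) -> None:
        # Buffered only; nothing leaves the server until the transaction ends.
        self.records.append({"kind": "entity", "name": entity_type, "ts": self.started})

    def __exit__(self, exc_type, exc, tb):
        self.records.append(
            {"kind": "transaction", "name": self.transaction_name, "ts": self.started}
        )
        # This sketch publishes even if the transaction raised; the section does
        # not say how MDM treats failed transactions.
        self.publish(self.records)
        return False  # never swallow the transaction's own exception

# Usage: a list's append method stands in for the producer's send call.
published_batches = []
with TransactionStatisticsCollector("addPerson", published_batches.append) as stats:
    stats.record_entity("Person")
    stats.record_entity("Organization")
    pending_during_transaction = len(published_batches)  # still 0 here

print(pending_during_transaction, len(published_batches), len(published_batches[0]))
```

Injecting `publish` as a callable keeps the collector independent of any particular Kafka client and makes the end-of-transaction behavior easy to test.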
