Develop custom KPIs using the Policy Monitoring JobFramework

This article discusses the basic structure of the JobFramework and its application to the definition of a custom KPI, using the Latency KPI as an example. The Latency KPI calculates the time that is required to propagate data changes from the data sources to the operational server, an important characteristic of data consistency and trustworthiness. This article also describes how to navigate the new Latency KPI reports using IBM® Cognos® Business Intelligence Server.

Somak Bhattacharya (somakbha@in.ibm.com), Software Developer, IBM India Software Labs

Somak Bhattacharya is a developer in IBM's InfoSphere Master Data Management (MDM) product portfolio. He has helped build the Master Data Policy Monitoring capability in MDM SE.



Srinivasa Parkala (shsriniv@in.ibm.com), Senior Software Developer, IBM India Software Labs

Srinivasa Parkala is a senior developer in IBM's InfoSphere Master Data Management (MDM) product portfolio. He has helped design and develop the Master Data Policy Monitoring capability in MDM SE.



Puneet Sharma (puneet.sharma@in.ibm.com), Senior Product Architect, IBM India Software Labs

Puneet Sharma is a senior product architect for IBM's InfoSphere Master Data Management (MDM) product portfolio. He has designed many important features of the MDM Portfolio in the past few years. He is currently focusing on building the next generation Master Data Governance capabilities for the MDM Portfolio.



10 October 2013


Introduction

Master data policy monitoring is a component of InfoSphere® MDM that enables organizations to report on data quality and usefulness and to establish policies for compliance with data quality thresholds. The key component of policy monitoring is the JobFramework, which is developed as stand-alone Java code. The JobFramework provides an infrastructure to run Java tasks in a multithreaded environment. Policy monitoring contains a number of metrics and reports out-of-the-box. Policy monitoring enables you to define custom Key Performance Indicators (KPIs) using the JobFramework and to run KPI computations against the InfoSphere MDM operational server. The JobFramework lets you focus on the KPI definitions while it handles the technicalities of multithreading and performance.

Policy monitoring includes a set of built-in KPIs. However, your organization's requirements for data governance KPIs might be different than the KPIs that are included with policy monitoring. Understanding the JobFramework enables you to implement your own KPIs. Further understanding of IBM Cognos reports enables you to design and represent the calculated KPI data in useful graphical objects like graphs or charts for easy understanding.

With the JobFramework, you can simplify the design and development of multithreaded applications. The JobFramework governs the running of a task, thread life cycles, and the policies used to run the threads. The JobFramework is built on Java's own thread pool implementation, the ExecutorService interface. The Latency KPI identifies the lag time that is required for a data change to propagate from a source system to the InfoSphere MDM operational server. Organizations can identify the individual sources where the latency is highest. Bringing down the latency to an acceptable level for a source helps you to get more real-time information from the InfoSphere MDM operational server. This article describes how the JobFramework is used to implement the Latency KPI and how the computed results are published using IBM Cognos reports.

Prerequisites

You must have a basic understanding and some amount of hands-on experience with the following components:

  • InfoSphere MDM version 11.0, Advanced Edition and Master Data Policy Monitoring
  • IBM Cognos Business Intelligence Server 10.1.1
  • DB2® for Linux®, Unix®, and Windows® 10.1.0

Overview of the policy monitoring JobFramework

The JobFramework provides the infrastructure to run Java tasks in a multithreaded environment. The JobFramework handles the nuances of multithreading and performance, which enables you to focus on the core KPI implementation logic. Typically, you use the JobFramework to implement custom KPIs. The JobFramework overview is organized into the following sections:

  • Terminology used with the JobFramework
  • Basic components of the JobFramework
  • Implementing custom KPIs using the policy monitoring JobFramework

Terminology used with the JobFramework

Task: A task is a unit of work that starts at some point, requires some activity or computation, and then terminates.

Thread: A thread is a running instance of a task.

Thread Pool: A thread pool represents a collection of threads that are waiting for tasks to be assigned to them. Thread life cycle management, such as creating and tearing down threads, is handled automatically by the JobFramework. The number of threads and the thread pool name are externalized, configurable parameters.

Bounded Queue: The JobFramework logically separates the submission of a task from the running of a task. When you submit a task, the JobFramework looks for an available thread from the thread pool to run it. A bounded queue works as a container for newly submitted tasks; the JobFramework does not run a task until it has been added to the queue. The size of the bounded queue is a configurable parameter.

Error Handler: Failures that occur while the JobFramework runs tasks are handled and logged appropriately.

Basic components of the JobFramework

The components of the JobFramework are broadly categorized into the following sections:

  • JobFramework interfaces
  • JobFramework configuration module

JobFramework interfaces

The JobFramework provides a set of interfaces that you can use to implement the task.

ITask interface

The ITask interface enables you to run the code in a separate thread. You have to implement the interface and provide the definition for the executeTask method. The method signature is as follows:

public void executeTask(TaskContext context);
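
For illustration, a minimal ITask implementation might look like the following sketch. The ITask and TaskContext types come from the policy monitoring JobFramework libraries; the class name and the body of executeTask are placeholders, not part of the product.

// Minimal illustrative ITask implementation; the class name and the work done
// inside executeTask are placeholders.
public class HelloTask implements ITask {

    public void executeTask(TaskContext context) {
        // This code runs in a thread taken from the configured thread pool
        System.out.println("Running in thread " + Thread.currentThread().getName());
    }
}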

IChainTask interface

The IChainTask interface enables you to run tasks in sequence. Each task in the sequence runs in a separate thread. For example, Task A calls Task B, Task B calls Task C, and so on. The IChainTask interface extends the ITask interface. You must implement the executeTask method and the createAndScheduleNextTasks method when you use this interface. The method signatures are as follows:

public void executeTask(TaskContext context);

public void createAndScheduleNextTasks(TaskContext taskContext);

IMultiOccuranceChainTask interface

The IMultiOccuranceChainTask interface enables you to run tasks where some of the tasks run concurrently and the rest run sequentially. The IMultiOccuranceChainTask interface extends the IChainTask interface. You must implement the executeTask method, the createAndScheduleNextTasks method, and the isExecutionOverForAllInstances method when you use this interface. The method signatures are as follows:

public void executeTask(TaskContext context);

public void createAndScheduleNextTasks(TaskContext taskContext);

public boolean isExecutionOverForAllInstances(TaskContext taskContext);

For example, suppose there are 10 integer arrays and each of the arrays contains 5 integers. You want to obtain the sum of the integers in each array and then multiply those sums together: Task A obtains the sum of each integer array, and Task B multiplies the aggregated values.

To implement this scenario, the JobFramework runs 10 instances of Task A in parallel. On completion of those instances, a single instance of Task B is run by the JobFramework to obtain the final result. The IMultiOccuranceChainTask interface is suitable for this scenario. Its isExecutionOverForAllInstances method is implemented to verify that all of the instances of Task A have finished before Task B is called.
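
A hedged sketch of the array-sum scenario follows. The ITask, IMultiOccuranceChainTask, TaskContext, and ExecutionManager types and the createAndExecuteTask call come from the JobFramework (the same call appears in Listing 1); the class names, the static collector, and the hard-coded input data are illustrative assumptions, because this article does not show how data is passed through the TaskContext.

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class SumTask implements IMultiOccuranceChainTask {

    static final int EXPECTED_INSTANCES = 10;  // 10 arrays, so 10 SumTask instances
    static final Queue<Integer> partialSums = new ConcurrentLinkedQueue<Integer>();
    static final AtomicInteger completed = new AtomicInteger(0);

    public void executeTask(TaskContext context) {
        int[] numbers = {1, 2, 3, 4, 5};       // in practice, obtained from the task context
        int sum = 0;
        for (int n : numbers) {
            sum += n;
        }
        partialSums.add(sum);
        completed.incrementAndGet();
    }

    public boolean isExecutionOverForAllInstances(TaskContext taskContext) {
        // Lets the JobFramework check whether every SumTask instance has finished
        return completed.get() == EXPECTED_INSTANCES;
    }

    public void createAndScheduleNextTasks(TaskContext taskContext) {
        // Chain the single MultiplyTask instance that combines the partial sums
        ExecutionManager.getInstance()
                .createAndExecuteTask(MultiplyTask.class.getName(), taskContext);
    }

    // Illustrative follow-on task that multiplies the aggregated sums
    public static class MultiplyTask implements ITask {
        public void executeTask(TaskContext context) {
            long product = 1;
            for (int sum : partialSums) {
                product *= sum;
            }
            System.out.println("Product of sums: " + product);
        }
    }
}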

JobFramework configuration module

The JobFramework configuration module contains the JobframeworkConfig.xml file, which is used to control various parameters that are related to running the tasks. With this XML file, you can create multiple thread pools and specify the number of threads in each thread pool and the size of the bounded queue.

threadPool groups a set of threads together into one pool.

maxThreads indicates the maximum number of threads that are allowed in the thread pool.

maxQueueSize indicates the maximum number of tasks that can be submitted to the thread pool executor. If the number of submitted tasks exceeds maxQueueSize, further task submissions block until threads become available.

Implementing custom KPIs using the policy monitoring JobFramework

Let's look at the following sample scenario where you want to compute a new KPI. Task A reads a few records from the database. Every single record read by Task A is processed by an instance of Task B for KPI computation. Hence, multiple instances of Task B can run in parallel.

Implementing JobFramework interfaces

As Task A runs in a separate thread and invokes Task B, Task A implements the IChainTask interface. The executeTask method of the IChainTask interface implements the code for connecting to the database and reading all of the records. The createAndScheduleNextTasks method of the IChainTask interface calls an instance of Task B for every record that is read from the database.

Every instance of Task B runs in a new thread and implements the ITask interface. The KPI implementation logic exists inside the executeTask method of Task B. Assume Task A and Task B have been implemented as Java classes, namely Task_A.class and Task_B.class.

You do not create instances of the tasks yourself; the JobFramework provides a factory to create and run the tasks, as shown in Listing 1:

Listing 1. Implementation of createAndScheduleNextTasks method of Task A
public void createAndScheduleNextTasks(TaskContext taskContext) {
    // Create the factory instance of ExecutionManager, which is used to run the task
    ExecutionManager executionManager = ExecutionManager.getInstance();
    // Submit Task B for run
    executionManager.createAndExecuteTask(Task_B.class.getName(), taskContext);
}
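
Listing 1 shows only the createAndScheduleNextTasks method of Task A. A fuller sketch of both classes might look like the following. The JDBC URL, credentials, query, and the static queue used to hand records from Task A to the Task B instances are illustrative assumptions (the article does not show how records are shared between tasks); only the interface methods and the ExecutionManager call come from the JobFramework.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class Task_A implements IChainTask {

    // Illustrative hand-off point between Task A and the Task B instances
    static final Queue<String> records = new ConcurrentLinkedQueue<String>();

    public void executeTask(TaskContext context) {
        // Read the records to be processed; connection details are placeholders
        try {
            Connection con = DriverManager.getConnection(
                    "jdbc:db2://localhost:50000/MDMDB", "dbuser", "dbpassword");
            Statement stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery("SELECT MEMIDNUM FROM MPI_MEMHEAD");
            while (rs.next()) {
                records.add(rs.getString(1));
            }
            con.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void createAndScheduleNextTasks(TaskContext taskContext) {
        // Schedule one Task_B instance per record that was read
        // (Listing 1 shows a single submission for brevity)
        ExecutionManager executionManager = ExecutionManager.getInstance();
        for (int i = 0; i < records.size(); i++) {
            executionManager.createAndExecuteTask(Task_B.class.getName(), taskContext);
        }
    }
}

class Task_B implements ITask {

    public void executeTask(TaskContext context) {
        String record = Task_A.records.poll();
        if (record != null) {
            // KPI computation logic for a single record goes here
        }
    }
}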

Updating the JobframeworkConfig.xml file

The JobframeworkConfig.xml file helps you to define parameters related to the JobFramework. The code in Listing 2 shows the changes that are required for the previously-mentioned custom KPI implementation:

Listing 2. JobframeworkConfig.xml configuration for custom KPI
<threadPool name="four">
                
  <maxThreads>10</maxThreads>
                
  <maxQueueSize>100</maxQueueSize>

  <tasks>
                
    <task className="Task_A" isStartTask="true"/>
                
    <task className="Task_B"/>
                
  </tasks>
                
  </threadPool>

Based on the definition in Listing 2, a total of 10 threads are used to run Task A and Task B. Task A uses one thread and the remaining nine threads are used by Task B. The isStartTask="true" attribute identifies the task class with which the JobFramework initiates the run.


Overview of the Latency KPI

A member record is updated at a source system at some point in time, such as 2013-May-01. For various reasons (such as asynchronous updates to the operational server), the update is reflected in the InfoSphere MDM operational server at a later point in time, such as 2013-May-02. In this situation, the latency for that member record is:

Latency = Date of update on the operational server - Date of update on the source system

In this example, 2013-May-02 - 2013-May-01 = 1 day.

The mpi_memhead table in the InfoSphere MDM operational server database contains a row for every member record loaded in the server. The MAUDRECNO column of every record in the mpi_memhead table corresponds to an audit record in the mpi_audhead table. The mpi_audhead table in the InfoSphere MDM operational server database contains an audit record for every interaction (like memget or memput) that has been run.

The data model description for mpi_memhead and mpi_audhead tables can be found at Virtual MDM Data Model Description.

In the case of a bulk load (say, loading 10,000 member records), the memput operation is called 10,000 times internally.

However, based on your selections, one of the following scenarios can happen:

Scenario 1: The mpi_audhead table contains a single audit record, which is cross-linked from all 10,000 member records that are loaded into the mpi_memhead table.

Scenario 2: The mpi_audhead table contains individual audit records for all 10,000 member records that are loaded into the mpi_memhead table.

To implement the Latency KPI, you must use Scenario 2. The source system must provide the actual record modification date as an input while loading data into the operational server. The EVTCTIME and AUDCTIME columns in the mpi_audhead table are used to persist the date for the records updated at the source and the operational server respectively.

We loaded two member records on 2013-Aug-23 into the mpi_memhead table in the operational server database, with the details shown in Table 1:

Table 1. Member records loaded in the operational server database
Source ID | Source Name | Source Record No | Member ID      | Record Modification Date at Source
INS       | Insurance   | 218              | 100-0000000001 | 2013-08-12
CL        | Clinic      | 214              | 100-0000000002 | 2013-08-08

The corresponding mpi_memhead and mpi_audhead table entries are shown in Table 2 and Table 3:

Table 2. mpi_memhead table with loaded member records
MEMRECNO | MEMSEQNO | CAUDRECNO | MAUDRECNO | MEMSTAT | MEMVERNO | SRCRECNO | MEMIDNUM
11       | 1        | 2         | 2         | A       | 0        | 218      | 100-0000000001
21       | 1        | 3         | 3         | A       | 0        | 214      | 100-0000000002
Table 3. mpi_audhead table with loaded member audit records
AUDRECNO | AUDSEQNO | USRRECNO | IXNRECNO | AUDCTIME                 | EVTCTIME                 | EVTTYPENO | EVTINITIATOR | EVTLOCATION
2        | 0        | 17       | 1        | Aug 23, 2013 11:56:14 AM | Aug 12, 2013 12:00:00 AM | 0         | 0            | 0
3        | 0        | 17       | 1        | Aug 23, 2013 11:56:14 AM | Aug 8, 2013 12:00:00 AM  | 0         | 0            | 0

For member record 100-0000000001, the latency is the AUDCTIME (2013-08-23 11:56:14.0) - EVTCTIME (2013-08-12 12:00:00.0) = 11 days.
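
As a quick illustration of this arithmetic, the following sketch computes the latency in whole days from the two timestamps (12:00:00 AM is midnight, that is 00:00:00); the class and method names are illustrative.

import java.sql.Timestamp;
import java.util.concurrent.TimeUnit;

public class LatencyInDays {

    // Latency in whole days between the source update time (EVTCTIME)
    // and the operational server update time (AUDCTIME)
    static long latencyInDays(Timestamp evtCTime, Timestamp audCTime) {
        return TimeUnit.MILLISECONDS.toDays(audCTime.getTime() - evtCTime.getTime());
    }

    public static void main(String[] args) {
        Timestamp audCTime = Timestamp.valueOf("2013-08-23 11:56:14.0");
        Timestamp evtCTime = Timestamp.valueOf("2013-08-12 00:00:00.0");
        System.out.println(latencyInDays(evtCTime, audCTime)); // prints 11
    }
}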

Like other existing KPIs in policy monitoring, the Latency KPI computation is also performed in three phases: landing, staging, and marting.

In the landing area tables, the individual member records along with their latency are persisted. The number of member records stored in the landing area is high. Hence, we need further aggregation on this data to obtain a comprehensive result.

In the staging area tables, the rounded latency value is computed using the following formula:

ROUND(LATENCY / <ROUND_UP_DIGIT>, 0) * <ROUND_UP_DIGIT>, where ROUND_UP_DIGIT is a user-configurable parameter.

Assume we have the following member records (as shown in Table 4) in the landing area table, and that the value of ROUND_UP_DIGIT = 3:

Table 4. Member records in the landing table
Member Record No | Latency in Days
1                | 8
2                | 6
3                | 11
4                | 12
5                | 17

Applying this formula yields the following rounded latency values for each member record, as shown in Table 5:

Table 5. Member records with rounded latency values for ROUND_UP_DIGIT=3
Member Record No | Rounded Latency in Days
1                | 6
2                | 6
3                | 9
4                | 12
5                | 15
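
The values in Table 5 follow from the formula when LATENCY / <ROUND_UP_DIGIT> is evaluated as an integer division, so the quotient is truncated before it is multiplied back (for example, 8 / 3 yields 2, and 2 * 3 = 6). The following sketch reproduces Table 5 under that assumption; the class name is illustrative.

public class RoundedLatency {
    public static void main(String[] args) {
        int[] latencies = {8, 6, 11, 12, 17}; // latency in days from Table 4
        int roundUpDigit = 3;                 // ROUND_UP_DIGIT
        for (int i = 0; i < latencies.length; i++) {
            // Integer division truncates before multiplying back by ROUND_UP_DIGIT
            int rounded = (latencies[i] / roundUpDigit) * roundUpDigit;
            System.out.println("Member record " + (i + 1) + ": " + rounded);
        }
        // Output: 6, 6, 9, 12, 15 -- the rounded latency values shown in Table 5
    }
}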

In the staging area tables, the number of member records that have the same rounded latency value is counted, as shown in Table 6. This helps to categorize a large number of member records into a comparatively smaller number of rounded latency groups.

Table 6. Staging area table for ROUND_UP_DIGIT=3
Rounded Latency in Days | Count of Member Records | ROUND_UP_DIGIT
6                       | 2                       | 3
9                       | 1                       | 3
12                      | 1                       | 3
15                      | 1                       | 3

If the ROUND_UP_DIGIT value changes from 3 to 9, then the total number of rows in the staging area table also changes from 4 (as shown in Table 6) to 2 (as shown in Table 8):

Table 7. Member records with rounded latency values for ROUND_UP_DIGIT=9
Member Record No | Rounded Latency in Days
1                | 0
2                | 0
3                | 9
4                | 9
5                | 9
Table 8. Staging area table for ROUND_UP_DIGIT=9
Rounded Latency in Days | Count of Member Records | ROUND_UP_DIGIT
0                       | 2                       | 9
9                       | 3                       | 9

In the marting area table, we persist the maximum and minimum rounded latency against each policy monitoring run. Internally, we have subdivided the Latency KPI into two sub-KPIs for ease of reporting:

  • Maximum Rounded Latency identifies the maximum rounded latency value (in number of days) across all sources.
  • Minimum Rounded Latency identifies the minimum rounded latency value (in number of days) across all sources.

Implementing the Latency KPI using the Policy Monitoring JobFramework

To implement the Latency KPI, perform the steps in this section.

Load data into the InfoSphere MDM operational server

During the loading of member records, the source system should provide the latest modification date for each member record. The date is persisted in the EVTCTIME column of the mpi_audhead table. You also need to ensure that individual audit records are created in the mpi_audhead table for every member record that is loaded into the mpi_memhead table of the operational server database.

Run Latency KPI

Before running the Latency KPI, you should have an existing policy monitoring setup and access to an Eclipse-based Java editor. Running the Latency KPI consists of the following steps:

  1. Get the Java project LatencyKPI from the attached LatencyKPIartifacts.zip file under the javaProject directory. Open the project in an Eclipse editor. The project contains the following three Java classes: LatencyKPITask.java, LatencyProperties.java, and ReadMemRecds.java under package com.ibm.mdm.mdpm.latency. The LatencyProperties.java class contains all of the properties such as the InfoSphere MDM operational server database properties, the InfoSphere MDM operational server connection properties, and the policy monitoring database properties.

    You should edit these properties based on your own environment. For the policy monitoring database, use your existing policy monitoring database instead of creating a new database. The Latency KPI has some dependencies on the existing policy monitoring tables.

  2. Add the db2jcc.jar, madapi.jar, and com.ibm.mdm.mdph.job.jar files to the classpath of the Java project LatencyKPI. The JARs are available in the <policy monitoring INSTALL_PATH>\com.ibm.mdm.mdph\lib directory.

    The <policy monitoring INSTALL_PATH> refers to the location where you have installed master data policy monitoring.

  3. Build the Java project and make a Java Archive (JAR) file out of the LatencyKPI project. Assume the name of the JAR file is latencykpi.jar.

  4. Copy the latencykpi.jar file into the <policy monitoring INSTALL_PATH>\com.ibm.mdm.mdph\lib folder.

  5. Open the JobframeworkConfig.xml file, which is in the <policy monitoring INSTALL_PATH>\com.ibm.mdm.mdph\config directory, and add the lines from Listing 3 to the file inside the <threadPools> XML element:

    Listing 3. JobframeworkConfig.xml modification for Latency KPI run
    <threadPool name="latencyKPI">
                            
    <!--The maxThreads configuration specifies number of threads (for a thread pool)
    which will pick the tasks from the queue for parallel processing -->
                            
    <maxThreads>10</maxThreads>
                            
    <!-- The maxQueueSize configuration specifies the number of 
    tasks which can be added to a queue which is defined for a 
    thread pool. A task is first added to the queue for 
    processing, if the queue is full then the system waits
    till the queue become available. -->
                            
    <maxQueueSize>100</maxQueueSize>
                            
    <tasks>
                            
    <!-- The "task" configuration element provide details                      
    about the task which needs to be executed.-->
                            
    <!-- The "isStartTask" attribute specifies                        
    if a task is the first task in the chain of tasks. 
                            
    If "isStartTask" is specified as "true" then                        
    the JobFramework itself will create the task instance                         
    and will schedule it for execution. 
                            
    "isStartTask" attribute can have values as "true" or "false". -->
                            
    <task className="com.ibm.mdm.mdpm.latency.ReadMemRecds" isStartTask="true" />
                            
    <task className="com.ibm.mdm.mdpm.latency.LatencyKPITask" />
                            
    </tasks>
                            
    </threadPool>

    This ensures that the JobFramework runs the Latency KPI during the policy monitoring run along with the existing KPIs. The com.ibm.mdm.mdpm.latency.ReadMemRecds.java class is the starting class for the latencyKPI thread pool; it reads member records from the operational server database along with audit details. The class computes the latency for every member record and persists the computations in the landing area tables.

    Thereafter, the com.ibm.mdm.mdpm.latency.LatencyKPITask.java class aggregates and persists data in the staging and marting area tables.

  6. Open the execute_mdph.xml file, which is in the <policy monitoring INSTALL_PATH>\com.ibm.mdm.mdph\scripts directory, and add <include name="latencykpi.jar"/> under the target name "executeMdph" as shown in Listing 4. This ensures that the required Latency KPI classes are available at policy monitoring run time.

    Listing 4. Inclusion of latencykpi.jar in execute_mdph.xml file
    <target name="executeMdph" description="execute mdph" 
      depends="loadMdphProperties,print,replacelogpathWindows,replacelogpathUnix">
                            
    <java classname="com.ibm.mdm.mdpm.job.execute.JobExecutionUtil" 
      fork="true" spawn="false" failonerror="true" >
                            
      <classpath>
      <pathelement location="${log4j.jarpath}" />
      <pathelement path="${db.jdbcjars}"/>
      <fileset dir="${basedir}/lib">
        <include name="com.ibm.mdm.mdph.common.jar"/>
        <include name="com.ibm.mdm.mdph.deployment.jar"/>
        <include name="madapi.jar"/>
        <include name="com.ibm.ws.admin.client_8.0.0.jar" />
        <include name="com.ibm.mdm.mdph.job.jar"/>
        <include name="com.ibm.mdm.mds.roc.jar"/>
        <include name="com.ibm.mdm.mdph.db.conpool.jar"/>
        <include name="commons-logging.jar"/>
        <include name="commons-math.jar"/>
        <include name="com.ibm.mdm.mds.common.jar"/>
        <include name="xercesImpl.jar"/>
        <include name="latencykpi.jar"/>
      </fileset>
    </classpath>
    <arg value="-MPDH_ROOT"/>
    <arg value="${basedir}"/>
    <arg value="-RUNTIME_XML"/>
    <arg value="${runtime.xml}"/>
    <jvmarg value="-Xms${sheapsizemegabytes}m"/>
    <jvmarg value="-Xmx${xheapsizemegabytes}m"/>
    <sysproperty key="java.library.path" path="${basedir}/lib"/>
    </java>
    <echo message="Mdph execution completed"/>
    </target>
  7. Locate the createTableLatency.sql file, which is in the LatencyKPIartifacts.zip file, under the dbScript directory. After you replace the placeholder values, run the file against the existing policy monitoring database to create additional Latency KPI related tables.

  8. Run policy monitoring.

  9. After the successful completion of the policy monitoring run, you can view the computed data for Latency KPI through an IBM Cognos report.

  10. In the IBM Cognos environment, create a data source named dsn_latency_kpi against the policy monitoring database.

  11. In the IBM Cognos environment, import the LatencyKPIReports.zip file, which is located inside the LatencyKPIartifacts.zip file, under the cognosReport directory.

When you successfully run these steps, the Latency KPI is computed.


Understanding policy monitoring reports for the Latency KPI

You can start interacting with Latency KPI Cognos reports by opening the Latency-KPI-Report-Summary report. This summary report displays the maximum and minimum rounded latency values for the latest run across all sources, as shown in Figure 1:

Figure 1. Latency-KPI-Report-Summary Cognos Report
The Latency-KPI-Report-Summary Cognos Report displays the maximum and minimum rounded latency for the latest policy monitoring run

Drill through on the values of the "Rounded Latency in Days" column to open Latency-KPI-Report-Level-1, which shows the trend for maximum and minimum rounded latency over different policy monitoring runs, as shown in Figure 2:

Figure 2. Trend of maximum and minimum rounded latency over different runs
It displays the trend of maximum and minimum rounded latency over different runs

The Latency-KPI-Report-Level-1 report also shows the distribution of member record counts across the rounded latency values, that is, the count of member records that have the same rounded latency value across all sources, as shown in Figure 3:

Figure 3. Details of rounded latency values and record count
It displays the record count for every rounded latency value and its distribution

You can drill through on any of the rounded latency values to see the details for the related member records in Latency-KPI-Report-Level-2, as shown in Figure 4:

Figure 4. Details of member records related to a particular rounded latency value
It displays the member record details for a particular rounded latency value

Merge the Latency KPI Cognos reports with the existing policy monitoring reports

This section describes how to merge Latency KPI Cognos reports with the existing policy monitoring IBM Cognos reports:

  1. Open the existing IBM Cognos framework manager project MDPMCognos.zip and import the following tables by using the "Metadata Wizard":

    • LND_LATENCY_KPI
    • STG_LATENCY_KPI
    • MRT_LATENCY_KPI

    During import, use the following option so that existing table relationships also get imported along with the tables, as shown in Figure 5:

    Figure 5. Import of tables using IBM Cognos framework manager
    It displays options to be selected during import of tables inside IBM Cognos framework manager

    You also need to update the property value of Schema with the macro expression #prompt('Schema Name:' ,'token')# for the newly created data source inside the framework manager project, as shown in Figure 6:

    Figure 6. Update of value for property Schema with macro expression
    It displays the prompt expression to be used for property value of Schema
  2. Save the project and publish the updated IBM Cognos framework manager project.

  3. Update the Report-Summary page by adding an additional IBM Cognos List object below the existing IBM Cognos List object. The new List object contains the computed values for the Maximum Rounded Latency and Minimum Rounded Latency KPIs. You also need to create the drill-through definitions on the Rounded Latency in Days column. You can use the Latency-KPI-Report-Summary report as a reference.

  4. Copy the Latency-KPI-Report-Level-1 and the Latency-KPI-Report-Level-2 reports into the content store of the existing policy monitoring Cognos report.

  5. The existing policy monitoring Cognos reports use the data source name dsn_mdpm. The package name for the policy monitoring Cognos reports is also different from the package name for the Latency KPI Cognos reports. After you merge, you must manually open and fix all of the data source and package related issues for the Latency KPI Cognos reports before you run policy monitoring.


Conclusion

This article describes how you can leverage the capability of the policy monitoring JobFramework to build custom KPIs. It also shows how you can use the existing IBM Cognos environment to build new reports for custom KPIs.


Download

Description                                | Name                           | Size
Sample artifacts for computing Latency KPI | dm-1310LatencyKPIartifacts.zip | 26KB
