Applying IBM InfoSphere Information Analyzer rules for operational quality measurements

Monitoring the IBM InfoSphere Operations Console

A key challenge in the day-to-day management of any solution is measuring whether the solution components are meeting IT and business expectations. Given these requirements, it becomes incumbent upon IT to build processes through which performance against these objectives can be tracked. By employing such measurements, IT can then take action whenever thresholds for expected operational behavior are exceeded. Assessing and monitoring operational quality on information integration processes requires establishing rules that are meaningful in relation to the existing operational metadata. Rather than starting with a blank slate, this article demonstrates how to use pre-built rule definitions from IBM InfoSphere Information Analyzer to get under way in tracking the operational quality of IBM InfoSphere Information Server's data integration processing.

Before you start

Learn how to apply the pre-built packages of IBM InfoSphere Information Analyzer data rules for operational quality evaluation.

A key challenge in the day-to-day management of any solution is measuring whether the solution components are meeting IT and business expectations. This is equally true of a data integration platform. Consider the following examples:

  • An organization has established a strict Service Level Agreement between business and IT that requires that the Marketing Data Mart is fully loaded before 6 AM each morning.
  • A data integration center of excellence has established a policy that in order to optimize their resource utilization, their ETL Architect will review any data integration process that takes more than one hour.
  • An organization that is growing through acquisition may want early warning indicators of whether the data volume being loaded into the warehouse is growing faster than the planned 20% increase each quarter.

Given these requirements, it becomes incumbent upon IT to build processes through which performance against these objectives can be tracked. By employing such measurements, IT can then take action whenever thresholds for expected operational behavior are exceeded. The ability of the organization to react to and correct such violations then increases the organization's operational quality, that is, confidence that the IT infrastructure is meeting organizational expectations.

IBM InfoSphere Information Server includes the InfoSphere DataStage and QualityStage Operations Console to monitor the job runs, services, system resources, and workload management queues. The display of information in the operations console is focused on the active Information Server environments - that is, what is occurring now with specific engines or jobs. This is a fit-for-purpose display for the Operations team; however, the focus on current events does not necessarily provide insight into the overall operational quality against the policies for this environment.

Assessing and monitoring operational quality on information integration processes requires establishing rules that are meaningful in relation to the existing operational metadata. Rather than start off with a blank slate, this article demonstrates how to use IBM's pre-built rule definitions to get under way in tracking the operational quality of IBM InfoSphere Information Server's data integration processing. Measurements, and the related exceptions, are then available to both the Data Integration Center of Excellence and Data Governance teams through the IBM InfoSphere Data Quality Console (DQC) to promote understanding and collaboration.

Objectives

In this tutorial, you will learn how to apply the pre-built packages of IBM InfoSphere Information Analyzer data quality rules for operational quality evaluation. This article shows you how to understand the available content; how to apply that content to identify operational quality conditions captured in the InfoSphere DataStage and QualityStage Operations Console data; how to identify the exceptions in Information Analyzer results; and how to incorporate the results in the Data Quality Console for exception monitoring.

Prerequisites

This tutorial is written for users of IBM Information Server who are familiar with IBM Information Analyzer, particularly use of data rules.

System requirements

To utilize the pre-built rules, you need an IBM InfoSphere Information Server platform with the following software installed: Information Analyzer, DataStage and/or QualityStage, DataStage/QualityStage Operations Console (and optionally Data Quality Console).


Overview

With IBM InfoSphere Information Analyzer, you typically create data rules to automatically monitor data for potential data quality problems against defined business requirements or based on issues identified during data profiling. However, the data rule functionality within Information Analyzer can be applied to a broader range of data evaluations such as assessment of and monitoring for operational quality. The IBM InfoSphere DataStage and QualityStage Operations Database (DSODB) is a relational database whose tables contain information about job runs and system resources used on a system that has a DataStage engine installed. The information stored in this database underlies the Information Server Operations Console, but can be queried directly in support of ongoing monitoring for operational quality.

As a working definition of operational quality we use the following: the conformance of information integration jobs or processes to established and required Service Level Agreements (SLAs) for the Availability of information, the Production of information, the minimization of Latency in information delivery, and the expected Utilization of system resources to produce the information delivered.


Operational quality categories

With a focus on operational efficiency and quality to meet Service Level Agreements for information integration and delivery, an organization can utilize the capabilities within Information Server to manage and monitor the data produced in support of such requirements. As noted above, this article generally categorizes operational quality along four primary dimensions. These operational quality categories are:

  • AVAILABILITY
    • Processes or jobs that deliver new or updated information to the Line of Business (LOB) system must execute on schedule and execute cleanly (i.e. minimal processing or job errors vs. average).
    • The focus is evaluation of the job execution status and log.
  • PRODUCTION
    • Processes or jobs that deliver new or updated information to the LOB system should produce consistent levels of data based on the data consumed.
    • The focus is evaluation of the data volume produced by the job.
  • LATENCY
    • Processes or jobs that deliver new or updated information to the LOB system must execute within the target time window.
    • The focus is evaluation of the timing of the job execution.
  • UTILIZATION
    • Processes or jobs that deliver new or updated information to the LOB system must execute within prescribed resource utilization parameters.
    • The focus is evaluation of the system resources used by a given job.

These are not the only categories or dimensions possible, but represent common ways in which information delivery is evaluated. As a starting set of pre-built operational quality rules, the contents of this article address evaluations within these categories as shown in Figure 1 below.

Figure 1. Rule evaluations within Operational Quality categories
Example rule evaluations within the 4 defined Operational Quality categories

The pre-built Information Analyzer rules for operational quality provided with this article are intended to:

  • Be run as-is with minimal configuration against the DataStage and QualityStage Operations Console Database (DSODB)
  • Reduce the effort in identifying operational quality issues within the categories noted above
  • Serve as models, templates, and examples for your own additional operational rule design
  • Be utilized in Information Analyzer jobs (for Information Server versions 9.1 or 9.1.2).

Information Analyzer rule definitions for operational quality

Information Analyzer provides a capability to define the rule logic for a broad range of data conditions. Such conditions range from simple tests, such as the existence of data or the equality of data to a given value, to tests against reference data or aggregated data values, to compound conditions that include IF...THEN and AND/OR logic.

Information Analyzer provides a capability to define the rule logic for such data domains and conditions separately from any physical data source so the same logic is consistently applied from data source to data source (i.e. one data rule definition can be applied and used with many data sources) or with different global variables applied. Coupled with a capability to import a set of rule definitions in a defined XML format, you can take the pre-built rule definitions included with this article and load them into Information Analyzer and start applying them to your own operational data.


Rule definitions

Rule definitions follow a basic syntax where a variable, which could simply be a word or term, is evaluated based on a specified condition or type of check. The specified condition or check may or may not require some additional reference value, such as another variable, a list of values, or a specified format. Further, several conditions may be connected together with IF, THEN, AND, or OR clauses. For example, a very simple rule definition could be as follows: JobName exists. This condition indicates that a variable called JobName must exist (i.e. cannot be a Null value). In a slightly more complex case, you could have a rule definition such as Listing 1.

Listing 1. Sample rule definition from the Operational Quality set
runmajorstatus = 'FIN' 
AND runminorstatus in_reference_list {'FOK'} 
AND runminorstatus NOT in_reference_list {'FWF','STP','SYN'}

Here, the variable RunMajorStatus must equal FIN [the code for a Finished status], the variable RunMinorStatus must be in the reference list 'FOK' [the code for Finished without warnings or errors], and that same variable must not be in the reference list of FWF, STP, or SYN [codes for fatal errors or other job aborts]. In summary, the rule definition tests jobs in a given DataStage or QualityStage environment to ensure that jobs that finished did so without warnings (one of the tests for Availability noted above).
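The status checks in Listing 1 can be pictured as a simple predicate over each job run. The following Python sketch (not part of Information Analyzer; the status codes come from the discussion above, and the sample rows are invented) shows how the three conditions combine:

```python
# Sketch of the Listing 1 logic: a run passes only when the job
# finished ('FIN') and its minor status is 'FOK' (finished OK).
# The NOT in_reference_list test explicitly excludes abort codes.
FATAL_OR_ABORT = {'FWF', 'STP', 'SYN'}

def finished_without_warnings(run_major_status, run_minor_status):
    return (run_major_status == 'FIN'
            and run_minor_status in {'FOK'}
            and run_minor_status not in FATAL_OR_ABORT)

# Invented sample runs (major status, minor status):
runs = [('FIN', 'FOK'),   # finished cleanly
        ('FIN', 'FWW'),   # finished with warnings
        ('FIN', 'FWF')]   # finished with fatal messages
results = [finished_without_warnings(major, minor) for major, minor in runs]
# Only the first run satisfies the rule.
```

In Information Analyzer itself, of course, this predicate is expressed declaratively and evaluated against bound DSODB columns rather than in procedural code.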

More information about creating and using rule definitions is available in the Information Analyzer user guide (see Data Rule definitions).

More information about importing and working with pre-built Information Analyzer rules can be found in the developerWorks article: Using pre-built rule definitions with IBM InfoSphere Information Analyzer.

From a design perspective, you can use the pre-built rule definitions as-is, copy and modify them to meet your needs, or use them as design models. The subsequent section, Making the rule definitions for operational quality actionable, describes the basic steps to import and utilize the pre-built rule package with the InfoSphere DataStage and QualityStage Operations Database (DSODB). The pre-built rule package files may be imported directly into Information Analyzer versions 9.1 or 9.1.2 to run against the Operations Console database of the respective version.

The pre-built rule package download [IARuleDefs-OpsSetPackage-v91.zip] includes the following files:

  • IARuleDefs-OpsSet1-baseWBindings-v91.xml
    • Includes 20 definitions across the operational quality categories described above.
  • IARuleDefs-OpsSet2-globalVariable-v91.xml
    • Includes a global variable definition for a generic project ID for use with the two subsequent files. If you wish to use the next two sets of rules, it is recommended to import this file first.
  • IARuleDefs-OpsSet2-projectIDWBindings-v91.xml
    • Includes the same 20 definitions but with an additional variable filter for a project ID bound to a global variable.
  • IARuleDefs-OpsSet3-timebasedProjectWBindings-v91.xml
    • Includes the same 20 definitions but with both the project ID variable bound to a global variable and a time-based variable included in the definition.

Any or all of the sets may be imported into Information Analyzer, as the sets contain distinct rule definition names. The definitions include pre-built executable rules bound to the DSODB schema, so you can run them directly out of the box after import if you are using the default DSODB schema name (otherwise, they can be configured for your schema name).

For each of the operational quality categories, the sets provide three variants of each rule: a general rule that applies to all available data for the specific condition (the base set, as seen in Listing 1 above); a more specific rule that applies to the data within a given project; and a specific rule that applies to the data within a given project and time range (as in Listing 2 below; it is configured as a one-week interval from the current system date, but may be modified for your specific needs).

Listing 2. Rule definition with a one week time range from the current date
IF project = projectid 
AND runstarttime >= (date() - 7)
THEN runmajorstatus = 'FIN' 
AND runminorstatus in_reference_list {'FOK'} 
AND runminorstatus NOT in_reference_list {'FWF','STP','SYN'}

If you compare Listings 1 and 2, the first two additional lines limit the set of data evaluated to those jobs where the project equals a specific project ID (i.e. a DataStage project name, initially defaulted in the global variable to 'dstage1') and the starting time is greater than or equal to the system date minus 7 days (i.e. the last week).
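The time filter in Listing 2 can be mirrored in ordinary code. Below is a minimal sketch of the one-week window using Python's datetime (the rule's date() function is assumed to behave like the current system date; the sample dates are invented):

```python
from datetime import date, timedelta

def within_last_week(run_start, today):
    # Mirrors the rule filter: runstarttime >= (date() - 7)
    return run_start >= today - timedelta(days=7)

# Fixed 'today' so the example is repeatable:
anchor = date(2013, 6, 15)
recent = within_last_week(date(2013, 6, 10), anchor)  # 5 days before anchor
stale = within_last_week(date(2013, 6, 1), anchor)    # 14 days before anchor
```

Widening or narrowing the window in the rule definition is simply a matter of changing the subtracted day count.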


Understanding the pre-built rule definition packages

The Information Analyzer pre-built rule definitions that accompany this article are imported through the Information Analyzer application programming interface (API) or command line interface (CLI). The pre-built definitions are structured using a defined XML schema. For full details on the structure, refer to Schema File elements for Rule Definitions.

The content includes:

  • A general XML header: <?xml version="1.0" encoding="UTF-8"?>, which should not be changed.
  • The Information Analyzer root element: <iaapi:Project xmlns:iaapi="http://www.ibm.com/investigate/api/iaapi" name="your-project">, whose namespace declaration should not be changed.
    • Note: You will need to modify your-project to whatever Information Analyzer project name you use.
  • XML comments enclosed between <!-- and -->
  • The start of the rule definitions section: <DataRuleDefinitions>
  • The block of content for each rule definition, including:
    • The rule-definition name (which references the relevant operational quality category)
    • A folder name (organized by the operational quality category)
    • The rule-definition description
    • The expression (the rule logic)
  • Listing 3. Example of rule definition code block

    <DataRuleDefinition name="SLA_Production_JobsExecutedWithoutRejects" folder="Operations SLA / Production">
      <description>Service Level Agreement - Production. Jobs executed without producing rejects</description>
      <expression>IF project = projectid AND runminorstatus in_reference_list {'FOK','FWW'} 
        AND lcase(linkname) contains 'reject' THEN totalrejectrows = 0</expression>
    </DataRuleDefinition>
    • Note that the variables in the expression are designed to match column names in the DSODB to facilitate identification and binding to actual data. Some variables expect user-specified values for comparison such as: projectid or variables beginning with min..., max..., or target.... These user-specified values can be entered as Literals or created and saved as Global Variables in Information Analyzer.
  • The start of the executable rules section: <ExecutableRules> with the block of content for each executable rule, including:
    • The executable rule name (which references the associated rule definition, though it can be distinct)
    • The executable rule description
    • Optionally the bound expression for the rule (the executable format)
    • The output definitions for the rule (the variables, columns from the DSODB tables, and expressions such as averages)
    • The bindings for the rule (which connect the rule definition variables to actual columns from the DSODB tables)
    • The join conditions for the rule (which link together data from multiple tables in the DSODB tables)
    • A target validity benchmark for the rule (set generically as examples in each rule)
  • Listing 4. Example of executable rule code block

    <ExecutableRule folder="All" 
        name="SLA_Availability_WarnMessagesWithinAcceptedRange_AllJobs" 
        excludeDuplicates="false">
      <description>Service Level Agreement - Availability. Count of Warning Messages 
        within Target Value for all jobs</description>
      <BoundExpression>IF DSODB."JOBRUN".RUNMAJORSTATUS = 'FIN' 
        AND DSODB."JOBRUN".RUNMINORSTATUS in_reference_list {'FWW'} 
        THEN DSODB."JOBRUN".NUMMESSAGESWARNING <= targetmessagecount
      </BoundExpression>
      <OutputDefinition type="FailingRecords">
        <OutputColumn name="JOBNAME" type="column" value="DSOpsData.DSODB.JOBEXEC.JOBNAME"/>
        <OutputColumn name="RUNMAJORSTATUS" type="variable" value="runmajorstatus"/>
        <OutputColumn name="RUNMINORSTATUS" type="variable" value="runminorstatus"/>
        <OutputColumn name="NUMMESSAGESFATAL" type="column" value="DSOpsData.DSODB.JOBRUN.NUMMESSAGESFATAL"/>
        <OutputColumn name="NUMMESSAGESWARNING" type="variable" value="warnmessages"/>
        <OutputColumn name="NUMMESSAGESTOTAL" type="column" value="DSOpsData.DSODB.JOBRUN.NUMMESSAGESTOTAL"/>
        <OutputColumn name="TARGETMESSAGECOUNT" type="variable" value="targetmessagecount"/>
      </OutputDefinition>
      <Bindings>
        <Binding var="runmajorstatus">
          <Column name="DSOpsData.DSODB.JOBRUN.RUNMAJORSTATUS"/>
        </Binding>
        <Binding var="runminorstatus">
          <Column name="DSOpsData.DSODB.JOBRUN.RUNMINORSTATUS"/>
        </Binding>
        <Binding var="warnmessages">
          <Column name="DSOpsData.DSODB.JOBRUN.NUMMESSAGESWARNING"/>
        </Binding>
        <Binding var="targetmessagecount">
          <Constant numericValue="10"/>
        </Binding>
      </Bindings>
      <JoinConditions>
        <JoinCondition joinType="inner" leftKey="DSOpsData.DSODB.JOBEXEC.JOBID" rightKey="DSOpsData.DSODB.JOBRUN.JOBID"/>
      </JoinConditions>
      <Benchmarks>
        <ValidityBenchmark metric="percent_not_met" operator="less_or_equal" value="0.10"/>
      </Benchmarks>
    </ExecutableRule>
  • After all of the rule-definition blocks, the XML is completed with the end of the rule-definitions section and the end of the XML content.
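The ValidityBenchmark element shown in Listing 4 (percent_not_met with less_or_equal 0.10) can be read as: the executable rule is considered met when at most 10% of the evaluated records fail. A minimal Python sketch of that calculation, with invented record counts and an assumed convention for the zero-record case (Information Analyzer's exact behavior there may differ):

```python
def benchmark_met(total_records, failing_records, threshold=0.10):
    # percent_not_met <= threshold, as in the ValidityBenchmark element.
    if total_records == 0:
        return True  # assumption: nothing evaluated counts as passing
    return failing_records / total_records <= threshold

ok = benchmark_met(total_records=200, failing_records=15)      # 7.5% failing
breach = benchmark_met(total_records=200, failing_records=30)  # 15% failing
```

Adjusting the value attribute in each rule's ValidityBenchmark is how you tighten or relax the tolerance for a given SLA.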

Making the rule definitions for operational quality actionable

In order to use the pre-built operational quality rules with the Operations Console data and track exceptions from these rules, there are four steps you will need to take.

  1. Understand the structure and content of the DataStage and QualityStage Operations Console Database (DSODB) - there are several tables that are most relevant and used by the rules.
  2. Add the DSODB as a data source in Information Analyzer.
  3. Import the Base Set as initial examples of operational quality rules.
  4. Run the data rules in Information Analyzer. The data rules can be tracked, monitored, or reported on in Information Analyzer.

Optionally, you can also:

  • View and monitor the data rule results via the InfoSphere Data Quality Console
  • Import additional rule packages or Generate additional data rules from the data rule definitions

Understanding the Operations Console database structure and content

For a general discussion of the structure of DataStage and QualityStage Operations Console Database (DSODB), see the IBM developerWorks article: Get started with the IBM InfoSphere DataStage and QualityStage Operations Console Database, Part 1: An introduction.

NOTE: Columns available for use may vary in different versions of Information Server from the time of this article. To ascertain current contents, refer to the list of columns when you import the metadata for the DSODB.

The majority of the pre-built operational quality rules take advantage of data in six of the DSODB tables: JOBEXEC and JOBRUN primarily; JOBLINK and JOBRUNLINK for rules evaluating data rejects; and JOBSTAGE and JOBRUNSTAGE for rules evaluating stage performance. The columns available in these six tables are shown in Figure 2 below.

Figure 2. Column details for the six DSODB tables used in the operational quality rules
View of the 6 DSODB tables and underlying columns.

The job execution table, JOBEXEC, stores instances of executions for each DataStage, QualityStage, or Information Analyzer job that is run. It includes two columns that are of particular use: JOBNAME and PROJECTNAME.

The JOBNAME column is included in all sets of executable data rules in the output table for reference. In sets 2 and 3, the pre-built rules include an initial IF condition where PROJECT = PROJECTID. This allows data rules to be generated from a given data rule definition relevant to a specific DataStage or QualityStage project and filter out other projects so that you can set targets and requirements at that level. The PROJECTNAME column is bound to the PROJECT variable, while you can use a literal or global variable to set the PROJECTID variable to the specific project of interest.

NOTE: If you wish to filter down to a specific job, you will need to add an additional condition to the data rule definition such as: JOBNAME = JOBNAMEID where the JOBNAME variable is bound to the JOBNAME column and the JOBNAMEID variable is bound to a literal or global variable with your desired value.

The job run detail table, JOBRUN, stores a broad range of data for the specific job execution such as:

  • Job status: RUNMAJORSTATUS, RUNMINORSTATUS
  • Timestamps: RUNSTARTTIMESTAMP, RUNENDTIMESTAMP
  • Messages statistics: NUMMESSAGESTOTAL, NUMMESSAGESFATAL, NUMMESSAGESWARNING
  • Processing statistics: TOTALROWSCONSUMED, TOTALROWSPRODUCED
  • Resource statistics: TOTALCPU, ELAPSEDRUNSECS

Different pre-built rules take advantage of these columns to assess whether the job ran successfully, with warnings, or failed; when the job started and stopped; how much CPU was used, etc.
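Because the DSODB is an ordinary relational database, these JOBRUN columns can also be queried directly with SQL. The following Python sketch uses an in-memory SQLite table as a stand-in for JOBRUN (the real DSODB typically runs on DB2, Oracle, or SQL Server; the column subset follows the list above, and the rows are invented) and selects runs that finished with fatal messages:

```python
import sqlite3

# In-memory stand-in for the DSODB JOBRUN table (subset of columns).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE JOBRUN (
    JOBID INTEGER, RUNMAJORSTATUS TEXT, RUNMINORSTATUS TEXT,
    NUMMESSAGESFATAL INTEGER, NUMMESSAGESWARNING INTEGER)""")
conn.executemany(
    "INSERT INTO JOBRUN VALUES (?, ?, ?, ?, ?)",
    [(1, 'FIN', 'FOK', 0, 0),    # clean finish
     (2, 'FIN', 'FWF', 3, 1),    # finished with fatal messages
     (3, 'FIN', 'FWW', 0, 12)])  # finished with warnings
# Runs that finished with fatal messages, plus their fatal-message counts:
failed = conn.execute(
    "SELECT JOBID, NUMMESSAGESFATAL FROM JOBRUN "
    "WHERE RUNMAJORSTATUS = 'FIN' AND RUNMINORSTATUS = 'FWF'").fetchall()
```

The same WHERE clause, expressed as rule logic and bound to the real JOBRUN columns, is what the pre-built Availability rules evaluate.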

To get information about rejected rows of data, you need to connect to two additional tables: JOBLINK and JOBRUNLINK. An example of a generated data rule with this information is included below in "Example: capturing the number of rows of rejected data". To get information about CPU usage by stages within jobs, you need to connect to two additional tables: JOBSTAGE and JOBRUNSTAGE.

Note: The link and stage tables are only populated if the DSODBConfig.cfg file has the property MonitorLinks=1 (by default, this is set to 0 which means that no link or stage information is collected). Discuss setting this property with your Information Server administrator.
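For reference, the relevant entry in DSODBConfig.cfg looks like the following sketch (the property name comes from the note above; the file's location and surrounding contents vary by installation, so confirm the details with your Information Server administrator):

```
MonitorLinks=1
```

With the default of 0, the JOBLINK/JOBRUNLINK and JOBSTAGE/JOBRUNSTAGE tables remain unpopulated, and the reject-row and stage-level rules will have no data to evaluate.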

One approach to get further understanding of the content of the DSODB is to run Information Analyzer column analysis on the tables so that you can see the variety of information available in each column and table.


Adding the Operations Console database as an Information Analyzer source

Configuring the DataStage and QualityStage Operations Console Database (DSODB) as a data source for Information Analyzer follows the same steps as any other data source. See the Information Analyzer User Guide: Configuring your System Resources.

In brief:

  1. Establish the Host (commonly the name of the server with the data) and Data Source names on your system. The operational quality rule packages are preset with the Data Source name DSOpsData. If you are using a different Data Source name, you can open the XML file for the rule package in a text editor such as Notepad (not Microsoft Word) and do a global Replace on the value DSOpsData.
  2. Provide DSODB as the ODBC DSN for the data connection (likely this ODBC DSN was created when installing and configuring the DSODB).
  3. Import the metadata for DSODB as shown in Figure 3 below.
  4. Create an Information Analyzer project and add DSODB as a data source. If you are using a different Schema name than DSODB you can open the XML file for the rule package in a text editor such as Notepad (not Microsoft Word) and do a global Replace on the value "DSODB".
Figure 3. Import of metadata for DSODB
Import of metadata for DSODB

Importing the rule definitions for operational quality

The operational quality rule definitions can be imported either with the Information Analyzer API or CLI. This article describes import via the CLI. For more examples of how to import pre-built rules in Information Analyzer, including use of the API, see: Importing the pre-built rule definitions.

The subsequent import steps assume that you:

  1. Download, unzip (extract), and save the Rule Definition XML files to wherever the Information Analyzer Client or server resides (this can be a remote server, a remote Client image, or your own machine).
    • Note: These instructions assume you have downloaded the files to your own machine and will import them from your local environment.
  2. Open each XML file you wish to use with Notepad (or any other basic text editor, but not Microsoft Word).
  3. Change the project name (listed as your-project as noted above) to one that already exists in your Information Analyzer environment and that you have access to.
    • Note: If there is not yet an Information Analyzer project, your Information Analyzer Project Administrator will need to create one.
    • Note: If you do not change the project name in the XML file, a project named your-project will be created and all the rule definitions will go there.
  4. Save the XML file(s).

To perform the import of the Information Analyzer rule definitions via the CLI, take the following steps:

  1. Open a command (DOS) prompt on the client
    • For example, on Windows XP you can use: Start > All Programs > Accessories > Command Prompt
  2. Navigate to: C:\IBM\InformationServer\ASBNode\bin
  3. For the base package, run the following command:

     IAAdmin -user xxxxx -password xxxxx -host your-server -port 9080 -create 
         -projectContent C:\Temp\IARuleDefs-OpsSet1-baseWBindings-v91.xml
  4. Use the Information Analyzer configuration information in the above command relevant for your environment:
    • -user [your Information Analyzer userid]
    • -password [your Information Analyzer password]
    • -host [your Information Analyzer server - should match your login information]
    • -port [usually is 9080 - should match your login information]

    After -projectContent include the location where you saved the XML file

    The example shows the file saved at C:\Temp\ but the location and name of your file may be different from the one shown here.

    Notes:

    • If you are importing the additional rule definition packages, you will need to change the file name and run the import command again for each set.
    • If you are importing the additional rule definition packages, it is recommended that you import the global variable package [IARuleDefs-OpsSet2-globalVariable-v91.xml] first so that the global variable is properly applied to all the data rules.
    • You will receive an error if you run the above command twice with the same file name. The error will tell you that the rules already exist. If this happens, re-issue the above command with the -update option instead of -create.
    • The above command will typically run for a couple of minutes.

If the import has gone successfully, you can log in to Information Analyzer, open your project (the same one specified in the XML file during import), and review the imported rule definitions. You should see a list of rule definitions similar to Figure 4.

Figure 4. Base set of Operational Quality rule definitions imported into Information Analyzer
Base set of Operational Quality rule definitions imported into Information Analyzer

Running operational quality data rules from the rule definitions

In order to evaluate operational quality with the data in the DSODB, you need to run executable data rules generated from the pre-built data rule definitions. The packages provided include one example generated data rule for each data rule definition. These examples include the binding, join conditions, and output definitions that make a data rule executable against actual data. These examples expect the Data Source and Schema names noted above (DSOpsData and DSODB respectively). If those are different from what is in your Information Analyzer metadata at the time of import, you will need to modify the data rules yourself in order to connect them to the actual data.

Assuming that all imports happened successfully and connected to the existing metadata, you should be able to select "Run" for each of the data rules. Execution time and results will depend on the data already in your DSODB database.

Two examples are shown below to illustrate the steps needed if you wish to generate your own executable data rules for specific projects or other variable parameters.

Example: Identifying jobs which failed execution during a given time period

This example references the rule definition: SLA_Availability_ProjectJobExecutedWOFail_forPriorPeriod. The rule checks for a specific project ID and any job run in that project within the last week; for those job executions that meet these criteria, the rule checks that the RUNMAJORSTATUS indicates the job finished ('FIN') and that the RUNMINORSTATUS indicates the job finished OK or only with warnings ('FOK' or 'FWW'). Any other major or minor job status is an exception.

The logic for this rule is as follows:

IF project = projectid AND runstarttime >= (date() - 7)  
THEN runmajorstatus = 'FIN' 
AND runminorstatus in_reference_list {'FOK','FWW'}

There are five variables that will be bound by default to data: project, projectid, runstarttime, runmajorstatus, and runminorstatus. Figure 5 below shows example bindings for these variables.

Figure 5. Bindings for rule variables
Bindings for rule variables

The project variable is bound to the PROJECTNAME column in the JOBEXEC table. The projectid variable is bound to a global variable named SOI. This global variable contains the literal SOI, the name of an example DataStage project of interest. The remaining three variables are bound to respectively named columns in the JOBRUN table.

Since there are two tables included in the bindings, a Join is required to connect the data from the two tables. JOBRUN and JOBEXEC are joined on their respective JOBID columns. Figure 6 shows the example Join Keys.

Figure 6. Join keys for the data rule
Join keys for the data rule

Finally, output columns are identified for exceptions. Figure 7 shows an example set of output columns from the JOBEXEC and JOBRUN tables, including the rule variables highlighted in yellow. The output records will only be created in the case where data does not meet the rule condition (i.e. jobs that had finished with fatal errors).

Figure 7. Output columns for the data rule
Output columns for the data rule

After execution of this rule, the output shows two jobs that failed to satisfy the rule, as seen in Figure 8. In both cases, the RUNMINORSTATUS value was FWF, meaning the jobs finished with fatal messages. Additional columns in the output show the number of fatal messages generated for each execution.

Figure 8. Results of rule execution against the DSODB
Results of rule execution against the DSODB

Example: Capturing the number of rows of rejected data

A second example highlights how to incorporate evaluations for data on reject links in a DataStage job, potentially a sign of quality issues with the data or of issues with how the job handles data. This example references the rule definition: SLA_Production_ProjectJobsExecutedWithoutRejects.

The logic for this rule is as follows:

IF project = projectid
AND runminorstatus in_reference_list {'FOK','FWW'}
AND lcase(linkname) contains 'reject'
THEN totalrejectrows = 0

When the job is part of the identified project, has run successfully (though possibly with warnings), and has one or more job links whose names contain the word reject, the total number of rejected rows on those links should equal 0. The bindings for the variables in this rule span four tables, as seen in Figure 9: JOBEXEC, JOBRUN, JOBLINK, and JOBRUNLINK.

Figure 9. Bindings for variables in a data rule evaluating rejected rows of data
Bindings for variables in a data rule evaluating rejected rows of data

The JOBLINK table contains the column LINKNAME, which may be equal to reject or may contain the word reject. To make this column available to the rule validation, you start with the JOBEXEC table, which contains the project name. The JOBEXEC table links to the actual jobs executed in the JOBRUN table through the JOBID column. You then connect to the JOBRUNLINK table via a RUNID column. With that, you can join the JOBLINK table through the LINKID column. The join connections required for this data rule are shown in Figure 10.

Figure 10. Join conditions in a data rule evaluating rejected rows of data
Join conditions in a data rule evaluating rejected rows of data
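The four-way join chain can likewise be sketched in plain Python. The table names match the article; the column names and sample rows are illustrative assumptions, not actual DSODB contents:

```python
# Illustrative rows for the four DSODB tables this rule joins.
JOBEXEC = [{"JOBID": 1, "PROJECTNAME": "SOI"}]
JOBRUN = [{"RUNID": 10, "JOBID": 1, "RUNMINORSTATUS": "FWW"}]
JOBLINK = [{"LINKID": 100, "JOBID": 1, "LINKNAME": "reject"},
           {"LINKID": 101, "JOBID": 1, "LINKNAME": "output"}]
JOBRUNLINK = [{"RUNID": 10, "LINKID": 100, "TOTALROWS": 39},
              {"RUNID": 10, "LINKID": 101, "TOTALROWS": 500}]

def reject_exceptions(projectid="SOI"):
    """Walk JOBEXEC -> JOBRUN -> JOBRUNLINK -> JOBLINK and flag reject links
    that carried any rows."""
    execs = {e["JOBID"]: e for e in JOBEXEC}
    links = {l["LINKID"]: l for l in JOBLINK}
    out = []
    for run in JOBRUN:
        exe = execs.get(run["JOBID"])                # join on JOBID
        if exe is None or exe["PROJECTNAME"] != projectid:
            continue
        if run["RUNMINORSTATUS"] not in {"FOK", "FWW"}:
            continue                                 # job must have run OK
        for rl in JOBRUNLINK:
            if rl["RUNID"] != run["RUNID"]:          # join on RUNID
                continue
            link = links[rl["LINKID"]]               # join on LINKID
            # ... AND lcase(linkname) contains 'reject' THEN totalrejectrows = 0
            if "reject" in link["LINKNAME"].lower() and rl["TOTALROWS"] != 0:
                out.append((link["LINKNAME"], rl["TOTALROWS"]))
    return out

print(reject_exceptions())  # -> [('reject', 39)]
```

The nested lookups stand in for the Join Keys configured in Information Analyzer; only successfully completed runs whose reject links carried rows surface as exceptions.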

The result of executing this rule, along with the output columns used, can be seen in Figure 11. In this instance, one job produced 39 rows of data on a link named reject. Because the total rows consumed and total rows produced are equal, these may be extraneous records, or the result may warrant deeper investigation.

Figure 11. Results of executing a data rule to identify rejected rows of data
Results of executing a data rule to identify rejected rows of data

Monitoring the operational quality in the Data Quality Console

As seen in Figures 8 and 11 above, the results of executing these rules can be reviewed directly in Information Analyzer through its user interface, or can be written to standard Information Analyzer reports. An additional option for monitoring operational quality results is to leverage the InfoSphere Data Quality Console. You can review specific details on the Data Quality Console in its User Guide, particularly the section Workflow for collecting exceptions to view in the data quality console.

In brief:

  1. Configure project connections in the Data Quality Console. Specifically, this adds the Information Analyzer project in which the operational quality data rules exist and execute, as seen in Figure 12 below (where Operational Quality is the example project name and OpsQualitySteward is the Information Analyzer project administrator for that project).
    Figure 12. Configuring the Data Quality Console to check and receive exceptions
    Configuring the Data Quality Console to check and receive exceptions
  2. Assign stewards to monitor the results in the Data Quality Console.
  3. View exception summaries in the Data Quality Console as in Figure 13 below.
    Figure 13. A list of evaluated operational quality data rules in the Data Quality Console
    A list of evaluated operational quality data rules in the Data Quality Console
  4. Drill into exception details as seen in Figure 14 below. Note that the specific item shown is the same data as seen in Information Analyzer in Figure 11.
    Figure 14. The detail of a specific operational quality data rule in the Data Quality Console
    The detail of a specific operational quality data rule in the Data Quality Console

The Data Quality Console lets the individuals tasked with monitoring operational quality, such as Data Stewards or ETL Architects, focus on the projects and data rules of interest to them.


Conclusion

You should now be able to use the pre-built set of rule definitions accompanying this article in your IBM InfoSphere Information Analyzer projects to evaluate operational quality from the data in the IBM InfoSphere DataStage and QualityStage Operations Console database.

This article has specifically looked at how to do the following tasks:

  • Understand the rule definitions available in the pre-built package accompanying this article.
  • Use the pre-built rule definitions to address operational quality conditions.
  • Apply the pre-built rule definitions to data in the DSODB.

Once imported, you can use the pre-built rule definitions to establish and run operational quality rules, and subsequently monitor for potential operational quality problems. And, by using pre-built rule packages, you can shorten the time needed to put operational quality checks in place for the DataStage, QualityStage, or Information Analyzer jobs in your IBM InfoSphere Information Server environment.


Download

Description: Pre-built sample code for this article
Name: IARuleDefs-OpsSetPackage-v91.zip
Size: ---

ArticleID=959775
ArticleTitle=Applying IBM InfoSphere Information Analyzer rules for operational quality measurements
publish-date=01092014