Example: Adding business context to your operational data

You can use IBM® InfoSphere® Information Governance Dashboard to view operational results from a business perspective. You derive this business perspective from glossary objects that are defined in IBM InfoSphere Information Governance Catalog, such as categories, terms, information governance policies, and information governance rules.

Linking operational data with information governance rules and policies

A key business dimension for information governance is based on the relationships between information governance policies, information governance rules, and the asset that implements information governance rules. These relationships are of type n:m and thus need to be represented by using bridge views. The igviews.IGIMPLEMENTEDBYIARULE and igviews.IGRULEREFBYCONTAINERPOLICY views serve this purpose. They do not contribute any attributes on their own.

Because business dimensions are essential for information governance reporting, the respective models treat all elements of these dimensions as mandatory, although the underlying relationships are optional in many cases.

Model fragment of governance rules and policies

The following query returns a list of information governance policies and a list of information governance rules that are assigned to these information governance policies. The query also returns a list of data rules that implement these information governance rules and the average number of records that are processed for the data rule at issue. A SELECT DISTINCT statement and a corresponding GROUP BY clause are used as a simple way to aggregate related facts. In a real usage scenario, facts would typically be aggregated using computations, such as average or maximum. The result is sorted by information governance policy name.

SELECT distinct
   policy_dim.NAME,
   igrule_dim.NAME,
   datarule_dim.NAME,
   AVG ( fact.TOTALRECORDS ) AvgNumRecordsProcessed

FROM IAVIEWS.IARULEEXECUTIONHISTORY fact
   INNER JOIN IAVIEWS.IARULE datarule_dim
      ON datarule_dim.RID = fact.RULERID
   INNER JOIN IGVIEWS.IGIMPLEMENTEDBYIARULE bridge1
      ON bridge1.IMPLEMENTATIONRULERID = fact.RULERID
   INNER JOIN IGVIEWS.IGBUSINESSRULE igrule_dim
      ON igrule_dim.RID = bridge1.BUSINESSRULERID
   INNER JOIN IGVIEWS.IGRULEREFBYCONTAINERPOLICY bridge2
      ON bridge2.RULERID = igrule_dim.RID
   INNER JOIN IGVIEWS.IGPOLICY policy_dim
      ON policy_dim.RID = bridge2.POLICYRID
      
GROUP BY
   policy_dim.NAME, igrule_dim.NAME, datarule_dim.NAME

ORDER BY
   policy_dim.NAME

The results from running this query might look like the data in the following table.

Information governance policy InformationGovernanceRule DataRule AvgRecProc
Data Completeness Address Validation and Verification BusAddZipCodeFormat 11005
Data Completeness Name validation and verification BusNameOneExists 11005
IGD Dashboard Configuration Validity IGD Configuration Parameter Validation DataRule2_1 4
IGD Dashboard Configuration Validity IGD_Configuration_Parameter_Validation_2 DataRule2_1 4
IGD Dashboard Configuration Validity3 IGD_Configuration_Parameter_Validation_3 DataRule2_1 4

The Policy column shows duplicate values, reflecting the fact that two or more information governance rules can be assigned to the same policy. A similar query might return the average number of records that are processed by a policy. In this case, it is important to ensure that the data rules associated with the policy all operate on the same table to make sure that the result is meaningful.

DataRule2_1 is assigned to two different information governance rules. This assignment illustrates that the relationship between information governance rules and data rules is of type n:m. Similar considerations apply when aggregating facts at the information governance rule level.

Linking operational data with categories and terms

You can use a business-oriented notion of scope to help categorize operational data from the perspective of business domains. While the technical notion of scope helps to define correct grouping for fact aggregation, the business notion of scope supports the presentation layer of InfoSphere Information Governance Dashboard.

If you are an InfoSphere Information Governance Dashboard user who is interested in a high-level overview, you might open an information governance main report that summarizes key metrics for the business areas. From there, you can drill down into your areas of interest, guided by how report elements are organized. At a high level, this organization is likely to follow business categories rather than technical categories, such as database assets, data quality projects, or data rule names.

InfoSphere Information Governance Catalog provides glossary objects that you can use to define business terminology at various levels of granularity. Categories and terms represent entities that are of interest to the business. The following examples illustrate how you can use categories and terms to classify facts such as those resulting from the execution of data rules.

The following illustration shows a sample glossary with examples of tables that are governed by terms:

Sample glossary with examples off tables that are governed by terms

The following model fragment shows how the dimensional model represents the business scope dimension, based on terms and categories:

Model fragment of dimensional model

You can add the dimension of implemented data resources either by using the data rule binding or by using the governed asset relationship at the information governance rule level. See Example: Determining what data the operational data is derived from for a detailed description of these two options. Corresponding to these two options, there are two typical ways to anchor this fragment on the reporting model. The anchor point is always the RID of an implemented data resource, usually a column, a table, or both, as illustrated by the following model fragment:

Model fragment that represents RID as anchor point

The following query returns the maximum number of records that are processed for the data rules that have associated terms and categories, where the data rule-term relationship is based on the data rule binding.

SELECT
   category.NAME Category,
   term.NAME Term,
   datarule_dim.NAME DataRule,
   MAX ( fact.TOTALRECORDS ) MaxNumRecordsProcessed
   
FROM IAVIEWS.IARULEEXECUTIONHISTORY fact
   INNER JOIN IAVIEWS.IARULE datarule_dim
      ON datarule_dim.RID = fact.RULERID
   INNER JOIN IAVIEWS.IARULEBINDING binding
      ON binding.RULERID = fact.RULERID
   INNER JOIN IAVIEWS.IAREGISTEREDCOLUMN bridge1
      ON bridge1.RID = binding.TARGETRID
   INNER JOIN CMVIEWS.PDRDATABASECOLUMN dbcol
      ON dbcol.RID = bridge1.DATAFIELDRID
   INNER JOIN CMVIEWS.PDRDATABASETABLE dbtab
      ON dbtab.RID = dbcol.OFDATABASETABLERID
   INNER JOIN IGVIEWS.IGASSIGNEDOBJECTSOFATERM classification
      ON classification.CLASSIFIEDOBJECTRID = dbtab.RID
   INNER JOIN IGVIEWS.IGBUSINESSTERM term
      ON term.RID = classification.BUSINESSTERMRID
   INNER JOIN IGVIEWS.IGBUSINESSCATEGORY category
      ON category.RID = term.OFCATEGORY

GROUP BY
   category.NAME , term.NAME, datarule_dim.NAME

ORDER BY
   category.NAME

The next query is an example of anchoring the term and category dimension, based on the governs relationship between information governance rules and implemented data resources:

SELECT
   category.NAME Category,
   term.NAME Term,
   datarule_dim.NAME DataRule,
   MAX ( fact.TOTALRECORDS ) MaxNumRecordsProcessed

FROM IAVIEWS.IARULEEXECUTIONHISTORY fact
   INNER JOIN IAVIEWS.IARULE datarule_dim
      ON datarule_dim.RID = fact.RULERID
   INNER JOIN IGVIEWS.IGIMPLEMENTEDBYIARULE bridge1
      ON bridge1.IMPLEMENTATIONRULERID = fact.RULERID
   INNER JOIN IGVIEWS.IGBUSINESSRULE igrule_dim
      ON igrule_dim.RID = bridge1.BUSINESSRULERID
   INNER JOIN IGVIEWS.IGGOVERNEDOBJECTSOFARULE bridge2
      ON bridge2.BUSINESSRULERID = igrule_dim.RID
   INNER JOIN CMVIEWS.PDRDATABASETABLE dbtab
      ON dbtab.RID =  bridge2.GOVERNEDOBJECTRID
   INNER JOIN IGVIEWS.IGASSIGNEDOBJECTSOFATERM classification
      ON classification.CLASSIFIEDOBJECTRID = dbtab.RID
   INNER JOIN IGVIEWS.IGBUSINESSTERM term
      ON term.RID = classification.BUSINESSTERMRID
   INNER JOIN IGVIEWS.IGBUSINESSCATEGORY category
      ON category.RID = term.OFCATEGORY
  
GROUP BY
   category.NAME , term.NAME, datarule_dim.NAME

ORDER BY
   category.NAME
The query returns results for terms, categories, and data rules only if the following conditions are met:
  • The term has an asset of type table assigned.
  • This table is governed by an information governance rule.
  • This information governance rule is implemented by a data rule that has been run.
If the governs relationship applies to terms rather than tables, the following query applies:
SELECT
   category.NAME Category,
   term.NAME Term,
   datarule_dim.NAME DataRule,
   MAX ( fact.TOTALRECORDS ) MaxNumRecordsProcessed

FROM IAVIEWS.IARULEEXECUTIONHISTORY fact
   INNER JOIN IAVIEWS.IARULE datarule_dim
      ON datarule_dim.RID = fact.RULERID
   INNER JOIN IGVIEWS.IGIMPLEMENTEDBYIARULE bridge1
      ON bridge1.IMPLEMENTATIONRULERID = fact.RULERID
   INNER JOIN IGVIEWS.IGBUSINESSRULE igrule_dim
      ON igrule_dim.RID = bridge1.BUSINESSRULERID
   INNER JOIN IGVIEWS.IGGOVERNEDOBJECTSOFARULE bridge2
      ON bridge2.BUSINESSRULERID = igrule_dim.RID
   INNER JOIN IGVIEWS.IGBUSINESSTERM term
      ON term.RID =  bridge2.GOVERNEDOBJECTRID
   INNER JOIN IGVIEWS.IGBUSINESSCATEGORY category
      ON category.RID = term.OFCATEGORY
 
GROUP BY
   category.NAME , term.NAME, datarule_dim.NAME

ORDER BY
   category.NAME		

Applied to the example glossary, the results from running these queries might look like the data in the following table.

Category Term DataRule MaxNumRecProc
CRM CRM Record AgeInRangeNumeric_Bank_Demographics 1105
Contact Information Customer Contact BusAddZipCodeFormat 11005
Contact Information Customer Contact BusLastNameExists 11005
Contact Information Customer Contact BusLastNameInAnonTable 11005
Contact Information Supplier Contact BusAddZipCodeFormat 11005
Contact Information Supplier Contact BusLastNameExists 11005
Contact Information Supplier Contact BusLastNameInAnonTable 11005

Linking operational data with stewards

If you assign stewards to objects in the catalog, your information governance reports can more immediately specify who needs to look into an area that needs further investigation.

How to properly represent stewardship is closely related to standards that are in place that guide the use of catalog objects. See Reporting on the metadata in your metadata repository for an example of a query that can help to determine the types of stewardship assignment that are present in your metadata repository.

Regarding data rules, a steward can be assigned at the data rule level, at the level of information governance rules that are implemented by the data rule, or even at the information governance policy level. The model fragment representing these three assignment options looks like the following graphic:

Model fragment that represents data rules steward assignment
The following query implements a decision tree-based logic that tries to find a steward that is assigned to the following objects in this order. A SELECT DISTINCT statement is used as a simple way to aggregate related facts. In a real usage scenario, facts would typically be aggregated using computations, such as average or maximum. The query returns 'no steward assigned' if no steward is assigned to any of these objects:
  • Data rule
  • Information governance rule
  • Information governance policy
SELECT distinct
   datarule_dim.NAME,
   CASE
      WHEN steward1_dim.PRINCIPALID IS NULL THEN
         CASE
            WHEN steward2_dim.PRINCIPALID IS NULL THEN
               CASE
                  WHEN steward3_dim.PRINCIPALID IS NULL
                  THEN 'no steward assigned'
                  ELSE steward3_dim.PRINCIPALID
               END
            ELSE steward2_dim.PRINCIPALID
         END
      ELSE steward1_dim.PRINCIPALID
   END Steward,
   igrule_dim.NAME,
   policy_dim.NAME

FROM IAVIEWS.IARULEEXECUTIONHISTORY fact
   INNER JOIN IAVIEWS.IARULE datarule_dim
      ON datarule_dim.RID = fact.RULERID
   LEFT OUTER JOIN CMVIEWS.STEWARDSHIPASSIGNMENT bridge0
      ON bridge0.ASSIGNEDFORCOMMONOBJECTRID = datarule_dim.COMMONRULERID
   LEFT OUTER JOIN CMVIEWS.PRINCIPAL steward1_dim
      ON steward1_dim.RID = bridge0.ASSIGNSPRINCIPALRID
   INNER JOIN IGVIEWS.IGIMPLEMENTEDBYIARULE bridge1
      ON bridge1.IMPLEMENTATIONRULERID = fact.RULERID
   INNER JOIN IGVIEWS.IGBUSINESSRULE igrule_dim
      ON igrule_dim.RID = bridge1.BUSINESSRULERID
   LEFT OUTER JOIN CMVIEWS.STEWARDSHIPASSIGNMENT bridge2
      ON bridge2.ASSIGNEDFORCOMMONOBJECTRID = igrule_dim.RID
   LEFT OUTER JOIN CMVIEWS.PRINCIPAL steward2_dim
      ON steward2_dim.RID = bridge2.ASSIGNSPRINCIPALRID
   INNER JOIN IGVIEWS.IGRULEREFBYCONTAINERPOLICY bridge3
      ON bridge3.RULERID = igrule_dim.RID
   INNER JOIN IGVIEWS.IGPOLICY policy_dim
      ON policy_dim.RID = bridge3.POLICYRID
   LEFT OUTER JOIN CMVIEWS.STEWARDSHIPASSIGNMENT bridge4
      ON bridge4.ASSIGNEDFORCOMMONOBJECTRID = policy_dim.RID
   LEFT OUTER JOIN CMVIEWS.PRINCIPAL steward3_dim
      ON steward3_dim.RID = bridge4.ASSIGNSPRINCIPALRID

ORDER BY
   datarule_dim.NAME

The results from running this query might look like the data in the following table.

Data Rule Data Rule Steward Information Governance Rule Information Governance Policy
BooleanValue_Domain_Validation_BANK_ACCOUNTS.BANKCARD John Smith Boolean value check Domain validation
Completeness_Check_ACCOUNT_HOLDERS.NAME Stacy Steward Name values contain literals only Format validation
Demographic_age_check_BANK_CUSTOMERS Stanley Steward Consistency checks for minors Consistency validation
IGDConfigCheck_DR no steward assigned GD Configuration Parameter Validation IGD Dashboard Configuration Validity
IGDFailingOnPurpose_DR no steward assigned IGD Configuration Parameter Validation IGD Dashboard Configuration Validity

Relationships between data rules and information governance rules are of type n:m. Relationships between information governance rules and information governance policies are also of type n:m. The previous query returns more than one steward for a data rule that does not have a steward that is assigned but is assigned to multiple information governance rules with different stewards. This fact might affect the use of this dimension in aggregations. Aggregating facts per owner might lead to double-counting because facts for the same data rule would be added to the aggregate of both stewards.