InfoSphere Information Governance Catalog: Improving catalog and lineage performance in DB2 environment

Product Documentation

Abstract

IBM InfoSphere Information Governance Catalog might hang when you do these actions:
1) Invoke data lineage reports
2) Expand the "Usage" section in the Details page of an asset
3) Run queries where usage properties (Reads from / Writes to / Read by / Written by) are selected or used in conditions

Create, update, and delete operations, over time, can cause database tables and their indexes to become fragmented. As a result, performance is degraded.

Content

Do the following commands on the FLOWS_* metadata tables.

Do the RUNSTATS command to update the statistics about the characteristics of the tables and indexes. Recommended is RUNSTATS with detailed collection options:

RUNSTATS on table FLOWS_ASSET on all columns with distribution on all columns and sampled detailed indexes all;
RUNSTATS on table FLOWS_ASSETTAG on all columns with distribution on all columns and sampled detailed indexes all;
RUNSTATS on table FLOWS_ASSETTYPERESOURCES on all columns with distribution on all columns and sampled detailed indexes all;
RUNSTATS on table FLOWS_ASSETUSAGE on all columns with distribution on all columns and sampled detailed indexes all;
RUNSTATS on table FLOWS_CONTAINMENT on all columns with distribution on all columns and sampled detailed indexes all;
RUNSTATS on table FLOWS_EQUIVGROUP on all columns with distribution on all columns and sampled detailed indexes all;
RUNSTATS on table FLOWS_FLOW on all columns with distribution on all columns and sampled detailed indexes all;
RUNSTATS on table FLOWS_FLOWUNIT on all columns with distribution on all columns and sampled detailed indexes all;
RUNSTATS on table FLOWS_FLOWUNITRUN on all columns with distribution on all columns and sampled detailed indexes all;

Do the REORG INDEXES/TABLE command to reorganize the index or table. Example:

REORG table FLOWS_ASSET;
REORG table FLOWS_ASSETTAG;
REORG table FLOWS_ASSETUSAGE;
REORG table FLOWS_CONTAINMENT;
REORG table FLOWS_EQUIVGROUP;
REORG table FLOWS_FLOW;
REORG table FLOWS_FLOWUNIT;
REORG table FLOWS_FLOWUNITRUN;

Recheck the statistics with the RUNSTATS command.

Tip: Do these steps regularly to prevent performance degradation.

You can also use Db2 Automatic Statistics and define statistics profiles for the database tables.

Example for table FLOWS_ASSETS

RUNSTATS on table FLOWS_ASSET on all columns with distribution on all columns and sampled detailed indexes all SET PROFILE ONLY

Documentation:

https://www.ibm.com/docs/en/db2/11.5?topic=statistics-automatic-collection

https://www.ibm.com/docs/en/db2/11.5?topic=commands-runstats

Advanced optimization steps

Flows model optimization

Lineage related metadata tables can be further optimized by modifying datatypes of columns storing SID values. The SID value is an internal id within Flows model.
This chapter describes the steps to modify the data type of SID database columns in the metadata tables storing lineage/flows information. The change optimizes join conditions in sql queries issued by IGC to fetch lineage related information. This change also reduces space consumption by the modified metadata tables.

Indications to apply this optimization are:
- An environment having millions or more rows in any of FLOWS_* metadata tables
- Long running SQL statements against FLOWS_* metadata tables with a join condition on any SID column. Example:

select distinct c.sidLevel2_xmeta, a.sid_xmeta, a.type_xmeta, a.repr_xmeta from Flows_Asset a, Flows_Containment c where c.sidLevel2_xmeta in (?,?,?,?,?) and a.sid_xmeta=c.sidLevel1_xmeta and c.sidLevel2_xmeta <> c2.sidLevel1_xmeta

Important: Before these steps, back up the repository database.

Stop InfoSphere Information Server
Connect to repository database with the xmeta schema owner or set xmeta as current schema

Run the following SQL commands for repository on Db2

alter table FLOWS_ASSET
  alter column SCOPEUNITSID_XMETA set data type char(32)
  alter column SID_XMETA set data type char(32)
  alter column USEDASSETSID_XMETA set data type char(32);
call admin_cmd('REORG table FLOWS_ASSET');

alter table FLOWS_ASSETTAG 
  alter column ASSETSID_XMETA set data type char(32);
call admin_cmd('REORG table FLOWS_ASSETTAG');

alter table FLOWS_ASSETUSAGE 
  alter column ASSETSID_XMETA set data type char(32)
  alter column SCOPEUNITSID_XMETA set data type char(32)
  alter column SID_XMETA set data type char(32);
call admin_cmd('REORG table FLOWS_ASSETUSAGE');

alter table FLOWS_CONTAINMENT 
  alter column SIDLEVEL1_XMETA set data type char(32)
  alter column SIDLEVEL2_XMETA set data type char(32)
  alter column SIDLEVEL3_XMETA set data type char(32)
  alter column SIDLEVEL4_XMETA set data type char(32)
  alter column SIDLEVEL5_XMETA set data type char(32);
call admin_cmd('REORG table FLOWS_CONTAINMENT');

alter table FLOWS_EQUIVGROUP 
  alter column BESTSID_XMETA set data type char(32);
call admin_cmd('REORG table FLOWS_EQUIVGROUP');

alter table FLOWS_FLOW 
  alter column FLOWUNITSID_XMETA set data type char(32)
  alter column SOURCESID_XMETA set data type char(32)
  alter column TARGETSID_XMETA set data type char(32);
call admin_cmd('REORG table FLOWS_FLOW');

alter table FLOWS_FLOWUNIT 
  alter column SID_XMETA set data type char(32);
call admin_cmd('REORG table FLOWS_FLOWUNIT');

alter table FLOWS_FLOWUNITRUN 
  alter column FLOWUNITSID_XMETA set data type char(32)
  alter column SID_XMETA set data type char(32);
call admin_cmd('REORG table FLOWS_FLOWUNITRUN');

call admin_cmd('RUNSTATS on table FLOWS_ASSET on all columns with distribution on all columns and sampled detailed indexes all');
call admin_cmd('RUNSTATS on table FLOWS_ASSETTAG on all columns with distribution on all columns and sampled detailed indexes all');
call admin_cmd('RUNSTATS on table FLOWS_ASSETTYPERESOURCES on all columns with distribution on all columns and sampled detailed indexes all');
call admin_cmd('RUNSTATS on table FLOWS_ASSETUSAGE on all columns with distribution on all columns and sampled detailed indexes all');');
call admin_cmd('RUNSTATS on table FLOWS_CONTAINMENT on all columns with distribution on all columns and sampled detailed indexes all');
call admin_cmd('RUNSTATS on table FLOWS_EQUIVGROUP on all columns with distribution on all columns and sampled detailed indexes all');
call admin_cmd('RUNSTATS on table FLOWS_FLOW on all columns with distribution on all columns and sampled detailed indexes all');
call admin_cmd('RUNSTATS on table FLOWS_FLOWUNIT on all columns with distribution on all columns and sampled detailed indexes all');
call admin_cmd('RUNSTATS on table FLOWS_FLOWUNITRUN on all columns with distribution on all columns and sampled detailed indexes all');

Note: REORG is run twice for table FLOWS_CONTAINMENT to avoid error

 The operation was not performed because the table is in an invalid state for the operation. Table name: "XMETA.FLOWS_CONTAINMENT". Reason code: "23" . The maximum number of REORG-recommended alters have been performed. Up to three REORG-recommended operations are allowed on a table before a reorg must be performed, to update the tables rows to match the current schema.

Start InfoSphere Information Server

Additional information
The datatype optimization is not a part of IGC as InfoSphere Information Server is not supporting datatype changes in metadata columns.
The changes will not be lost after installing IGC patches nor after upgrading to Infosphere Information Server Service Pack or Fix Pack.

Reconfiguring IGC Lineage

Parameter: useSimpleQueryForLineageEquivCheck

Default value: false

Description:

Parameter useSimpleQueryForLineageEquivCheck controls one of the steps in Data Lineage Report generation. The step is using SQL query to fetch flows metadata. Depending on amount and distribution of the metadata or available database statistics, the default SQL query might be the quickest option. If the default configuration is resulting in slow processing, change the parameter and monitor environment. After the change, the same lineage processing step is using different SQL query. The change is global and affects any Data Lineage Report.

How to modify:

iisAdmin -set -key com.ibm.iis.gov.vr.setting.useSimpleQueryForLineageEquivCheck -value true

and wait 5 minutes.

Indications to apply this optimization are:
- An environment having millions or more rows in any of FLOWS_* metadata tables

- Long running Data Lineage Report request
- Long running SQL statements against FLOWS_* metadata tables with following pattern:

select distinct c1.sidLevel1_xmeta,c1.sidLevel3_xmeta,c2.sidLevel3_xmeta,ca2.eqGroupId_xmeta  from Flows_Asset ca1, Flows_Asset ca2, Flows_Containment c1, (select sidLevel3_xmeta, nameLevel3_xmeta, sidLevel4_xmeta from Flows_Containment cc2 where 1=1  group by sidLevel3_xmeta, nameLevel3_xmeta, sidLevel4_xmeta) c2  where ca1.eqGroupId_xmeta is not null and ca1.eqGroupId_xmeta = ca2.eqGroupId_xmeta and ca1.sid_xmeta <> ca2.sid_xmeta and c1.sidLevel4_xmeta = ca1.sid_xmeta and c2.sidLevel4_xmeta = ca2.sid_xmeta and c1.nameLevel3_xmeta = c2.nameLevel3_xmeta and c1.sidLevel1_xmeta in (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)

select distinct c1.sidLevel2_xmeta,c1.sidLevel3_xmeta,c2.sidLevel3_xmeta,ca2.eqGroupId_xmeta from Flows_Asset ca1, Flows_Asset ca2, Flows_Containment c1, Flows_Containment c2 where ca1.eqGroupId_xmeta is not null and ca1.eqGroupId_xmeta = ca2.eqGroupId_xmeta and c2.sidLevel1_xmeta=c2.sidLevel2_xmeta and ca1.sid_xmeta <> ca2.sid_xmeta and c1.sidLevel4_xmeta = ca1.sid_xmeta and c2.sidLevel4_xmeta = ca2.sid_xmeta and c1.nameLevel3_xmeta = c2.nameLevel3_xmeta and c1.sidLevel1_xmeta=c1.sidLevel2_xmeta and c1.sidLevel2_xmeta in (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)

The problematic queries might run for several minutes with high values for performance metrics: ROWS_READ number and SORT time.

The processing step controlled with parameter useSimpleQueryForLineageEquivCheck can be further optimized by using Db2 Statistical Views.

How to create and enable Db2 Statistical Views

create view flows_containment_x_flows_asset_1 as ( select Q6.* from flows_containment as q4, flows_asset as q6 where (q4.sidlevel4_xmeta = q6.sid_xmeta) )
create view flows_containment_x_flows_asset_2 as ( select Q6.* from flows_containment as q4, flows_asset as q6 where (q4.sidlevel4_xmeta = q6.sid_xmeta) and (q4.sidlevel1_xmeta = q4.sidlevel2_xmeta) )
alter view flows_containment_x_flows_asset_1 ENABLE query optimization
alter view flows_containment_x_flows_asset_2 ENABLE query optimization
runstats on table flows_containment_x_flows_asset_1 with distribution
runstats on table flows_containment_x_flows_asset_2 with distribution

Disabling:

alter view flows_containment_x_flows_asset_1 DISABLE query optimization
alter view flows_containment_x_flows_asset_2 DISABLE query optimization

Related Information

Lineage reports take long time and give Oracle error

Slow DataStage Jobs lineage detection

How to improve performance for automatic update of lineage flows

How do you improve InfoSphere Information Governance Catalog (IGC) lineage perf…

[{"Line of Business":{"code":"LOB10","label":"Data and AI"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSZJPZ","label":"IBM InfoSphere Information Server"},"ARM Category":[{"code":"a8m500000008aNNAAY","label":"Governance Catalog - IGC-\u003ELineage"}],"ARM Case Number":"","Platform":[{"code":"PF002","label":"AIX"},{"code":"PF016","label":"Linux"},{"code":"PF027","label":"Solaris"},{"code":"PF033","label":"Windows"}],"Version":"All Version(s)"}]

Tips

InfoSphere Information Governance Catalog: Improving catalog and lineage performance in DB2 environment

Product Documentation

Abstract

Content

Related Information

Was this topic helpful?

Document Information

UID

Share your feedback

Need support?