Partitioning strategy for the MDM database

If you have a large amount of data in the InfoSphere® MDM data store, you may experience long response times and high CPU consumption on the database. One approach to address performance and response time issues is to employ a partitioning solution.

Important: This solution is only for DB2® on z/OS®. It cannot be used on DB2 or Oracle in a distributed installation. If you are using a distributed environment different from DB2 for z/OS, you might be interested partitioning your database tables for other reasons, but it is recommended that you evaluate the trade-off between the benefit and cost in those environments by doing your own proof of concept. Testing shows that partitioning the tables in the distributed environments (for example, DB2 on AIX®) is not necessary.

Tables that have a small number of rows and are not expected to grow significantly do not benefit from partitioning. One rule of thumb is that if a table is not expected to grow to more than 10 million rows, you do not need to partition it. It is also recommended that you consider the clustering key at the same time that you do table partitioning. Base the clustering key on your workload access pattern.

When running the InfoSphere MDM database on DB2 for z/OS, partitioning the major database tables can minimize synchronous inputs and outputs, which can help to achieve optimal performance.

To facilitate the partitioning strategy, the current release implements database range partitioning for the key tables by including the DDL scripts and the partitioning friendly key generator. Other tables that do not require range partitioning are partitioned by growth according to best practices from DB2 for z/OS.

The solution has two advantages:

The MDM database tables are property partitioned into ranges with the correct partitioning keys (same as the default Primary Key).
The partitioning key values are generated to ensure that all the related InfoSphere MDM objects share the same partitioning key digits. Doing so means that the related objects are stored into the same partitioning range in the respective tables.

By default the Primary Key index is also the clustering index in order to ensure that the related objects that belong to the same parent entity are clustered together to improve database Input/Output efficiency.

Please refer to the related links for more information about the default key generator.

In this release, only the DDL scripts for DB2 for z/OS include the changes related to partitioning. However, you can apply a similar partitioning strategy to DB2 for Linux®, UNIX, and Windows or other supported databases, if required. Work with your DBAs and the respective database vendors as you determine whether to customize DDL scripts. You can use the DDL scripts for DB2 for z/OS in the installation package as a reference: Look for them within the MDM_disks\StarupToolkit directory in the InfoSphere MDM build.

Uncompress the scripts from the MDM_disks\StartupToolkit directory. After you uncompress the startup_toolkit_disks.zip file, look for the z/OS DDL scripts within the MDM/database directory.

For your convenience, the following example of one DDL shows how the tablespace and table are created:

CREATE TABLESPACE TS000003
  IN <db_prefix>ZLOC
  USING STOGROUP  <STOGROUP_NAME>  DEFINE NO
  NUMPARTS 20
  SEGSIZE 64
  LOCKSIZE ROW
  MEMBER CLUSTER
  COMPRESS YES
  CLOSE NO
  BUFFERPOOL BP5     ;
  GRANT USE    OF TABLESPACE <db_prefix>ZLOC.TS000003
  TO <USER ACCOUNT>;
 -- DEFINITION FOR TABLE ADDRESS

By default, the partition key generator generates the 18-digit key values. The following ranges show 18-digit partition values.

CREATE  TABLE  ADDRESSGROUP
  (LOCATION_GROUP_ID  DECIMAL(19)  NOT  NULL,
  ADDRESS_ID  DECIMAL(19)  NOT  NULL,
  CARE_OF_DESC  VARCHAR(50),
  ADDR_USAGE_TP_CD  DECIMAL(19)  NOT  NULL,
  LAST_UPDATE_DT  TIMESTAMP  NOT  NULL  DEFAULT,
  LAST_UPDATE_USER  VARCHAR(20),
  LAST_UPDATE_TX_ID  DECIMAL(19),
  CONSTRAINT  P_ADDRESSGROUP  PRIMARY  KEY
  (LOCATION_GROUP_ID
  )
  )
  IN  <db_prefix>ZLOC.TS000004
   PARTITION BY (LOCATION_GROUP_ID)
 (PART 1 VALUES(145000000000000000.),
   PART 2 VALUES(190000000000000000.),
   PART 3 VALUES(235000000000000000.),
   PART 4 VALUES(280000000000000000.),
   PART 5 VALUES(325000000000000000.),
   PART 6 VALUES(370000000000000000.),
   PART 7 VALUES(415000000000000000.),
   PART 8 VALUES(460000000000000000.),
   PART 9 VALUES(505000000000000000.),
   PART 10 VALUES(550000000000000000.),
   PART 11 VALUES(595000000000000000.),
   PART 12 VALUES(640000000000000000.),
   PART 13 VALUES(685000000000000000.),
   PART 14 VALUES(730000000000000000.),
   PART 15 VALUES(775000000000000000.),
   PART 16 VALUES(820000000000000000.),
   PART 17 VALUES(865000000000000000.),
   PART 18 VALUES(910000000000000000.),
   PART 19 VALUES(955000000000000000.),
   PART 20 VALUES(9223372036854775807.));
CREATE  UNIQUE  INDEX  U1_ADDRESSGROUP
  ON  ADDRESSGROUP
  (
  LOCATION_GROUP_ID
  );

Note that the first partition can potentially hold all the key values from zero (0 digit) to 145,000,000,000,000,000 (18 digits). Similarly, the last partition can potentially hold all the key values from 955,000,000,000,000,001 (18 digits) to 9,223,372,036,854,775,807 (19 digits). That number is the maximum for a BIGINT data type in DB2. This configuration ensures that all key values, regardless of their number of digits, have a pre-allocated partition in the database, even though the primary keys are generated with 18 digits for the default configuration.

You can also partition by growth for all tables not using range partitions:

CREATE TABLESPACE TS000002
  IN <db_prefix>ZBANK
  USING STOGROUP  <STOGROUP_NAME>  DEFINE NO
  DSSIZE 2G
  MAXPARTITIONS 15
  SEGSIZE 4
  LOCKSIZE PAGE
  COMPRESS YES
  CLOSE YES
  BUFFERPOOL BP1     ;
  GRANT USE    OF TABLESPACE <db_prefix>ZBANK.TS000002
  TO <USER ACCOUNT>;
 -- DEFINITION FOR TABLE ADDACTIONTYPE