Use PL/I to determine the algorithm for an IMS HALDB PSE

Experimenting with various PSE algorithms

IBM® Information Management System's High Availability Large Database (IMS HALDB) is a partitioned database. A Partition Selection Exit (PSE) is an Assembler routine used to distribute records across the partitions. There is no way to prove that the algorithm in a PSE will produce the desired distribution without first implementing the PSE. Since implementing a PSE affects database availability, a way is needed to evaluate PSE algorithms without affecting availability. This article describes an approach to evaluating algorithms with a High Level Language (PL/I) in order to produce a uniform distribution.

Rita Yen (Rita.Yen@WellPoint.com), IMS Database Administrator, WellPoint, Inc

Rita Yen is an IMS Database Administrator for WellPoint, Inc. WellPoint is one of the nation’s largest health benefits companies, with 34 million members in its affiliated health plans, and a total of more than 65 million individuals served through its subsidiaries.



John Harper (jwharp@us.ibm.com), IMS Developer, IBM

John W. Harper is an IMS Development Programmer, and an IMS Technical Customer Advocate within the IBM IMS Database development team at the Silicon Valley Lab in California. He holds patents in the area of Very Large Databases and brings a wealth of IMS experience to the table.



Laurence England (englandl@us.ibm.com), Client Technical Architect, IBM

Larry England works for IBM as the Client Technical Architect for WellPoint. His home location is Silicon Valley Lab in San Jose. When he is not working, you may find him running in the California hills.



26 July 2012


Overview

IBM introduced the High Availability Large Database (HALDB) in IMS Version 7. HALDB extends the IMS Full Function database size up to 40 terabytes by partitioning the data. HALDB supports up to 1001 partitions, and each partition can consist of up to ten data sets, each holding up to 4 gigabytes. Partitioning improves the manageability of the database, for example by allowing a reorganization to run against a single partition instead of the entire database. Partition Selection Exits can further improve manageability by distributing records across the HALDB in various ways. One method of distribution inserts new records across the entire database instead of clustering them at the end. This reduces contention on the last partition and allows the database to grow more evenly, requiring less attention. This article describes how to use a High Level Language (HLL), PL/I, to more easily determine the algorithm needed for a uniform distribution of records across the entire HALDB.

For more information on HALDB, see the Resources section for a link to an IBM Redbook called "The Complete IMS HALDB Guide".

Introduction to PSEs

In order to manage extremely large IMS databases, you can divide the database into smaller chunks or partitions.

By default, records are placed into a specific partition based upon the key of the record. This is called key range partitioning. With key range partitioning, a high key value is defined for each partition. A given partition holds all of the records with key values higher than the previous partition's high key, up to and including the high key defined for that partition.
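For illustration, key-range selection can be sketched in a few lines of Python. The high key values and eight-character date keys below are hypothetical; IMS actually compares the raw key bytes against each partition's defined high key.

```python
# Hypothetical high keys, one per partition, in ascending order.
HIGH_KEYS = ["19991231", "20051231", "20121231"]

def select_by_key_range(key: str) -> int:
    """Return the first partition whose high key is >= the record key."""
    for partition, high_key in enumerate(HIGH_KEYS):
        if key <= high_key:
            return partition
    raise ValueError("key exceeds the highest defined high key")

print(select_by_key_range("20010704"))  # selects partition 1
```

A key of "20010704" is above the first high key but at or below the second, so it lands in partition 1 (zero-indexed).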

The goal of reducing contention can be achieved by creating balanced partitions, that is, partitions that each hold approximately the same number of records. By inserting new database records across the entire database, the access patterns of the applications are spread across the database as well.

There are cases where a High Key Partitioning scheme cannot achieve a balanced number of records within the partitions, which may cause contention when applications access the records. For example, keys based upon a date or time stamp tend to cluster at the end of the database.

Using a Partition Selection Exit (PSE)

In cases such as these, a Partition Selection Exit (PSE) can be used to programmatically determine which partition a record belongs in. The algorithm in the PSE must be deterministic: given a key, it must always select the same partition.

This article looks at an example where the records tended to cluster, so the partitions would not hold a balanced number of records. The key comprised a set of fields: a date/time stamp, a two-byte business unit number, and a four-byte sequence number. A business unit could be associated with anywhere from a single sequence number to hundreds of them. Because the date/time stamp prefixed each key, the result was a clustering of records within the last partition. While processing the records, some applications encountered deadlocks with other applications attempting to access the same records.

IMS automatically resolves deadlocks by selecting one application as a victim and abnormally ending (abending) it. These abends cost CPU cycles, not only to handle the abend, but also to back out updates and eventually retry the original updates. There is a delay cost as well to the surviving applications as they wait for the termination/backout process to finish. The deadlocks are normally due to applications accessing record segments in different orders.

A classic example of this in an online environment is a sweep job looking for records that have finished a certain process. For performance reasons, a sweep job may use an index to find records with a certain status code. Using an index allows faster access of only the relevant records, as opposed to searching an entire database. As records are moved from one process to another, their index entries are deleted, and new index entries are created so the records can be found with their new status.

If the normal process inserts and updates records following a hierarchical path (root-then-dependent segments), IMS will first get locks on those segments in that order, and then get locks on the index. Since sweep jobs tend to access segments using an inverted or logical path (dependent-then-root segments), these jobs will contend with the normal process as they both attempt to create new records at the end of the index. One approach to reducing database contention is spreading updates across the database instead of at the very end. This greatly reduces the chances of deadlocks by reducing clusters of lock requests.

For an IMS HALDB, not only can the database be partitioned, but the indexes can be partitioned as well. In the following example, the index was divided into partitions, and a PSE was written to spread updates across all partitions. This method is intended to reduce data clustering and thus reduce application contention over the data.

Several factors limit the adoption and implementation of a PSE. Due to runtime considerations, a PSE must be written in High Level Assembler (HLASM). While HLASM is not difficult, other languages are more commonly used, so the availability of programmer skills, as well as ownership of the PSE, can be an issue. Once a PSE is written, it must be tested to show the effects of its algorithm, and before testing can occur, the PSE must be implemented by reorganizing the entire database. The turnaround time for implementing a PSE against a large database limits the opportunities to experiment with it. In production environments, strict service level agreements limit how often a database may be taken offline, so once implemented, a PSE must work well from the start; even minor adjustments require another outage. Thus, the scarcity of assembler skills and the need to write a near-perfect PSE on the first attempt conspire against using one.

Given these challenges, how can you easily write an algorithm and test the distribution of the records across partitions, and determine how many partitions should be created? For this article, a simple program was written using PL/I that allowed the algorithms and the number of partitions to be varied without impacting production systems. Using real data as input, an optimal algorithm and the appropriate number of partitions were determined. The algorithm was then translated into High Level Assembler (HLASM) and inserted into the sample Partition Selection Exit.

Please note a sample PSE is shipped with IMS and can be found in the IMS SAMPLIB.


Determine the algorithm for uniform distribution

The goal is an algorithm that consistently produces partitions with an equal, or nearly equal, number of records. The following steps ensure the algorithm produces an even distribution of records across a set of partitions.

  • Write a PL/I program that reads a copy of the real data. Using the key, the program will execute the algorithm to select the appropriate partition. Note that IMS only supplies a segment's key to the exit, and not the entire segment. A report is then produced showing the distribution of the records resulting from the algorithm.
  • Experiment with various algorithms within the PL/I program to determine which algorithm would work the best given the data. This means potentially using different parts of the key as the selection criteria.
  • Once the best algorithm is determined, the algorithm is translated into HLASM within the Partition Selection Exit.

The PL/I program essentially provides a model for testing various algorithms to determine which delivers the best and most even distribution. It is simple to change the algorithm within the PL/I program and re-run the model to view the resulting distribution. Beyond the algorithm, you can also vary the number of partitions: it may be desirable to increase the number of partitions and still have the algorithm produce an even distribution of records within each partition.

When experimenting with the algorithm, you may notice that not all key fields are needed to participate in the selection of the partition. Using the High Level Language of PL/I, you can produce a number of algorithms, interrogate the distribution results, and then select the algorithm that results in the best distribution.
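The modeling approach can be sketched in Python as well. The key layout, field offsets, and partition count below are hypothetical stand-ins for the article's PL/I model; in the real model the keys are read from a copy of production data.

```python
from collections import Counter

NUM_PARTITIONS = 10  # easy to vary between model runs

def select_partition(key: str) -> int:
    # Hypothetical key layout: date/time (8) + business unit (2) + sequence (4).
    # Select on the trailing sequence number so inserts spread across partitions.
    sequence = int(key[10:14])
    return sequence % NUM_PARTITIONS

def model_distribution(keys):
    """Tally and report how the algorithm distributes keys over the partitions."""
    counts = Counter(select_partition(k) for k in keys)
    total = len(keys)
    for partition in range(NUM_PARTITIONS):
        n = counts.get(partition, 0)
        print(f"partition {partition:3d}: {n:8d} records ({100.0 * n / total:5.1f}%)")
    return counts

# A synthetic sample stands in for the file of real keys here.
sample = [f"20120726{bu:02d}{seq:04d}" for bu in range(3) for seq in range(1000)]
model_distribution(sample)
```

Swapping in a different `select_partition` body, or changing `NUM_PARTITIONS`, and re-running immediately shows the new distribution, which is exactly the quick feedback loop the PL/I model provides.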

Also under consideration is how full each partition should be allowed to become, which determines the number of partitions necessary and allows headroom for growth within each partition. The amount of desired headroom is determined by the anticipated growth of the data over time. If the data fills a partition, additional partitions will need to be added and the data re-partitioned.
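As a rough illustration of the headroom arithmetic, the number of partitions follows from the projected record count and the capacity you are willing to use per partition. All figures here are hypothetical; substitute your own measurements.

```python
import math

# Hypothetical planning figures.
current_records = 50_000_000
annual_growth = 0.20                 # anticipated 20% growth per year
planning_horizon_years = 3
records_per_partition = 8_000_000    # capacity at the desired fill level

projected = current_records * (1 + annual_growth) ** planning_horizon_years
partitions_needed = math.ceil(projected / records_per_partition)
print(partitions_needed)  # 11 partitions for ~86.4M projected records
```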

A simplified PL/I program is shown in Listing 1.

Listing 1. Main logic for the PL/I program
Call InitPartitions();
Read File( InFile );
Do While ( not EndOfFile );
   Call SelectPartition();
   Read File( InFile );
End;
Call CalcPercent();
Call PrintPartitions();

Outline of main logic

The SelectPartition() procedure is the heart of the partition selection logic; in this example, the PL/I mod() built-in function is used. This is where you can vary the logic to determine the optimum distribution of records. Note that the PL/I code relies upon the structure ‘Buffer’, which represents the record to be added to the database. Only the fields that form the key are declared.

Once you have selected the algorithm that produces the optimum distribution, translate it from PL/I into HLASM. In this example, the DR (Divide Register) HLASM instruction was used for the PL/I mod() function, and the LA (Load Address) instruction ensured a positive remainder, which represents the zero-indexed partition number. Note that PL/I indexes from 1 by default.
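The sign adjustment matters because a hardware divide produces a remainder that takes the sign of the dividend, while PL/I's mod() with a positive divisor always returns a nonnegative result. A small Python sketch of the difference (the helper name is invented for illustration):

```python
def truncating_remainder(dividend: int, divisor: int) -> int:
    # Mimics a hardware divide: the remainder takes the sign of the dividend.
    return dividend - divisor * int(dividend / divisor)

print(truncating_remainder(-7, 3))  # -1: would need adjustment before use
print(-7 % 3)                       #  2: the mod()-style, nonnegative result
```

A negative remainder would index outside the partition table, which is why the translated exit must force the result positive before using it as a partition number.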

The Partition Key Table (PKT) contains information about the partitions. The Partition Key Table provides the address of a specific Partition Table Entry (PTE). The partition high key is used to search the table to obtain the PTE address.

The PKT consists of two sections. The first section is the prefix area which contains control information. The second section contains one or more key entries. All key entries are contiguous.

Once the partition number is determined, the correct PKT entry must be identified; there is one PKT entry for each partition. This is accomplished by calculating the stride into the table (the index into the table multiplied by the size of each table entry, using the MR Multiply Register HLASM instruction), then adding the stride to the starting address of the table (LA Load Address HLASM instruction).
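In effect, the exit performs simple base-plus-stride address arithmetic. A Python sketch of the calculation follows; the prefix and entry sizes are assumptions for illustration only, as the real PKT layout is defined by the IMS mapping macros.

```python
# Assumed sizes for illustration only; the real PKT layout is defined by IMS.
PKT_PREFIX_LEN = 32    # control-information prefix
PKT_ENTRY_LEN = 256    # size of one key entry

def pkt_entry_offset(table_base: int, partition_index: int) -> int:
    """Mirror the MR/LA sequence: stride = index * entry size, then add the base."""
    stride = partition_index * PKT_ENTRY_LEN      # MR (Multiply Register)
    return table_base + PKT_PREFIX_LEN + stride   # LA (Load Address)

print(hex(pkt_entry_offset(0x1000, 3)))  # 0x1320
```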

A portion of the source of the PSE is provided as a download in this article, and the lines with 'LEE' in the sequence numbers identify the logic for selecting the correct partition.

Installing the PSE

To install the PSE, perform the following steps during a migration from a non-HALDB database to a HALDB. If the database is already a HALDB, the migration specifications can be omitted from the reorganization process.

Note: The following names are used in this section.

  • NONHALDB: Current database name
  • HALDBDB: HALDB database name
  • HALDBSX: HALDB secondary index
  1. Assemble and link Partition Selection Exits (PSEs) PSEDX and PSEX1 into test RESLIB.
  2. Unload the database NONHALDB with "MIGRATE=YES", as shown in Listing 2.
    Listing 2. Unload NONHALDB database
    //UNLOAD   EXEC PGM=DFSRRC00,PARM='ULU,DFSURGU0,NONHALDB,,,,,,,,,,,N'
    //STEPLIB  DD   DISP=SHR,DSN=regular.RESLIB
    //* omitted lines
    //SYSIN    DD   *
    MIGRATE=YES
  3. Run DBDGEN to add HALDBDB (PHDAM) and HALDBSX (PSINDEX) to the test DBDLIB.
  4. Create test RECONs.
  5. Register HALDBDB and HALDBSX databases into test RECONs as HALDBs with "PARTSEL(PSEDX)" and "PARTSEL(PSEX1)", as shown in Listing 3.
    Listing 3. Register the databases as HALDB
    INIT.DB    DBD(HALDBDB) SHARELVL(3) TYPHALDB PARTSEL(PSEDX)
        INIT.PART   DBD(HALDBDB) PART(HALDD*) -                          
                DSNPREFX(IMSP.IMS.HALDD*) BLOCKSZE(8192) -            
                FBFF(10) FSPF(10) GENMAX(3) HIBLOCK(420000)              
                            
        HALDBDB has 10 partitions.  
        So there are 10 INIT.PART and ‘*’ is from AA to AJ  
                            
    INIT.DB    DBD(HALDBSX) SHARELVL(3) TYPHALDB PARTSEL(PSEX1) 
        INIT.PART  DBD(HALDBSX) PART(HALDX*) -    
                DSNPREFX(IMSP.IMS.HALDX*) -    
                FBFF(10) FSPF(10) GENMAX(3)  
                            
        HALDBSX has 6 partitions.  So there are 6 INIT.PART and ‘*’ is from AA to AF
  6. Sort the unloaded database data set with IBM IMS High Performance Load: Physical Sequence Sort for Reload, as shown in Listing 4.
    Listing 4. Sort the database with Physical Sequence Sort (PSSR)
    //PSSR    EXEC PGM=FABSMAIN,     
    //* omitted lines 
    //SORTIN   DD DSNAME=user.unloadds, 
    //SORTOUT  DD DSNAME=user.unloadds.sortout
    //* omitted lines 
    //CTL     DD  *   
    HALDBDB
  7. In an IDCAMS step, allocate all of the HALDB data sets.
  8. Load the HALDB database using PSSR output as database load input, as shown in Listing 5.
    Listing 5. Load the database
    //LOAD     EXEC PGM=HPSRRC00,PARM='ULU,DFSURGL0,HALDBDB' 
    //STEPLIB  DD   DISP=SHR,DSN=IBMTOOLS.LOADLIB
    //         DD   DISP=SHR,DSN=test.RESLIB  with PSEs in it            
    //* omitted lines  
    //DFSUINPT DD   DISP=OLD,DSN=user.unloadds.sortout
  9. Build a secondary index with IBM IMS Index Builder, as shown in Listing 6.
    Listing 6. Build the index
    //IXBUILD   EXEC PGM=IIUSTART,     
    //STEPLIB   DD   DISP=SHR,DSN=IBMTOOLS.LOADLIB    
    //          DD   DISP=SHR,DSN=test.RESLIB  with PSEs in it   
    //* omitted lines 
    //IIUIN     DD *                      
    PROC      BLD_SECONDARY,HALDBDB,ALL
    INPUT     IBSCAN,DBRC=Y
    MAXTASKS  02
    VIC       YES,IIU.UIC

Conclusion

Using a High Level Language such as PL/I to experiment with various partition selection algorithms gives flexibility and speeds the experimentation. Once the optimum partition selection algorithm is determined, it can then be translated into HLASM within the PSE itself.

After the PSE was implemented, tested, and promoted into production, the partitions held an equivalent number of records, with no clustering of records within a given partition. This, in turn, reduced application contention on the database records and significantly reduced the number of U777 abends. Your results will vary, but the customer observed two-thirds fewer deadlocks for the high-volume transaction targeted by the PSE, even as the number of records processed within the database increased.

Acknowledgment

A special thank you is extended to Moira McFadden for the copy edit of this article, and to all of the WellPoint DBAs who assisted in this project.


Downloads

  • Sample PL/I: plihal.zip (4KB)
  • Sample PSE: dfspse00.zip (4KB)

Resources

Learn

  • The Complete IMS HALDB Guide, an IBM Redbook with detailed information on HALDB.



