Recent DB2 Utilities Suite for z/OS (DB2 Utilities Suite) enhancements show IBM’s continuing investment in the DB2 Utilities Suite and how it helps to make the platform even more strategic. Many of these enhancements were first described by customers as challenges. IBM continually listens to customer concerns, and because these enhancements were valuable to the DB2 community, the development lab spent the time and resources to create solutions that many users of the DB2 utilities would find useful. These enhancements are expected to not only improve performance, usability, and availability, but also illustrate that the DB2 Utilities development group at the Silicon Valley Lab are committed to improving your return on investment.
The following DB2 Utility Suite enhancements are in the maintenance stream:
- LOAD/UNLOAD parameter - NUMRECS
- LOAD parameter - PRESORTED
- LOAD/UNLOAD parameter - FORMAT INTERNAL
- REPAIR SET options - RBDP or PAGE SET RBDP and new LOAD utility behavior enhancements
- REORG parameter - PARALLEL YES | NO
The LOAD utility can more efficiently determine the amount of sort work space to be allocated if the utility knows ahead of time approximately how many records it needs to sort, especially when the input data set resides on tape. In the past, the SORTKEYS parameter was necessary to provide this information to the LOAD utility. A formula in the DB2 Utility Guide and Reference describes how to estimate the number of keys to sort. Determining the correct value for the SORTKEYS parameter can be challenging, and it is easy to make a mistake with the calculation. Users have indicated that there are a few problems with specifying the SORTKEYS value. First, the formula is complicated, and the value is not actually the number of records you are loading but the number of records you are loading multiplied by the number of indexes you have on the table. If you have RI relationships on the table, you have to take this into account when you determine the proper SORTKEYS specification. Luckily, when you unload the data in a table, the UNLOAD utility provides the correct SORTKEYS specification in the LOAD control statement that it generates.
However, you might have unloaded the table in a production environment that contains a different number of indexes than your test environment. As a result, using the number generated by the UNLOAD utility might be grossly inaccurate, or you might have unloaded the data and since then created a few additional indexes. Not taking these additional indexes into account could hurt the amount of allocated sort work data sets and cause the UNLOAD utility to fail with SORT CAPACITY EXCEEDED errors.
When you are loading a segmented table space with several tables of uneven sizes, any large differences in each table data distribution could result in over allocation of sort work space.
In this enhancement the SORTKEYS parameter at the table space level is replaced by a new NUMRECS parameter at the table level. Without doing any calculations, you can use the NUMRECS parameter to specify the number of rows in the table, and the LOAD utility will use the number of indexes at the time you run the LOAD to determine the necessary calculations. This functionality greatly simplifies the invocation of the LOAD utility. Using the NUMRECS parameter instead of the SORTKEYS parameter improves usability and might reduce those LOAD utility failures where a SORTKEYS value was calculated incorrectly. For situations in which you might have an uneven distribution of data, using the NUMRECS parameter might give a more accurate value than SORTKEYS. NUMRECS and SORTKEYS are mutually exclusive parameters. SORTKEYS continues to be a supported parameter.
The NUMRECS parameter is also available in DB2 V8 to assist users who are planning a skip level migration from DB2 V8 to DB2 10. If you are running coexistence with two versions of DB2, namely V8 and 10, you can unload in DB2 10 and load into DB2 V8, ensuring that your utilities will not fail because of incorrect calculations.
The NUMRECS parameter enhancement for the UNLOAD and LOAD utilities in DB2 V8, 9, and 10 is in APARs PK88970, PK88972, and PK88974.
Use the LOAD PRESORTED option when you know the data is already sorted in order of the clustering index. In this case, you can avoid sorting the clustering index. If you have several indexes, you will not benefit in elapsed time because of the parallel index builds, but you will save some additional CPU usage. Naturally, if only one index is on the table, and your data is presorted, you can skip the index sort, and elapsed time might improve. From an availability point of view, if you are running a LOAD REPLACE of a segmented table space, for example, there is no access to the data while the LOAD REPLACE runs. So, if you have an opportunity to preprocess the data to sort it in clustering order, you can reduce the duration of the LOAD REPLACE. In effect, the outage during the LOAD REPLACE is shortened and results in a higher availability for the underlying table.
If you select the PRESORTED option and the data isn't really presorted, the LOAD utility still completes successfully but you might not realize the performance benefit. The LOAD utility detects that the data is not sorted in order of the clustering index and adjusts while the LOAD utility runs. Therefore, use the PRESORTED keyword to improve performance only if you can guarantee that the data is not sorted. Otherwise, the performance could be worse than if the LOAD utility were run without the PRESORTED option.
Both the LOAD and UNLOAD utilities can take advantage of the new FORMAT INTERNAL parameter. DB2 table data is stored on disk in an internal, proprietary DB2 format. During the unload process, this internal format is converted to an external format in the SYSREC data set. During a subsequent load process, this external representation of data is changed back to the DB2 internal format within the data page. Performing this data conversion uses CPU and elapsed time for both the LOAD and UNLOAD utilities.
REORG already uses a process that unloads the data and reloads the data in this proprietary DB2 internal format. With this enhancement, if you are unloading data from TABLEA and reloading the data into TABLEB, where TABLEA and TABLEB have equivalent column definitions, using the new FORMAT INTERNAL parameter for both the UNLOAD and LOAD utilities can save considerable CPU and elapsed time for both utilities. You do have to ensure that the column definitions are the same, but you do not have to worry about unlike attributes such as compression, OBID translation, segment sizes, reordered row format, or basic row format.
In DB2 9 FORMAT INTERNAL might be useful when you need to change a table space attribute, such as DSSIZE. Then unload the data from TABLEA, drop the table space for TABLEA, re-create the table space with the new DSSIZE, and reload the data into TABLEA, re-establishing indexes, authorizations, triggers, and RI relationships as necessary. In DB2 10, this process can be simplified by using the ALTER command followed by a REORG. This deferred ALTER is only for table spaces defined as a UTS in DB2 10.
In some cases, we have seen potentially up to an 85% CPU and elapsed time reduction on UNLOAD and up to a 77% elapsed time and 56% CPU reduction on LOAD. The difference between UNLOAD and LOAD is because of the number of indexes, and it certainly can vary depending on the underlying data. The more indexes you have on a table decreases the CPU usage and elapsed time percentages on the LOAD because of the necessary index sorting.
For a LOAD statement that uses FORMAT INTERNAL, field specification is not allowed, which considerably simplifies the LOAD statement, enhances usability, aids productivity, and delivers performance savings on both elapsed time and CPU usage.
The following example shows an UNLOAD statement that uses FORMAT INTERNAL:
UNLOAD TABLESPACE DBB.TSS FORMAT INTERNAL
The following example shows a LOAD statement that uses FORMAT INTERNAL:
LOAD INDDN SYSREC REPLACE FORMAT INTERNAL INTO TABLE "SYSADM"."TBB"
The PRESORTED and FORMAT INTERNAL enhancements for the UNLOAD and LOAD utilities are in APAR PM19584 and are available with DB2 9 and DB2 10.
Most of the CPU cost for batch inserts is because of index maintenance. When an SQL INSERT statement is run, DB2 must insert the index key, split leaf pages, and so on. Users have learned a methodology for running batch inserts quickly. They often drop the indexes before the batch job runs. Then, they run the batch insert job, and re-create and rebuild the indexes to reduce the amount of CPU and elapsed time for the batch insert process as a whole.
A number of enhancements can assist with this process. Before the REPAIR SET options enhancements, the REPAIR utility could only reset restricted states. Now you can use the REPAIR utility to set a restricted state on an index, specifically RBDP (RBDP) or PAGE SET RBDP (PSRBD). If you run the REPAIR utility to set RBDP before you run your batch insert program, the batch insert job will not insert into the index because the index is assumed to be not UNIQUE. If the index is a UNIQUE index, the batch insert job will fail. Now you can insert into a page set even though the index is in RBDP status. By using this feature you can set RBDP status, run your insert batch job, and then run the REBUILD INDEX utility. Depending on your situation, running the REBUILD INDEX utility might be useful because you can eliminate the step to drop the indexes and the step to rebind the plans and packages that are associated with this table.
You might prefer to drop indexes to obtain increased LOAD PARTITION performance. Before this enhancement, if you had an index on a logical partition that was in RBDP status, the LOAD failed. Now, if you have a partitioning index that is in RBDP status, a LOAD REPLACE at the PART level works because the entire index will be rebuilt anyway. However, if you run a LOAD RESUME against that partition, the LOAD fails. Likewise, if you have nonpartitioned secondary indexes (NPSIs) on the table and they are in RBDP status, the LOAD fails.
Instead of dropping indexes before the LOAD to increase LOAD PARTITION performance, consider using the new INDEXDEFER option on LOAD. This option defers the index build, especially in the case of LOAD PART operations. When the index build for an index or index part is deferred, it is left in RBDP status. When the LOAD ends, run REBUILD INDEX. Of course, if the index is UNIQUE, you might need this index to enforce uniqueness. Another option is to use the NONUNIQUE keyword and defer only those indexes that are not UNIQUE.
The REPAIR SET options enhancements are in APARs PM27962, PM08585, and PM27962.
Use the new REORG utility PARALLEL keyword to indicate whether to process partitions in parallel when they are provided as input to REORG in a LISTDEF using the PARTLEVEL option. You can REORG multiple partition ranges in DB2 9. For example, if you need to reorganize different noncontiguous partitions, such as partitions 3-4, 10, and 12, issue the following statement.
REORG table-space-name PART (3-4,10, 12)
Partitions are processed in parallel during UNLOAD and LOAD phases.
Similarly, if you use a LISTDEF for partitions 3-4, 10, and 12, a REORG will be issued for these partitions. If any nonpartitioned secondary indexes (NPSIs) exist, they are processed serially, and each NPSI is repeatedly shadowed three times.
It is unlikely that users would want to process each NPSI serially. Therefore, the reorganization of each partition is done in parallel, and each NPSI is processed once. This enhancement came in APAR PK87762.
However, in certain unique cases, the new behavior was not acceptable. If DASD was constrained and you could not shadow multiple partitions at the same time, a way to successfully complete the REORG was necessary. As a result, the new PARALLEL keyword is introduced. By using this keyword, you can choose whether the REORG utility will process these partitions in parallel on the LISTDEF.
The PARALLEL keyword is in APAR PM25525. PARALLEL YES provides more efficient processing, a higher degree of parallelism, reduced CPU usage and elapsed time, and improved availability because it only processes the NPSIs once. However, you might need PARALLEL NO when DASD availability is constrained. The default value for the PARALLEL keyword is YES, which gives the most efficient processing, but specifying PARALLEL YES can cause problems if the new behavior is not tolerated when the PTF is applied. To prevent these problems, a new ZPARM to govern the PARALLEL keyword at the subsystem level is in APAR PM37293. If you are concerned about processing partitions in parallel in REORG when APAR PM25525 is applied, additionally apply the PTF for APAR PM37293.
The PARALLEL keyword for REORG is in APAR PM25525 and is available for DB2 9 and 10. The ability to REORG noncontiguous partitions is in APAR PK87762 and is available for DB2 9 and 10. The ability to override the PARALLEL keyword default value at the subsystem level is in APAR PM37293 and is available for DB2 9 and 10.
Staying informed about new features and functions that are provided by DB2 in the service stream, especially if they might solve an issue you are having in your environment, is important. This article explains many improvements to the DB2 Utilities Suite that will continue to improve your return on investment of DB2 in your operating environment. We recommend that you stay current with DB2 maintenance and to be aware of these new features as they become available.
- See The DB2 for z/OS REORG utility and noncontiguous partitions
developerWorks article for more information about the ability to REORG noncontiguous partitions..
- In the DB2
for z/OS area on developerWorks, get the resources you need to advance your DB2 skills.
- Learn more yabout Information Management at the developerWorks Information Management
zone. Find technical documentation,
how-to articles, education, downloads, product information, and
- Stay current with
events and webcasts.
- Follow developerWorks on
Get products and technologies
- Build your next
development project with
IBM trial software,
available for download directly from developerWorks.
- Participate in the discussion forum.
- Check out the
blogs and get involved in the
Mary Petras works with the technical support team for DB2 for z/OS Tools and utilities. Mary joined IBM in 1996 and has worked with customers on their DB2 data sharing migrations. She has presented at conferences for the International DB2 Users Group (IDUG) and IBM Information On Demand (IOD), and she is a co-author of six IBM Redbooks publications devoted to DB2 performance, DB2 for z/OS performance tools, and DB2 data sharing.