Options for improving load performance
There are various command parameters that you can use to optimize load performance. There are also a number of file type modifiers unique to load which can, in some cases, significantly improve that utility's performance.
Command parameters
The load utility attempts to deliver the best performance possible by determining optimal values for DISK_PARALLELISM, CPU_PARALLELISM, and DATA BUFFER, if these parameters have not be specified by the user. Optimization is done based on the size and the free space available in the utility heap. Consider using the autonomic DISK_PARALLELISM and CPU_PARALLELISM settings before attempting to tune these parameters for your particular needs.
Following is information about the performance implications of various options available through the load utility:
- ALLOW READ ACCESS
- This option allows you to query a table while a load operation is in progress. You can only view data that existed in the table prior to the load operation. If the INDEXING MODE INCREMENTAL option is also specified, and the load operation fails, the subsequent load terminate operation might have to correct inconsistencies in the index. This requires an index scan which involves considerable I/O. If the ALLOW READ ACCESS option is also specified for the load terminate operation, the buffer pool is used for I/O.
- COPY YES or NO
- Use this parameter to specify whether a copy of the input data is to be made during a load operation. COPY YES, which is only applicable when forward recovery is enabled, reduces load performance because all of the loading data is copied during the load operation. The increased I/O activity might increase the load time on an I/O-bound system. Specifying multiple devices or directories (on different disks) can offset some of the performance penalty resulting from this operation. COPY NO, which is only applicable when forward recovery is enabled, does not affect load performance. However, all table spaces related to the loaded table will be placed in a Backup Pending state, and those table spaces must be backed up before the table can be accessed.
- CPU_PARALLELISM
- Use this parameter to exploit the number of processes running per database partition (if this is
part of your machine's capability), and significantly improve load performance. The parameter
specifies the number of processes or threads used by the load utility to parse, convert, and format
data records. The maximum number allowed is 80. If there is insufficient memory to support the
specified value, the utility adjusts the value. If this parameter is not specified, the load utility
selects a default value that is based on the number of CPUs on the system.
Record order in the source data is preserved (see Figure 1) regardless of the value of this parameter, provided that:
- the
anyorder
file type modifier is not specified - the PARTITIONING_DBPARTNUMS option (and more than one partition is to be used for partitioning) is not specified
If tables include either LOB or LONG VARCHAR data, CPU_PARALLELISM is set to 1. Parallelism is not supported in this case.
Although use of this parameter is not restricted to symmetric multiprocessor (SMP) hardware, you might not obtain any discernible performance benefit from using it in non-SMP environments.
Figure 1. Record Order in the Source Data is Preserved When the Number of Processes Running Per Database Partition is Exploited During a Load Operation - the
- DATA BUFFER
- The DATA BUFFER parameter specifies the total amount of memory, in 4 KB units, allocated to the load utility as a buffer. It is recommended that this buffer be several extents in size. The data buffer is allocated from the utility heap; however, the data buffer can exceed the setting for the util_heap_sz database configuration parameter as long as there is available memory in the system.
- DISK_PARALLELISM
- The DISK_PARALLELISM parameter specifies the number of processes or threads used by the load utility to write data records to disk. Use this parameter to exploit available containers when loading data, and significantly improve load performance. The maximum number allowed is the greater of four times the CPU_PARALLELISM value (actually used by the load utility), or 50. By default, DISK_PARALLELISM is equal to the sum of the table space containers on all table spaces containing objects for the table being loaded, except where this value exceeds the maximum number allowed.
- NONRECOVERABLE
- If forward recovery is enabled, use this parameter if you do not need to be able to recover load
transactions against a table upon rollforward. A NONRECOVERABLE load and a
COPY NO load have identical performance. However, there is a significant difference
in terms of potential data loss. A NONRECOVERABLE load marks a table as not
rollforward recoverable while leaving the table fully accessible. This can create a problematic
situation in which if you need to rollforward through the load operation, then the loaded data as
well as all subsequent updates to the table will be lost. A COPY NO load places all
dependent table spaces in the Backup Pending state which renders the table inaccessible until a
backup is performed. Because you are forced to take a backup after that type of load, you will not
risk losing the loaded data or subsequent updates to the table. That is to say, a COPY NO load is
totally recoverable.Note: When these load transactions are encountered during subsequent restore and rollforward recovery operations, the table is not updated, and is marked invalid. Further actions against this table are ignored. After the rollforward operation is complete, the table can only be dropped.
- SAVECOUNT
- Use this parameter to set an interval for the establishment of consistency points during the
load phase of a load operation. The synchronization of activities performed to establish a
consistency point takes time. If done too frequently, there is a noticeable reduction in load
performance. If a very large number of rows is to be loaded, it is recommended that a large
SAVECOUNT value be specified (for example, a value of 10 million in the case of a
load operation involving 100 million records).
A load restart operation automatically continues from the last consistency point, provided that the load restart operation resumes from the load phase.
- STATISTICS USE PROFILE
- Collect statistics specified in table statistics profile. Use this parameter to collect data
distribution and index statistics more efficiently than through invocation of the RUNSTATS utility
following completion of the load operation, even though performance of the load operation itself
decreases (particularly when DETAILED INDEXES ALL is specified).
For optimal performance, applications require the best data distribution and index statistics possible. Once the statistics are updated, applications can use new access paths to the table data based on the latest statistics. New access paths to a table can be created by rebinding the application packages using the BIND command. The table statistics profile is created by running the RUNSTATS command with the SET PROFILE options.
When loading data into large tables, it is recommended that a larger value for the stat_heap_sz (statistics heap size) database configuration parameter be specified.
- USE <tablespace-name>
- When an ALLOW READ ACCESS load is taking place and the indexing mode is
REBUILD, this parameter allows an index to be rebuilt in a system temporary table
space and copied back to the index table space during the index copy phase of a load operation.
By default, the fully rebuilt index (also known as the shadow index) is built in the same table space as the original index. This might cause resource problems as both the original and the shadow index reside in the same table space simultaneously. If the shadow index is built in the same table space as the original index, the original index is instantaneously replaced by the shadow. However, if the shadow index is built in a system temporary table space, the load operation requires an index copy phase which copies the index from a system temporary table space to the index table space. There is considerable I/O involved in the copy. If either of the table spaces is a DMS table space, the I/O on the system temporary table space might not be sequential. The values specified by the DISK_PARALLELISM option are respected during the index copy phase.
- WARNINGCOUNT
- Use this parameter to specify the number of warnings that can be returned by the utility before a load operation is forced to terminate. Set the WARNINGCOUNT parameter to a relatively low number if you are expecting only a few or no warnings. The load operation stops after the WARNINGCOUNT number is reached. This gives you the opportunity to correct problems before attempting to complete the load operation.
File type modifiers
- ANYORDER
-
By default, the load utility preserves record order of source data. When load is operating under an SMP environment, synchronization between parallel processing is required to ensure that order is preserved.
In an SMP environment, specifying theanyorder
file type modifier instructs the load utility to not preserve the order, which improves efficiency by avoiding the synchronization necessary to preserve that order. However, if the data to be loaded is presorted,anyorder
might corrupt the presorted order, and the benefits of presorting are lost for subsequent queries.Note: Theanyorder
file type modifier has no effect if CPU_PARALLELISM is 1, and it is not compatible with the SAVECOUNT option. - BINARYNUMERICS, ZONEDDECIMAL, and PACKEDDECIMAL
-
For fixed length non-delimited ASCII (ASC) source data, representing numeric data in binary can result in improved performance when loading. If the
packeddecimal
file type modifier is specified, decimal data is interpreted by the load utility to be in packed decimal format (two digits per byte). If thezoneddecimal
file type modifier is specified, decimal data is interpreted by the load utility to be in zoned decimal format (one digit per byte). For all other numeric types, if thebinarynumerics
file type modifier is specified, data is interpreted by the load utility to be in binary format.Note:- When the
binarynumerics
,packeddecimal
, orzoneddecimal
file type modifiers are specified, numeric data is interpreted in big-endian (high byte first) format, regardless of platform. - The
packeddecimal
andzoneddecimal
file type modifiers are mutually exclusive. - The
packeddecimal
andzoneddecimal
file type modifiers only apply to the decimal target columns, and the binary data must match the target column definitions. - The
reclen
file type modifier must be specified when thebinarynumerics
,packeddecimal
, orzoneddecimal
file type modifiers are specified.
- When the
- FASTPARSE
-
The fastparse file type modifier reduces the data checking that is performed on user-supplied column values from ASC and DEL files. Using this modifier reduces syntax checking which can improve load's performance by up to 20 percent
Using the fastparse option is not recommended unless it is certain that the input data is completely valid. If invalid data, such as an incorrectly formatted timestamp, is submitted using the fastparse option, some SQL operations may propagate the invalid data to other parts of the database without detection. When the invalid data is later detected, it may be very difficult to trace down its origin, or how many other locations it has been copied to.
- NOROWWARNINGS
-
During a load operation, warning messages about rejected rows are written to a specified file. However, if the load utility has to process a large volume of rejected, invalid or truncated records, it can adversely affect load's performance. In cases where many warnings are anticipated, it is useful to use the
norowwarnings
file type modifier to suppress the recording of these warnings. - PAGEFREESPACE, INDEXFREESPACE, and TOTALFREESPACE
-
As data is inserted and updated in tables over time, the need for table and index reorganization grows. One solution is to increase the amount of free space for tables and indexes using
pagefreespace
,indexfreespace
, andtotalfreespace
. The first two modifiers, which take precedence over the PCTFREE value, specify the percentage of data and index pages that is to be left as free space, whiletotalfreespace
specifies the percentage of the total number of pages that is to be appended to the table as free space.