There are various command parameters that you can use to
optimize load performance. There are also a number of file type modifiers
unique to load which can, in some cases, significantly improve that
utility's performance.
Command
parameters
The load utility attempts to deliver the best
performance possible by determining optimal values for DISK_PARALLELISM, CPU_PARALLELISM,
and DATA BUFFER, if these parameters have not be
specified by the user. Optimization is done based on the size and
the free space available in the utility heap. Consider using the autonomic DISK_PARALLELISM and CPU_PARALLELISM settings
before attempting to tune these parameters for your particular needs.
Following
is information about the performance implications of various options
available through the load utility:
- ALLOW READ ACCESS
- This option allows you to query a table while a load operation
is in progress. You can only view data that existed in the table prior
to the load operation. If the INDEXING MODE INCREMENTAL option
is also specified, and the load operation fails, the subsequent load
terminate operation might have to correct inconsistencies in the
index. This requires an index scan which involves considerable I/O.
If the ALLOW READ ACCESS option is also specified
for the load terminate operation, the buffer pool is used for I/O.
- COPY YES or NO
- Use this parameter to specify whether a copy of the input data
is to be made during a load operation. COPY YES,
which is only applicable when forward recovery is enabled, reduces
load performance because all of the loading data is copied during
the load operation. The increased I/O activity might increase the
load time on an I/O-bound system. Specifying multiple devices or directories
(on different disks) can offset some of the performance penalty resulting
from this operation. COPY NO, which is only applicable
when forward recovery is enabled, does not affect load performance.
However, all table spaces related to the loaded table will be placed
in a Backup Pending state, and those table spaces must be backed up
before the table can be accessed.
- CPU_PARALLELISM
- Use this parameter to exploit the number of processes running
per database partition (if this is part of your machine's capability),
and significantly improve load performance. The parameter specifies
the number of processes or threads used by the load utility to parse,
convert, and format data records. The maximum number allowed is 30.
If there is insufficient memory to support the specified value, the
utility adjusts the value. If this parameter is not specified, the
load utility selects a default value that is based on the number of
CPUs on the system.
Record order in the source data is preserved
(see
Figure 1) regardless of the value
of this parameter, provided that:
- the anyorder file type modifier is not specified
- the PARTITIONING_DBPARTNUMS option (and more than one partition
is to be used for partitioning) is not specified
If tables include either LOB or LONG VARCHAR data, CPU_PARALLELISM is
set to 1. Parallelism is not supported in this case.
Although
use of this parameter is not restricted to symmetric multiprocessor
(SMP) hardware, you might not obtain any discernible performance benefit
from using it in non-SMP environments.
Figure 1. Record Order in the Source Data is Preserved When
the Number of Processes Running Per Database Partition is Exploited
During a Load Operation
- DATA BUFFER
- The DATA BUFFER parameter specifies the total
amount of memory, in 4 KB units, allocated to the load utility as
a buffer. It is recommended that this buffer be several extents in
size. The data buffer is allocated from the utility heap; however,
the data buffer can exceed the setting for the util_heap_sz database
configuration parameter as long as there is available memory in the
system.
- DISK_PARALLELISM
- The DISK_PARALLELISM parameter specifies the
number of processes or threads used by the load utility to write data
records to disk. Use this parameter to exploit available containers
when loading data, and significantly improve load performance. The
maximum number allowed is the greater of four times the CPU_PARALLELISM value
(actually used by the load utility), or 50. By default, DISK_PARALLELISM is
equal to the sum of the table space containers on all table spaces
containing objects for the table being loaded, except where this value
exceeds the maximum number allowed.
- NONRECOVERABLE
- If forward recovery is enabled, use this parameter if you do not
need to be able to recover load transactions against a table upon
rollforward. A NONRECOVERABLE load and a COPY
NO load have identical performance. However, there is a significant
difference in terms of potential data loss. A NONRECOVERABLE load
marks a table as not rollforward recoverable while leaving the table
fully accessible. This can create a problematic situation in which
if you need to rollforward through the load operation, then the loaded
data as well as all subsequent updates to the table will be lost.
A COPY NO load places all dependent table spaces
in the Backup Pending state which renders the table inaccessible until
a backup is performed. Because you are forced to take a backup after
that type of load, you will not risk losing the loaded data or subsequent
updates to the table. That is to say, a COPY NO load is totally recoverable.
Note: When
these load transactions are encountered during subsequent restore
and rollforward recovery operations, the table is not updated, and
is marked invalid. Further actions against
this table are ignored. After the rollforward operation is complete,
the table can only be dropped.
- SAVECOUNT
- Use this parameter to set an interval for the establishment of
consistency points during the load phase of a load operation.
The synchronization of activities performed to establish a consistency
point takes time. If done too frequently, there is a noticeable reduction
in load performance. If a very large number of rows is to be loaded,
it is recommended that a large SAVECOUNT value be
specified (for example, a value of 10 million in the case of a load
operation involving 100 million records).
A load restart operation
automatically continues from the last consistency point, provided
that the load restart operation resumes from the load phase.
- STATISTICS USE PROFILE
- Collect statistics specified in table statistics profile. Use
this parameter to collect data distribution and index statistics more
efficiently than through invocation of the RUNSTATS utility following
completion of the load operation, even though performance of the load
operation itself decreases (particularly when DETAILED INDEXES
ALL is specified).
For optimal performance, applications
require the best data distribution and index statistics possible.
Once the statistics are updated, applications can use new access paths
to the table data based on the latest statistics. New access paths
to a table can be created by rebinding the application packages using
the BIND command. The table statistics profile
is created by running the RUNSTATS command with
the SET PROFILE options.
When loading data
into large tables, it is recommended that a larger value for the stat_heap_sz (statistics
heap size) database configuration parameter be specified.
- USE <tablespace-name>
- When an ALLOW READ ACCESS load is taking place
and the indexing mode is REBUILD, this parameter
allows an index to be rebuilt in a system temporary table space and
copied back to the index table space during the index copy phase of
a load operation.
By default, the fully rebuilt index (also known
as the shadow index) is built in the same table space
as the original index. This might cause resource problems as both
the original and the shadow index reside in the same table space simultaneously.
If the shadow index is built in the same table space as the original
index, the original index is instantaneously replaced by the shadow.
However, if the shadow index is built in a system temporary table
space, the load operation requires an index copy phase which copies
the index from a system temporary table space to the index table space.
There is considerable I/O involved in the copy. If either of the table
spaces is a DMS table space, the I/O on the system temporary table
space might not be sequential. The values specified by the DISK_PARALLELISM option
are respected during the index copy phase.
- WARNINGCOUNT
- Use this parameter to specify the number of warnings that can
be returned by the utility before a load operation is forced to terminate.
Set the WARNINGCOUNT parameter to a relatively low
number if you are expecting only a few or no warnings. The load operation
stops after the WARNINGCOUNT number is reached. This
gives you the opportunity to correct problems before attempting to
complete the load operation.
File type modifiers
- ANYORDER
By default, the load utility preserves record order of source
data. When load is operating under an SMP environment, synchronization
between parallel processing is required to ensure that order is preserved.
In
an SMP environment, specifying the
anyorder file
type modifier instructs the load utility to not preserve the order,
which improves efficiency by avoiding the synchronization necessary
to preserve that order. However, if the data to be loaded is presorted,
anyorder might
corrupt the presorted order, and the benefits of presorting are lost
for subsequent queries.
Note: The anyorder file type
modifier has no effect if CPU_PARALLELISM is 1, and
it is not compatible with the SAVECOUNT option.
- BINARYNUMERICS, ZONEDDECIMAL, and PACKEDDECIMAL
For fixed length non-delimited ASCII (ASC) source data, representing
numeric data in binary can result in improved performance when loading.
If the packeddecimal file type modifier is specified,
decimal data is interpreted by the load utility to be in packed decimal
format (two digits per byte). If the zoneddecimal file
type modifier is specified, decimal data is interpreted by the load
utility to be in zoned decimal format (one digit per byte). For all
other numeric types, if the binarynumerics file type
modifier is specified, data is interpreted by the load utility to
be in binary format.
Note: - When the binarynumerics, packeddecimal,
or zoneddecimal file type modifiers are specified,
numeric data is interpreted in big-endian (high byte first) format,
regardless of platform.
- The packeddecimal and zoneddecimal file
type modifiers are mutually exclusive.
- The packeddecimal and zoneddecimal file
type modifiers only apply to the decimal target columns, and the binary
data must match the target column definitions.
- The reclen file type modifier must be specified
when the binarynumerics, packeddecimal,
or zoneddecimal file type modifiers are specified.
- FASTPARSE
Use with caution. In situations where the
data being loaded is known to be valid, it can be unnecessary to have
load perform the same amount of syntax checking as with more suspect
data. In fact, decreasing the scope of this step can improve load's
performance by about 10 or 20 percent. This can be done by using the fastparse file
type modifier, which reduces the data checking that is performed on
user-supplied column values from ASC and DEL files.
- NOROWWARNINGS
During a load operation, warning messages about rejected rows
are written to a specified file. However, if the load utility has
to process a large volume of rejected, invalid or truncated records,
it can adversely affect load's performance. In cases where many
warnings are anticipated, it is useful to use the norowwarnings file
type modifier to suppress the recording of these warnings.
- PAGEFREESPACE, INDEXFREESPACE, and TOTALFREESPACE
As data is inserted and updated in tables over time, the need
for table and index reorganization grows. One solution is to increase
the amount of free space for tables and indexes using pagefreespace, indexfreespace,
and totalfreespace. The first two modifiers, which
take precedence over the PCTFREE value, specify the percentage of
data and index pages that is to be left as free space, while totalfreespace specifies
the percentage of the total number of pages that is to be appended
to the table as free space.