The load utility is capable of efficiently
moving large quantities of data into newly created tables, or into
tables that already contain data.
The utility can handle
most data types, including XML, large objects (LOBs), and user-defined
types (UDTs). The load utility is faster than the import
utility, because it writes formatted pages directly into the database,
while the import utility performs SQL INSERTs.
The load
utility does not fire triggers, and does not perform referential or
table constraints checking (other than validating the uniqueness of
the indexes).
The load process has several distinct phases (see
Figure 1):
- Analyze
When data is being loaded into a column-organized table,
the first phase is the analyze phase, which is unique to column-organized tables.
The analyze phase occurs only if a column compression dictionary needs
to be built, which happens during a LOAD REPLACE operation,
a LOAD REPLACE RESETDICTIONARY operation, a LOAD
REPLACE RESETDICTIONARYONLY operation, or a LOAD
INSERT operation (if the column-organized table is empty).
For column-organized tables,
this phase is followed by the load, build, and delete phases. The
index copy phase applies to row-organized tables
only.
- Load
During the load phase, data is loaded
into the table, and index keys and table statistics are collected,
if necessary. Save points, or points of consistency,
are established at intervals specified through the SAVECOUNT parameter
in the LOAD command. Messages are generated, indicating
how many input rows were successfully loaded at the time of the save
point.
- Build
During the build phase, indexes are
produced based on the index keys collected during the load phase.
The index keys are sorted during the load phase, and index statistics
are collected (if the STATISTICS USE PROFILE option
was specified, and profile indicates collecting index statistics).
The statistics are similar to those collected through the RUNSTATS command.
- Delete
During the delete phase, the rows that
caused a unique or primary key violation are removed from the table.
These deleted rows are stored in the load exception table, if one
was specified.
- Index copy
During the index copy phase, the
index data is copied from a system temporary table space to the original
table space. This will only occur if a system temporary table space
was specified for index creation during a load operation with the READ
ACCESS option specified.
Figure 1. Phases of the Load Process
for Row-organized Tables
Note: After you invoke the load utility, you can use the LIST
UTILITIES command to monitor the progress of the load operation.
The following information is required when loading data:
- The path and the name of the input file, named pipe, or device.
- The name or alias of the target table.
- The format of the input source. This format can be DEL, ASC, PC/IXF,
or CURSOR.
- Whether the input data is to be appended to the table, or is to
replace the existing data in the table.
- A message file name, if the utility is invoked through the application
programming interface (API), db2Load.
Load modes
- INSERT
In this mode, load appends input data
to the table without making any changes to the existing data.
- REPLACE
In this mode, load deletes existing
data from the table and populates it with the input data.
- RESTART
In this mode, an interrupted load is
resumed. In most cases, the load is resumed from the phase it failed
in. If that phase was the load phase, the load is resumed from the
last successful consistency point.
- TERMINATE
In this mode, a failed load operation
is rolled back.
The options you can specify include:
- That the data to be loaded resides on the
client, if the load utility is invoked from a remotely connected client.
Note that XML and LOB data are always read from the server, even you
specify the CLIENT option.
- The method to use for loading the data: column location, column
name, or relative column position.
- How often the utility is to establish consistency points.
- The names of the table columns into which the data is to be inserted.
- Whether or not preexisting data in the table can be queried while
the load operation is in progress.
- Whether the load operation should wait for other utilities or
applications to finish using the table or force the other applications
off before proceeding.
- An alternate system temporary table space in which to build the
index.
- The paths and the names of the input files in which LOBs are stored.
Note: The
load utility does not honor the COMPACT lob option.
- A message file name. During load operations, you can specify that
message files be created to contain the error, warning, and informational
messages associated with those operations. Specify the name of these
files with the MESSAGES parameter.
Note: - You can only view the contents of a message file after the operation
is finished. If you want to view load messages while a load operation
is running, you can use the LOAD QUERY command.
- Each message in a message file begins on a new line and contains
information provided by the DB2® message
retrieval facility.
- Whether column values being loaded have implied decimal points.
- Whether the utility should modify the amount of free space available
after a table is loaded.
- Whether statistics are to be gathered during the load process.
This option is only supported if the load operation is running in REPLACE mode.
Statistics are collected according to the profile defined for the
table. The profile must be created by the RUNSTATS command
before the LOAD command is executed. If the profile
does not exist and the load operation is instructed to collect statistics
according to the profile, the load will fail, and an error message
will be returned.
If data is appended to a table, statistics are
not collected. To collect current statistics on an appended table,
invoke the RUNSTATS utility following completion
of the load process. If gathering statistics on a table with a unique
index, and duplicate keys are deleted during the delete phase, statistics
are not updated to account for the deleted records. If you expect
to have a significant number of duplicate records, do not collect
statistics during the load operation. Instead, invoke the RUNSTATS utility
following completion of the load process.
- Whether
to keep a copy of the changes made. This is done to enable rollforward
recovery of the database. This option is not supported if rollforward
recovery is disabled for the database; that is, if the database configuration
parameters logarchmeth1 and logarchmeth2 are
set to OFF. If no copy is made, and rollforward recovery
is enabled, the table space is left in Backup Pending state at the
completion of the load operation.
Logging is required for fully
recoverable databases. The load utility almost completely eliminates
the logging associated with the loading of data. In place of logging,
you have the option of making a copy of the loaded portion of the
table. If you have a database environment that allows for database
recovery following a failure, you can do one of the following:
- Explicitly request that a copy of the loaded portion of the table
be made.
- Take a backup of the table spaces in which the table resides immediately
after the completion of the load operation.
If the database configuration parameter logindexbuild is
set, and if the load operation is invoked with the COPY
YES recoverability option and the INCREMENTAL indexing
option, the load logs all index modifications. The benefit of using
these options is that when you roll forward through the log records
for this load, you also recover the indexes (whereas normally the
indexes are not recovered unless the load uses the REBUILD indexing
mode).
If you are loading a table that already contains data,
and the database is non-recoverable, ensure that you have a backed-up
copy of the database, or the table spaces for the table being loaded,
before invoking the load utility, so that you can recover from errors.
If
you want to perform a sequence of multiple load operations on a recoverable
database, the sequence of operations will be faster if you specify
that each load operation is non-recoverable, and take a backup at
the end of the load sequence, than if you invoke each of the load
operations with the COPY YES option. You can
use the NONRECOVERABLE option to specify that a load
transaction is to be marked as non-recoverable, and that it will not
be possible to recover it by a subsequent rollforward operation. The
rollforward utility will skip the transaction, and will mark the table
into which data was being loaded as "invalid". The utility will also
ignore any subsequent transactions against that table. After the rollforward
operation is completed, such a table can only be dropped (see Figure 2). With this option, table spaces are
not put in backup pending state following the load operation, and
a copy of the loaded data does not have to be made during the load
operation.
Figure 2. Non-recoverable Processing During
a Roll Forward Operation
- The fully qualified path to be used when creating temporary files
during a load operation. The name is specified by the TEMPFILES
PATH parameter of the LOAD command.
The default value is the database path. The path resides on the server
machine, and is accessed by the DB2 instance
exclusively. Therefore, any path name qualification given to this
parameter must reflect the directory structure of the server, not
the client, and the DB2 instance
owner must have read and write permission on the path.