Load overview

The load utility is capable of efficiently moving large quantities of data into newly created tables, or into tables that already contain data.

The utility can handle most data types, including XML, large objects (LOBs), and user-defined types (UDTs).

The load utility is faster than the import utility, because it writes formatted pages directly into the database, while the import utility performs SQL INSERTs.

The load utility does not fire triggers, and does not perform referential or table constraints checking (other than validating the uniqueness of the indexes).
The load process has several distinct phases (see Figure 1):
  1. Analyze
    When data is being loaded into a column-organized table, the first phase is the analyze phase, which is unique to column-organized tables. The analyze phase occurs only if a column compression dictionary needs to be built, which happens during a LOAD REPLACE operation, a LOAD REPLACE RESETDICTIONARY operation, a LOAD REPLACE RESETDICTIONARYONLY operation, or a LOAD INSERT operation (if the column-organized table is empty). For column-organized tables, this phase is followed by the load, build, and delete phases. The index copy phase applies to row-organized tables only.
  2. Load
    During the load phase, data is loaded into the table, and index keys and table statistics are collected, if necessary. Save points, or points of consistency, are established at intervals specified through the SAVECOUNT parameter in the LOAD command. Messages are generated, indicating how many input rows were successfully loaded at the time of the save point.
  3. Build
    During the build phase, indexes are produced based on the index keys collected during the load phase. The index keys are sorted during the load phase, and index statistics are collected (if the STATISTICS USE PROFILE option was specified, and profile indicates collecting index statistics). The statistics are similar to those collected through the RUNSTATS command.
  4. Delete
    During the delete phase, the rows that caused a unique or primary key violation are removed from the table. These deleted rows are stored in the load exception table, if one was specified.
  5. Index copy
    During the index copy phase, the index data is copied from a system temporary table space to the original table space. This will only occur if a system temporary table space was specified for index creation during a load operation with the READ ACCESS option specified.
Figure 1. Phases of the Load Process for Row-organized Tables
This graphic shows a time line for the phases of the load process for row-organized tables.
Note: After you invoke the load utility, you can use the LIST UTILITIES command to monitor the progress of the load operation.

The following information is required when loading data:

  • The path and the name of the input file, named pipe, or device.
  • The name or alias of the target table.
  • The format of the input source. This format can be DEL, ASC, PC/IXF, or CURSOR.
  • Whether the input data is to be appended to the table, or is to replace the existing data in the table.
  • A message file name, if the utility is invoked through the application programming interface (API), db2Load.
Load modes
  • INSERT
    In this mode, load appends input data to the table without making any changes to the existing data.
  • REPLACE
    In this mode, load deletes existing data from the table and populates it with the input data.
  • RESTART
    In this mode, an interrupted load is resumed. In most cases, the load is resumed from the phase it failed in. If that phase was the load phase, the load is resumed from the last successful consistency point.
  • TERMINATE
    In this mode, a failed load operation is rolled back.

The options you can specify include:

  • That the data to be loaded resides on the client, if the load utility is invoked from a remotely connected client. Note that XML and LOB data are always read from the server, even you specify the CLIENT option.
  • The method to use for loading the data: column location, column name, or relative column position.
  • How often the utility is to establish consistency points.
  • The names of the table columns into which the data is to be inserted.
  • Whether or not preexisting data in the table can be queried while the load operation is in progress.
  • Whether the load operation should wait for other utilities or applications to finish using the table or force the other applications off before proceeding.
  • An alternate system temporary table space in which to build the index.
  • The paths and the names of the input files in which LOBs are stored.
    Note: The load utility does not honor the COMPACT lob option.
  • A message file name. During load operations, you can specify that message files be created to contain the error, warning, and informational messages associated with those operations. Specify the name of these files with the MESSAGES parameter.
    Note:
    1. You can only view the contents of a message file after the operation is finished. If you want to view load messages while a load operation is running, you can use the LOAD QUERY command.
    2. Each message in a message file begins on a new line and contains information provided by the Db2® message retrieval facility.
  • Whether column values being loaded have implied decimal points.
  • Whether the utility should modify the amount of free space available after a table is loaded.
  • Whether statistics are to be gathered during the load process. This option is only supported if the load operation is running in REPLACE mode. Statistics are collected according to the profile defined for the table. The profile must be created by the RUNSTATS command before the LOAD command is executed. If the profile does not exist and the load operation is instructed to collect statistics according to the profile, the load will fail, and an error message will be returned.

    If data is appended to a table, statistics are not collected. To collect current statistics on an appended table, invoke the RUNSTATS utility following completion of the load process. If gathering statistics on a table with a unique index, and duplicate keys are deleted during the delete phase, statistics are not updated to account for the deleted records. If you expect to have a significant number of duplicate records, do not collect statistics during the load operation. Instead, invoke the RUNSTATS utility following completion of the load process.

  • Whether to keep a copy of the changes made. This is done to enable rollforward recovery of the database. This option is not supported if rollforward recovery is disabled for the database; that is, if the database configuration parameters logarchmeth1 and logarchmeth2 are set to OFF. If no copy is made, and rollforward recovery is enabled, the table space is left in Backup Pending state at the completion of the load operation.
    Logging is required for fully recoverable databases. The load utility almost completely eliminates the logging associated with the loading of data. In place of logging, you have the option of making a copy of the loaded portion of the table. If you have a database environment that allows for database recovery following a failure, you can do one of the following:
    • Explicitly request that a copy of the loaded portion of the table be made.
    • Take a backup of the table spaces in which the table resides immediately after the completion of the load operation.

    If the database configuration parameter logindexbuild is set, and if the load operation is invoked with the COPY YES recoverability option and the INCREMENTAL indexing option, the load logs all index modifications. The benefit of using these options is that when you roll forward through the log records for this load, you also recover the indexes (whereas normally the indexes are not recovered unless the load uses the REBUILD indexing mode).

    If you are loading a table that already contains data, and the database is non-recoverable, ensure that you have a backed-up copy of the database, or the table spaces for the table being loaded, before invoking the load utility, so that you can recover from errors.

    If you want to perform a sequence of multiple load operations on a recoverable database, the sequence of operations will be faster if you specify that each load operation is non-recoverable, and take a backup at the end of the load sequence, than if you invoke each of the load operations with the COPY YES option. You can use the NONRECOVERABLE option to specify that a load transaction is to be marked as non-recoverable, and that it will not be possible to recover it by a subsequent rollforward operation. The rollforward utility will skip the transaction, and will mark the table into which data was being loaded as "invalid". The utility will also ignore any subsequent transactions against that table. After the rollforward operation is completed, such a table can only be dropped (see Figure 2). With this option, table spaces are not put in backup pending state following the load operation, and a copy of the loaded data does not have to be made during the load operation.

    Figure 2. Non-recoverable Processing During a Roll Forward Operation
    This graphic shows a time line for non-recoverable processing during a roll forward operation.
  • The fully qualified path to be used when creating temporary files during a load operation. The name is specified by the TEMPFILES PATH parameter of the LOAD command. The default value is the database path. The path resides on the server machine, and is accessed by the Db2 instance exclusively. Therefore, any path name qualification given to this parameter must reflect the directory structure of the server, not the client, and the Db2 instance owner must have read and write permission on the path.