You can use the LOAD utility to load one or more partitions of a partitioned table space. To improve performance when loading more than one partition, consider enabling partition parallelism.
About this task
Partition parallelism can reduce the elapsed time that is required for loading large amounts of data into partitioned table spaces.
If you are loading a partitioned table space that is created with DEFINE NO, the load operation might take longer. If a partitioned table space is created with DEFINE NO, all partitions are also implicitly defined with DEFINE NO. The first data row that is inserted by the LOAD utility defines all data sets in the partitioned table space. If this process takes a long time, expect timeouts on the database descriptor (DBD).
To load partitions:
If you want to load only certain partitions of a partitioned table, use the PART
clause of the INTO TABLE option. If you omit the PART clause, the entire table is loaded.
Restriction: The following restrictions exist for identity columns:
- When index-based partitioning is used, LOAD INTO TABLE PART integer is not allowed if an identity column is part of the partitioning index.
- When table-based partitioning is used, LOAD INTO TABLE PART integer is not allowed if an identity column is used in a partitioning clause of the CREATE TABLE or ALTER TABLE statement.
To override these restrictions, specify the OVERRIDE(IDENTITY) option in the LOAD statement.
- If you want partitions to be processed in parallel, take
one of the following actions:
you have a single input data set and the partitioned table space is partitioned (non-UTS) or
partition-by-range , specify the PARALLEL keyword. This keyword enables LOAD to use multiple
parallel subtasks. When determining the degree of parallelism to specify on the PARALLEL keyword,
consider that a high degree of parallelism can result in increased processor time. Recommendation: Specify PARALLEL(0) or PARALLEL without a number so that Db2® can determine the optimal degree of parallelism.
- If one or more
nonpartitioned secondary indexes exists on the partitioned table space, and you have a separate
input data set for each partition, use load partition parallelism. Partition parallelism loads all
partitions in a single job. To invoke partition parallelism, for each partition that you want to
load, specify the INTO TABLE PART clause with one of the following keywords:
- DISCARDDN if DISCARDS n is specified
If the table space is created with DEFINE NO, specifying SHRLEVEL CHANGE on your LOAD statement and enabling partition parallelism is equivalent to concurrent, independent insert jobs. For example, in a large partitioned table space that is created with DEFINE NO, the LOAD utility starts three tasks. The first task tries to insert the first row, which causes an update to the DBD. The other two tasks time out while they wait to access the DBD. The first task holds the lock on the DBD while the data sets are defined for the table space.
- If the only indexes are the partitioned indexes, use multiple jobs to run LOAD concurrently against separate partitions. This method also requires that you have a separate input data set for each partition.
- If you have a single input data set and the partitioned table space is partitioned (non-UTS) or partition-by-range , specify the PARALLEL keyword. This keyword enables LOAD to use multiple parallel subtasks. When determining the degree of parallelism to specify on the PARALLEL keyword, consider that a high degree of parallelism can result in increased processor time.
- If you use the INTO TABLE PART clause, take the following
actions as appropriate:
- If you specify the REPLACE or RESUME options, specify them separately by partition. If you specify these options before the INTO TABLE PART clause, LOAD serializes the load operation for the entire table space and does not process the partitions concurrently.
- To load columns in an order that is different than the order of the columns in the CREATE TABLE statement, code field specifications for each INTO TABLE PART clause.
- Make sure that you specify the LOAD options in the appropriate place in the utility statement. Some LOAD options, such as INDDN, DISCARDDN, RESUME, and REPLACE, can have a different scope depending on their placement in the LOAD statement. For example, if you specify INDDN before INTO TABLE, the specified input data set is used to load the entire table. However, if you specify INDDN after INTO TABLE, in a PART clause, the specified input data set is used to load only the specified partition.
If you want Db2 to save copies of
records that are not loaded, use the DISCARDDN option to specify a discard data set as
- If you want a single discard data set for all partitions, specify DISCARDDN before INTO TABLE. If you use a template for this data set, and the TEMPLATE utility statement contains the variable $PA. or $PART., that variable is substituted with 0 or the partition number in the first INTO TABLE PART clause.
- If you want one discard data set for a partition, specify DISCARDDN in an INTO TABLE PART clause. If you use a template for this data set, and the TEMPLATE utility statement contains the variable $PA. or $PART., that variable is substituted with the partition number. If you specify DISCARDDN in more than one INTO TABLE PART clause, you must ensure that the data set names are unique. One way to ensure unique names is to use templates and the variable $PA. or $PART. in the TEMPLATE statement.
- Example of loading certain records into certain partitions
- The control statement in the following example specifies that Db2 is to load data into the first and second
partitions of the employee table. Records with '0' in column 1 replace the contents of partition 1;
records with '1' in column 1 are added to partition 2; all other records are ignored. This example
control statement, which is simplified to illustrate the point, does not list field specifications
for all columns of the
LOAD DATA CONTINUEIF(72:72)='X' INTO TABLE DSN8B10.EMP PART 1 REPLACE WHEN (1) = '0' ( EMPNO POSITION (1:6) CHAR(6), FIRSTNME POSITION (7:18) CHAR(12), ⋮ ) INTO TABLE DSN8B10.EMP PART 2 RESUME YES WHEN (1) = '1' ( EMPNO POSITION (1:6) CHAR(6), FIRSTNME POSITION (7:18) CHAR(12), ⋮ )
- Example of loading partitions from separate input data sets
- The following example LOAD statements
specify that partitions 1 and 2 of the EMP table are to be loaded from the EMPLDS1 and EMPLDS2 data
sets. This example assumes that your data is in separate input data sets and already sorted by
partition. Therefore, you do not need to use the WHEN clause of INTO TABLE. Placing the RESUME YES
option before the PART option inhibits concurrent partition processing. If you want LOAD to process
other partitions concurrently, specify the RESUME option after the PART
LOAD DATA INDDN EMPLDS1 CONTINUEIF(72:72)='X' RESUME YES INTO TABLE DSN8B10.EMP REPLACE PART 1 LOAD DATA INDDN EMPLDS2 CONTINUEIF(72:72)='X' RESUME YES INTO TABLE DSN8B10.EMP REPLACE PART 2
- Example of loading partitions independently
the following example, partition 1 and partition 2 are loaded concurrently.
LOAD DATA INDDN SYSREC LOG NO INTO TABLE DSN8B10.EMP PART 1 REPLACE
LOAD DATA INDDN SYSREC2 LOG NO INTO TABLE DSN8B10.EMP PART 2 REPLACE
- Example of specifying separate discard data sets for each partition
- If you specify multiple discard data sets, you must ensure that the data set names are unique.
In the following example, the LOAD statement specifies that two discard data sets are to be
allocated, one for each partition. Those discard data sets both use the DT template. The TEMPLATE
statement for DT includes the &PA. variable (partition number) to ensure that the data set name
is different for each partition.
TEMPLATE DT UNIT(SYSDA) DSN(JUOSU339.&TS..P&PA..DISCARD) SPACE(50,10) TRK LOAD DATA INTO TABLE DSN8B10.EMP PART 1 INDDN D1 DISCARDDN(DT) REPLACE NO RESUME INTO TABLE DSN8B10.EMP PART 2 INDDN D2 DISCARDDN(DT) REPLACE NO RESUME