Data sets that LOAD uses

The LOAD utility requires certain data sets depending on the options that you specify. Include statements in your JCL for each data set that is needed. Alternatively, for some of these data sets, you can use the TEMPLATE utility to dynamically allocate them.
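For example, a minimal LOAD job step might include DD statements such as the following. This is only a sketch: it assumes the DSNUPROC utility procedure, a hypothetical input data set named DB2A.LOAD.INPUT, and a hypothetical table named MYSCHEMA.MYTABLE. The device and space values are placeholders to adjust for your site.

  //LOADSTEP EXEC DSNUPROC,UID='LOADJOB',SYSTEM='DSN'
  //SYSREC   DD DSN=DB2A.LOAD.INPUT,DISP=SHR
  //SYSUT1   DD UNIT=SYSDA,SPACE=(CYL,(10,10))
  //SORTOUT  DD UNIT=SYSDA,SPACE=(CYL,(10,10))
  //SYSERR   DD UNIT=SYSDA,SPACE=(CYL,(1,1))
  //SYSMAP   DD UNIT=SYSDA,SPACE=(CYL,(1,1))
  //UTPRINT  DD SYSOUT=*
  //SYSPRINT DD SYSOUT=*
  //SYSIN    DD *
    LOAD DATA INDDN SYSREC
      INTO TABLE MYSCHEMA.MYTABLE
  /*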

Table 1. Data sets that LOAD uses
Each entry lists the data set, its default DD name, the LOAD option that specifies a different DD name or template name, a description, and whether the data set is required.

In-stream data set
  Default DD name: SYSIN
  Option for a different name: None
  Description: An input data stream that contains the utility control statement.
  Required? Yes

Output data set
  Default DD name: SYSPRINT
  Option for a different name: None
  Description: An output data set for utility messages.
  Required? Yes

Input data set
  Default DD name: SYSREC
  Option for a different name: INDDN
  Description: An input data set that contains the data that is to be loaded. You can specify more than one input data set with the INDDN option.
  Required? Yes. The only situation in which LOAD does not require this data set is when you load from a cursor instead; in that case, specify the INCURSOR option.
Error data set
  Default DD name: SYSERR
  Option for a different name: ERRDDN
  Description: A work data set for error processing.
  Required? Yes

Discard data set (note 4)
  Default DD name: SYSDISC
  Option for a different name: DISCARDDN
  Description: A work data set that contains copies of records that are not loaded. If the input data set is on tape and you use a template for the discard data set, include the SPACE parameter in the TEMPLATE utility control statement. If you omit the DD statement for this data set, LOAD creates the data set with the same record format, record length, and block size as the input data set.
  Required? Yes
Sort message data set
  Default DD name: UTPRINT
  Option for a different name: None
  Description: An output data set that contains messages from the sort program. These messages are usually routed to SYSOUT or DUMMY.
  Required? Yes

Copy data sets
  Default DD name: SYSCOPY
  Option for a different name: COPYDDN and RECOVERYDDN
  Description: One to four output data sets that contain image copy data sets.
  Required? Only for inline copies.

FlashCopy® image copies
  Default DD name: None
  Option for a different name: FCCOPYDDN
  Description: A VSAM data set for the output FlashCopy image copy of each partition or piece.
  Required? Only for inline FlashCopy image copies (FLASHCOPY YES or FLASHCOPY CONSISTENT is specified).
Sort message data set
  Default DD name: RNPRINnn, where nn is a number from 01 to the number of parallel subtasks
  Option for a different name: None
  Description: An output data set that contains messages from the sort program. These messages are usually routed to SYSOUT or DUMMY.
  Required? Only when distribution statistics are collected for column groups (COLGROUP and FREQVAL are specified).

Sort message data set
  Default DD name: STPRIN01
  Option for a different name: None
  Description: An output data set that contains messages from the sort program. These messages are usually routed to SYSOUT or DUMMY. LOAD dynamically allocates the STPRIN01 data set if UTPRINT is allocated to SYSOUT.
  Required? Only when distribution statistics are collected for column groups (COLGROUP and FREQVAL are specified) or when statistics are collected on at least one data-partitioned secondary index.
Mapping data set
  Default DD name: SYSMAP
  Option for a different name: MAPDDN
  Description: A work data set for mapping the identifier of a table row to the input record that caused an error.
  Required? In the following situations:
    • Referential constraints exist and ENFORCE(CONSTRAINTS) is specified or accepted as the default.
    • Discard processing is requested when loading one or more tables that have unique indexes.
Sort input data set (note 4)
  Default DD name: SYSUT1
  Option for a different name: WORKDDN
  Description: A temporary work data set for sort input. If the input data set is on tape and you use a template for the SYSUT1 data set, include the SPACE parameter in the TEMPLATE utility control statement.
  Required? In any of the following situations:
    • Referential constraints exist and ENFORCE(CONSTRAINTS) is specified or accepted as the default.
    • Indexes exist.
    • DISCARDS is specified and ddname1 is specified for WORKDDN.
Sort output data set (note 3)
  Default DD name: SORTOUT
  Option for a different name: WORKDDN
  Description: A temporary work data set for sort output.
  Required? In any of the following situations:
    • Referential constraints exist and ENFORCE(CONSTRAINTS) is specified or accepted as the default.
    • Indexes exist.
    • DISCARDS is specified and ddname2 is specified for WORKDDN.
Sort work data sets (notes 1, 2, 3)
  Default DD names: SWnnWKmm or SORTWKnn
  Option for a different name: None
  Description: Temporary data sets for sort input and output when sorting keys. If index build parallelism is used, the DD names have the form SWnnWKmm; if it is not used, they have the form SORTWKnn.
  Required? If any indexes are to be built or if a sort is required for processing errors.

Sort work data sets (notes 1, 2, 3)
  Default DD names: ST01WKnn
  Option for a different name: None
  Description: Temporary data sets for sort input and output when collecting inline statistics on at least one data-partitioned secondary index or when the COLGROUP option is specified.
  Required? When inline statistics are collected on at least one data-partitioned secondary index.
Sort work data sets (notes 1, 2, 3)
  Default DD names: RNmmWKnn, where mm is the subtask number and nn is a sequence number for the data set that is allocated for that subtask
  Option for a different name: None
  Description: Temporary data sets for sort input and output when collecting distribution statistics for column groups.
  Required? When distribution statistics are collected for column groups.

Sort work data sets (notes 1, 2, 3)
  Default DD names: SORTWK01
  Option for a different name: None
  Description: Temporary data sets for sort input and output when collecting frequency statistics.
  Required? When frequency statistics are collected.
Note:
  1. If the DYNALLOC parameter of the sort program is not turned on, you must allocate these data sets yourself. Otherwise, the sort program dynamically allocates the temporary data sets.
  2. Recommendation: Use dynamic allocation by specifying the SORTDEVT option in the LOAD statement so that you do not have to write and maintain these DD statements. (See the sketch that follows these notes.)
  3. Tape devices are not supported for work data sets that are used for sort purposes.
  4. Tape data sets that are managed by DFSMSrmm do not require the SPACE parameter if NUMRECS and SORTKEYS are not specified.
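For example, the following control statements sketch both practices: a TEMPLATE statement that includes the SPACE parameter (needed when the input is on tape, as noted for the discard and SYSUT1 data sets) and the SORTDEVT and SORTNUM options for dynamically allocated sort work data sets. The data set name pattern, table name, device type, and space values are illustrative assumptions, not required values.

  TEMPLATE DISCT
    DSN(HLQ1.&DB..&TS..DISCARD)
    UNIT(SYSDA) SPACE(10,10) CYL
  LOAD DATA INDDN SYSREC
    DISCARDDN DISCT DISCARDS 100
    SORTDEVT SYSDA SORTNUM 4
    INTO TABLE MYSCHEMA.MYTABLE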

Defining work data sets

Use the formulas and instructions in the following table to calculate the size of the work data sets for LOAD. Each row in the table lists the DD name that identifies the data set and either formulas or instructions for determining the size of the data set. The key for the formula variables is at the bottom of the table, and a worked example follows the calculation details.

Table 2. Size of work data sets for LOAD jobs
Each entry lists the work data set and the formula or instruction for its size.

SORTOUT
  • Simple table space: max(k,e)
  • Partitioned or segmented (non-UTS) table space: max(k,e,m)
  If you specify an estimate of the number of keys with the SORTKEYS option:
  • Simple table space: max(f,e)
  • Partitioned or segmented (non-UTS) table space: max(f,e,m)

ST01WKnn
  • 2 × (maximum record length × numcols × (count + 2) × number of indexes)

SYSDISC
  • Same size as the input data set

SYSERR
  • e

SYSMAP
  • Simple table space for discard processing: m
  • Partitioned or segmented (non-UTS) table space without discard processing: max(m,e)

SYSUT1
  • Simple table space: max(k,e)
  • Partitioned or segmented (non-UTS) table space: max(k,e,m)
  If you specify an estimate of the number of keys with the SORTKEYS option:
  • Simple table space: max(f,e)
  • Partitioned or segmented (non-UTS) table space: max(f,e,m)
Note: The formulas use the following variables:
  k: Key calculation
  f: Foreign key calculation
  m: Map calculation
  e: Error calculation
  max(): Maximum value of the specified calculations
  numcols: Number of key columns to concatenate when you collect frequent values from the specified index
  count: Number of frequent values that Db2 is to collect
  maximum record length: Maximum record length of the SYSCOLDISTSTATS record that is processed when collecting frequency statistics (you can obtain this value from the RECLENGTH column in SYSTABLES)
  • Calculating the key: k

    If a mix of data-partitioned secondary indexes and nonpartitioned indexes exists on the table that is being loaded or a foreign key exists that is exactly indexed by a data-partitioned secondary index, use this formula:

    max(longest index key + 17, longest foreign key + 17) * (number of extracted keys)

    Otherwise, use this formula:

    max(longest index key + 15, longest foreign key + 15) * (number of extracted keys)

    For nonpadded indexes, the length of the longest key means the maximum possible length of a key with all varying-length columns padded to their maximum lengths, plus 2 bytes for each varying-length column.

  • Calculating the number of extracted keys:
    1. Count 1 for each index.
    2. Count 1 for each foreign key that is not exactly indexed (that is, where the foreign key and index definitions do not correspond identically).
    3. For each foreign key that is exactly indexed (that is, where the foreign key and index definitions correspond identically):
      a. Count 0 for the first relationship in which the foreign key participates if the index is not a data-partitioned secondary index. Count 1 if the index is a data-partitioned secondary index.
      b. Count 1 for each subsequent relationship in which the foreign key participates (if any).
    4. Multiply the count by the number of rows that are to be loaded.
  • Calculating the foreign key: f

    If a mix of data-partitioned secondary indexes and nonpartitioned indexes exists on the table that is being loaded or a foreign key exists that is exactly indexed by a data-partitioned secondary index, use this formula:

    max(longest foreign key + 17) * (number of extracted keys)

    Otherwise, use this formula:

    max(longest foreign key + 15) * (number of extracted keys)
  • Calculating the map: m

    The data set must be large enough to accommodate one map entry (length = 23 bytes) per table row that is produced by the LOAD job.

  • Calculating the error: e

    The data set must be large enough to accommodate one error entry (length = 568 bytes) per defect that is detected by LOAD (for example, conversion errors, unique index violations, violations of referential constraints).

  • Calculating the number of possible defects:
    • For discard processing, if the discard limit is specified, the number of possible defects is equal to the discard limit.

      If the discard limit is the maximum, calculate the number of possible defects by using the following formula:

      number of input records +
      (number of unique indexes * number of extracted keys) +
      (number of relationships * number of extracted foreign keys)
    • For nondiscard processing, the data set is not required.
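As a worked example (the numbers are assumptions for illustration, not recommendations): suppose that you load 100,000 rows into a simple table space with two indexes, no foreign keys, no data-partitioned secondary indexes, and a longest index key of 40 bytes. The number of extracted keys is 2 × 100,000 = 200,000, so:

  k = (40 + 15) × 200,000 = 11,000,000 bytes

Each map entry is 23 bytes, so m = 23 × 100,000 = 2,300,000 bytes. If discard processing is requested with a discard limit of 100, e = 568 × 100 = 56,800 bytes. The SYSUT1 and SORTOUT data sets for this simple table space then each need max(k,e) = 11,000,000 bytes, or roughly 11 MB.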

Allocating twice the space that is used by the input data sets is usually adequate for the sort work data sets. Two or three large SORTWKnn data sets are preferable to several small ones.

Sort work data sets cannot span volumes. Smaller volumes require more sort work data sets to sort the same amount of data; therefore, large volume sizes can reduce the number of sort work data sets that are needed. Provide sort work data sets on disk with a total capacity of at least 1.2 times the amount of data to be sorted.
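As a sketch of this sizing guidance (the device type and sizes are assumptions): if the input data sets occupy about 300 cylinders in total, twice that space is about 600 cylinders, which you might spread across three sort work data sets, each of which must fit on a single volume:

  //SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(200,20))
  //SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(200,20))
  //SORTWK03 DD UNIT=SYSDA,SPACE=(CYL,(200,20))

Specifying SORTDEVT instead, as recommended in the notes for Table 1, avoids writing and maintaining these DD statements.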