Specifying partitioning or collecting methods (DataStage)

You can specify how the data is collected or partitioned before it is processed.

Partitioning data

About this task

If the stage is running in parallel mode, it processes the data in partitions. By default, the partitioning method is set to Auto. You can override the default behavior.

Procedure

  1. Open the Partitioning tab of the Input page.
  2. Select a partitioning method from the list:
    Option: Description
    (Auto): IBM® DataStage® attempts to work out the best partitioning method based on the execution modes of the current and preceding stages and on the number of nodes specified in the configuration file. This is the default partitioning method for most stages.
    Db2: Replicates the partitioning method of a specific Db2 table. Requires extra properties to be set; access these properties by clicking the properties button.
    Entire: Every partition receives a complete copy of the data set.
    Hash: The records are hashed into partitions based on the value of one or more key columns selected from the Available list. Records with the same key values always go to the same partition.
    Modulus: The partition number of each record is the value of the key column selected from the Available list modulo the number of partitions. This method is commonly used to partition on tag fields.
    Random: The records are partitioned randomly, based on the output of a random number generator.
    Round Robin: The records are distributed to the partitions in turn, one record at a time, as they enter the stage.
    Same: Preserves the partitioning already in place.
    Range: Divides a data set into partitions of approximately equal size based on the values of one or more partitioning keys. Range partitioning is often a preprocessing step before a total sort of a data set. Requires extra properties to be set; access these properties by clicking the properties button.
  3. If you selected the Hash or Modulus partitioning method, specify a key by clicking one or more of the columns in the Available list. The selected columns appear in the Selected list.
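The partitioning methods above can be sketched in Python to show how each one assigns records to partitions. This is a minimal illustration under stated assumptions, not DataStage's implementation: records are modeled as Python dicts, a partition as a list, and all function names (`hash_partition`, `modulus_partition`, `build_boundaries`, and so on) are hypothetical. The CRC-32 hash and the way range boundaries are picked from a sample are choices made for the sketch; DataStage's actual hash function and range-map format are not reproduced here. Auto, Db2, and Same are not modeled, because they depend on the surrounding job (Same simply leaves the existing partitions unchanged).

```python
import random
from bisect import bisect_right
from zlib import crc32

# Hypothetical helpers: a record is a dict, a partition is a list of records.

def _empty(n):
    return [[] for _ in range(n)]

def hash_partition(records, keys, n):
    """Hash: equal key values always map to the same partition."""
    parts = _empty(n)
    for r in records:
        # CRC-32 of the key tuple is a stand-in, not DataStage's hash function.
        digest = crc32(repr(tuple(r[k] for k in keys)).encode())
        parts[digest % n].append(r)
    return parts

def modulus_partition(records, key, n):
    """Modulus: partition number = integer key value mod number of partitions."""
    parts = _empty(n)
    for r in records:
        parts[r[key] % n].append(r)
    return parts

def round_robin_partition(records, n):
    """Round Robin: deal records to the partitions in turn as they arrive."""
    parts = _empty(n)
    for i, r in enumerate(records):
        parts[i % n].append(r)
    return parts

def random_partition(records, n, seed=None):
    """Random: each record goes to a partition chosen by a random number generator."""
    rng = random.Random(seed)
    parts = _empty(n)
    for r in records:
        parts[rng.randrange(n)].append(r)
    return parts

def entire_partition(records, n):
    """Entire: every partition receives a full copy of the data set."""
    return [list(records) for _ in range(n)]

def build_boundaries(sample_keys, n):
    """Pick n - 1 split points from a sorted sample (hypothetical stand-in for a range map)."""
    s = sorted(sample_keys)
    return [s[(i * len(s)) // n] for i in range(1, n)]

def range_partition(records, key, boundaries):
    """Range: contiguous key ranges, roughly equal in size when boundaries fit the data."""
    parts = _empty(len(boundaries) + 1)
    for r in records:
        parts[bisect_right(boundaries, r[key])].append(r)
    return parts

if __name__ == "__main__":
    rows = [{"id": i, "region": "EMEA" if i % 3 else "APAC"} for i in range(12)]
    print([len(p) for p in round_robin_partition(rows, 4)])    # [3, 3, 3, 3]
    print([len(p) for p in modulus_partition(rows, "id", 4)])  # [3, 3, 3, 3]
    print(build_boundaries(range(100), 4))                     # [25, 50, 75]
```

The sketch makes the trade-off visible: Round Robin and Random balance partition sizes without looking at the data, while Hash, Modulus, and Range keep records with related key values together, which key-based operations such as joins or aggregations on those keys rely on.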

Collecting data

You can specify a collecting method.

About this task

If the stage runs sequentially and the preceding stage in the job runs in parallel, the data is collected into a single stream before it is written. By default, the collecting method is set to Auto. You can override the default behavior.

Procedure

  1. Open the Partitioning tab of the Input page.
  2. Select a collecting method from the list:
    Option: Description
    (Auto): This is the default collection method for the Sequential File stage. Normally, when you use Auto mode, IBM DataStage reads any row from any input partition as soon as it becomes available.
    Ordered: Reads all rows from the first partition, then all rows from the second partition, and so on.
    Round Robin: Reads a row from the first input partition, then from the second partition, and so on. After reaching the last partition, the operator starts over.
    Sort Merge: Reads rows in an order based on the values of one or more key columns, merging the input partitions on those columns, so the data in each partition should already be sorted on them. This requires you to select one or more collecting key columns from the Available list.
  3. If you selected the Sort Merge collecting method, specify a collecting key by clicking one or more of the columns in the Available list. The selected columns appear in the Selected list.
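The collecting methods above can be sketched the same way, to show the order in which each one emits rows. This is again a minimal Python illustration, not DataStage's implementation: partitions are modeled as lists of dicts and the function names are hypothetical. Auto is not modeled, because its order depends on when rows arrive at run time. The Round Robin function skips partitions that have run out of rows, which is an assumption about how uneven partitions are handled. The Sort Merge function assumes each input partition is already sorted on the key columns and combines them with the standard-library `heapq.merge`.

```python
import heapq
from itertools import zip_longest

# Hypothetical helpers: each partition is a list of dict records.

_MISSING = object()  # filler for partitions that have run out of rows

def ordered_collect(partitions):
    """Ordered: all rows of partition 0, then all rows of partition 1, and so on."""
    return [r for p in partitions for r in p]

def round_robin_collect(partitions):
    """Round Robin: one row from each partition in turn, then start over.
    Exhausted partitions are skipped (an assumption for uneven partitions)."""
    out = []
    for group in zip_longest(*partitions, fillvalue=_MISSING):
        out.extend(r for r in group if r is not _MISSING)
    return out

def sort_merge_collect(partitions, keys):
    """Sort Merge: merge partitions that are each already sorted on `keys`."""
    return list(heapq.merge(*partitions, key=lambda r: tuple(r[k] for k in keys)))

if __name__ == "__main__":
    parts = [[{"k": 1}, {"k": 2}], [{"k": 3}, {"k": 4}, {"k": 6}], [{"k": 5}]]
    print([r["k"] for r in ordered_collect(parts)])            # [1, 2, 3, 4, 6, 5]
    print([r["k"] for r in round_robin_collect(parts)])        # [1, 3, 5, 2, 4, 6]
    print([r["k"] for r in sort_merge_collect(parts, ["k"])])  # [1, 2, 3, 4, 5, 6]
```

Only Sort Merge yields a fully sorted stream, and only because each input partition was sorted on the key; Ordered keeps each partition's rows together as one contiguous block.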