Input Partitioning tab

Most parallel stages have a default partitioning or collecting method associated with them.

The Partitioning tab can be used depending on the execution mode of the stage (parallel or sequential) and the execution mode of the immediately preceding stage in the job. For example, if the preceding stage is processing data sequentially and the current stage is processing in parallel, the data is partitioned before it enters the current stage. Conversely, if the preceding stage is processing data in parallel and the current stage is sequential, the data is collected as it enters the current stage.

You can override the default partitioning or collecting method on the Partitioning tab. The selected method is applied to the incoming data as it enters the stage on a particular link, and so the Partitioning tab appears on the Input page. You can also use the tab to re-partition data between two parallel stages. If both stages are executing sequentially, you cannot select a partition or collection method and the fields are disabled. The fields are also disabled if the particular stage does not permit selection of partitioning or collection methods. The following table shows what can be set from the Partitioning tab in what circumstances:

Preceding Stage Current Stage Partition Tab Mode
Parallel Parallel Partition
Parallel Sequential Collect
Sequential Parallel Partition
Sequential Sequential None (disabled)

Use the Partitioning tab to specify whether the data should be sorted as it enters.

The Partitioning tab has the following fields:

  • Partition type. Choose the partitioning (or collecting) type from the drop-down list. The following partitioning types are available:
    • (Auto). InfoSphere® DataStage® attempts to work out the best partitioning method depending on execution modes of current and preceding stages and how many nodes are specified in the Configuration file. This is the default method for many stages.
    • Entire. Every processing node receives the entire data set. No further information is required.
    • Hash. The records are hashed into partitions based on the value of a key column or columns selected from the Available list.
    • Modulus. The records are partitioned using a modulus function on the key column selected from the Available list. This is commonly used to partition on tag fields.
    • Random. The records are partitioned randomly, based on the output of a random number generator. No further information is required.
    • Round Robin. The records are partitioned on a round robin basis as they enter the stage. No further information is required.
    • Same. Preserves the partitioning already in place. No further information is required.
    • DB2. Replicates the DB2 partitioning method of a specific DB2 table. Requires extra properties to be set. Access these properties by clicking the properties button.
    • Range. Divides a data set into approximately equal size partitions based on one or more partitioning keys. Range partitioning is often a preprocessing step to performing a total sort on a data set. Requires extra properties to be set. Access these properties by clicking the properties button.

      The following collection types are available:

    • (Auto). Normally, when you use Auto mode, InfoSphere DataStage eagerly reads any row from any input partition as it becomes available. This fastest collecting method is the default collection method for many stages. In some circumstances InfoSphere DataStage detect further requirements for collected data, for example, it might need to be sorted. Using Auto mode sorts data, if required.
    • Ordered. Reads all records from the first partition, then all records from the second partition, and so on. Requires no further information.
    • Round Robin. Reads a record from the first input partition, then from the second partition, and so on. After reaching the last partition, the operator starts over.
    • Sort Merge. Reads records in an order based on one or more columns of the record. This requires you to select a collecting key column from the Available list.
  • Available. This lists the input columns for the input link. Key columns are identified by a key icon. For partitioning or collecting methods that require you to select columns, you click on the required column in the list and it appears in the Selected list to the right. This list is also used to select columns on which to sort.
  • Selected. This list shows which columns have been selected for partitioning, collecting, or sorting and displaying information about them. The available information is whether a sort is being performed (indicated by an arrow), if so the order of the sort (ascending or descending) and collating sequence (sort as EBCDIC), and whether an alphanumeric key is case sensitive or not. Nullable columns are marked to indicate if null columns take first or last position. You can select sort order, case sensitivity, collating sequence, and null positions from the shortcut menu. If applicable, the Usage field indicates whether a particular key column is being used for sorting, partitioning, or both.
  • Sorting. The check boxes in the section allow you to specify sort details. The availability of sorting depends on the partitioning method chosen.
    • Perform Sort. Select this to specify that data coming in on the link should be sorted. Select the column or columns to sort on from the Available list.
    • Stable. Select this if you want to preserve previously sorted data sets. The default is stable.
    • Unique. Select this to specify that, if multiple records have identical sorting key values, only one record is retained. If stable sort is also set, the first record is retained.

      You can also specify sort direction, case sensitivity, whether sorted as EBCDIC, and whether null columns will appear first or last for each column. Where you are using a keyed partitioning method, you can also specify whether the column is used as a key for sorting, for partitioning, or for both. Select the column in the Selected list and right-click to invoke the shortcut menu. The availability of the sort options depends on the type of data in the column, whether it is nullable or not, and the partitioning method chosen.

      If you have NLS enabled, the sorting box has an additional button. Click this to open the NLS Locales tab of the Sort Properties window. This lets you view the current default collate convention, and select a different one for the stage if required. You can also use a job parameter to specify the locale, or browse for a file that defines custom collate rules. The collate convention defines the order in which characters are collated, for example, the character Ä follows A in Germany, but follows Z in Sweden. Select a locale from the list, or click the arrow button next to the list to use a job parameter or browse for a collate file.

      If you require a more complex sort operation, you should use the Sort stage.