IBM InfoSphere DataStage and InfoSphere QualityStage, Version 8.5

Partitioning tab (input)

Use the Partitioning tab to specify details about how the stage partitions or collects data on the current link before it processes the data or writes it to a data target.

You can also use the Partitioning tab to specify that data arriving on the input link should be sorted before being processed or written to the data target. The availability of sorting depends on the partitioning or collecting method chosen. It is not available with the Auto methods. The Partitioning tab provides basic sorting facilities. For a more complex sort operation, use the Sort stage.

The Partitioning tab contains the following controls and fields:

Partitioning/Collecting

Choose the partitioning or collecting type from the list.

The Partition type list is available if the stage is set to run in parallel mode. If you select a method from the list, the method overrides any current partitioning method.

The Collection type list is available if the stage is set to run in sequential mode, and the preceding stage is set to run in parallel mode. If you select a method from the list, the method overrides the default collection method of Auto.

The following partitioning types are available:

(Auto): At run time, the engine attempts to determine the best partitioning method based on the following factors:

Whether the current and preceding stages are set to run in sequential mode or in parallel mode.
Whether previous stages in the job have the Preserve Partitioning option set to indicate that partitioning should be preserved.
How many nodes are specified in the configuration file.

Auto is the default method for many stages, but it is not available for the Lookup File Set stage or the DB2 Enterprise stage.

Entire: Every processing node receives the entire data set.

Hash: The rows are hashed into partitions based on the value of one or more key columns selected from the Available list.

Modulus: The rows are partitioned using a modulus function on the key column selected from the Available list.

Random: The rows are partitioned randomly, based on the output of a random number generator.

Round Robin: The rows are partitioned on a round robin basis as they enter the stage.

Same: This method preserves the current data partitions.

DB2: This method replicates the DB2 partitioning method of a DB2 table. This is the default method for the DB2 Enterprise stage.

Range: This method divides a data set into approximately equal-size partitions based on one or more partitioning keys. Range partitioning is often used as a preparatory step for performing a total sort on a data set. The range method requires you to specify the name of a sample range map, which you create by using the Write Range Map stage. To specify the range map, click the properties button, then enter or browse for the range map name in the window that opens.
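The partitioning methods above can be sketched in Python to show where each row ends up. This is a minimal illustration under stated assumptions, not the engine's implementation: the sample rows, the partition count, and all function names are hypothetical; `zlib.crc32` stands in for whatever hash function the engine actually uses; Modulus is assumed to take an integer key value modulo the number of partitions; and the sorted boundary list in `range_partition` only plays the role of a range map created by the Write Range Map stage.

```python
import bisect
import random
import zlib

# Hypothetical input rows: (customer_id, region).
ROWS = [(101, "east"), (102, "west"), (103, "east"), (104, "north"),
        (105, "west"), (106, "south"), (107, "east"), (108, "north")]


def empty_partitions(n):
    """Return n empty partitions."""
    return [[] for _ in range(n)]


def round_robin(rows, n):
    """Row i (in arrival order) goes to partition i mod n."""
    parts = empty_partitions(n)
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def random_partition(rows, n, seed=0):
    """Each row goes to a partition picked by a seeded random number generator."""
    rng = random.Random(seed)
    parts = empty_partitions(n)
    for row in rows:
        parts[rng.randrange(n)].append(row)
    return parts


def entire(rows, n):
    """Every partition receives its own copy of the whole data set."""
    return [list(rows) for _ in range(n)]


def hash_partition(rows, n, key):
    """Rows with equal key values always land in the same partition.
    zlib.crc32 is a stand-in; the engine's real hash function is not shown here."""
    parts = empty_partitions(n)
    for row in rows:
        parts[zlib.crc32(repr(key(row)).encode("utf-8")) % n].append(row)
    return parts


def modulus_partition(rows, n, key):
    """Assumed rule: partition number = integer key value mod n."""
    parts = empty_partitions(n)
    for row in rows:
        parts[key(row) % n].append(row)
    return parts


def range_partition(rows, boundaries, key):
    """`boundaries` is a sorted list of split points standing in for a range map.
    Partition i receives keys k with boundaries[i-1] <= k < boundaries[i];
    the first and last partitions are open-ended."""
    parts = empty_partitions(len(boundaries) + 1)
    for row in rows:
        parts[bisect.bisect_right(boundaries, key(row))].append(row)
    return parts
```

For example, `modulus_partition(ROWS, 3, key=lambda r: r[0])` puts customers 102, 105, and 108 in partition 0, because each of those IDs leaves a remainder of 0 when divided by 3, whereas `hash_partition(ROWS, 3, key=lambda r: r[1])` keeps all rows for a given region together in a single partition.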

The following collection types are available:

(Auto): The Auto method usually causes the stage to read any row from any input partition as soon as the row becomes available, and it is the fastest collecting method. In some circumstances, however, the stage uses a different collecting method even when Auto is set. For example, if the stage requires sorted data before it can operate, the stage sorts the data.

Ordered: This method reads all the rows from the first partition, then all the rows from the second partition, and so on.

Round Robin: This method reads a row from the first input partition, then a row from the second partition, and so on. After reaching the last partition, the stage starts again from the first partition.

Sort Merge: This method reads rows in an order based on one or more columns of the row. The method requires you to select one or more collecting key columns from the Available list.
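The three deterministic collection methods can be compared with a short Python sketch; Auto is left out because its order depends on when rows arrive. The partitions and function names are hypothetical, and skipping exhausted partitions during round robin collection is an assumption. Each sample partition is already sorted on its first field, because `heapq.merge` only yields a fully sorted stream when every input is sorted.

```python
import heapq
from itertools import chain, zip_longest

# Hypothetical partitions; each is already sorted on its key (the first field).
PARTITIONS = [
    [(1, "a"), (2, "b"), (6, "f")],
    [(3, "c"), (7, "g")],
    [(4, "d"), (5, "e"), (8, "h")],
]

_MISSING = object()  # sentinel that pads partitions which run out early


def ordered(partitions):
    """All rows of the first partition, then all rows of the second, and so on."""
    return list(chain.from_iterable(partitions))


def round_robin_collect(partitions):
    """One row from each partition in turn, cycling back to the first partition.
    Partitions that have no rows left are skipped (assumed behavior)."""
    return [row
            for group in zip_longest(*partitions, fillvalue=_MISSING)
            for row in group if row is not _MISSING]


def sort_merge(partitions, key):
    """Merge partitions that are each sorted on `key` into one sorted stream."""
    return list(heapq.merge(*partitions, key=key))
```

On this data, Ordered yields keys 1, 2, 6, 3, 7, 4, 5, 8; Round Robin yields 1, 3, 4, 2, 7, 5, 6, 8; and Sort Merge on the first field yields 1 through 8 in order.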

Sorting

Use the controls in this section to specify that data should be sorted. Data is always sorted within data partitions: if the stage is partitioning incoming data, the data is sorted after partitioning; if the stage is collecting incoming data, the data is sorted before it is collected.
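Because sorting happens within each data partition, the partitions are individually sorted but the data set as a whole is not. A minimal Python sketch with hypothetical, already-partitioned integer keys:

```python
# Hypothetical data that has already been split into three partitions.
partitions = [[9, 2, 7], [4, 8, 1], [6, 3, 5]]

# Each partition is sorted independently, after partitioning.
sorted_partitions = [sorted(p) for p in partitions]
print(sorted_partitions)  # [[2, 7, 9], [1, 4, 8], [3, 5, 6]]

# Reading the partitions back one after another does not give a total sort.
flattened = [x for p in sorted_partitions for x in p]
print(flattened == sorted(flattened))  # False
```

A total sort therefore needs partitions whose key ranges do not overlap, which is why range partitioning is used as a preparatory step for it.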

Sort: Select Sort to specify that data coming in on the link should be sorted. Select the key column or columns on which to sort from the Available list.

Stable: Select Stable to preserve the order of previously sorted data sets, so that records with identical sort key values keep their existing relative order. Stable is set by default.

Unique: Select Unique to specify that, if multiple records have identical sorting key values, only one record is retained. If stable sort is also set, the first record is retained.
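The interaction between Stable and Unique can be illustrated in Python, whose built-in `sorted()` is guaranteed to be stable. This sketch only mirrors the documented behavior; the rows and the deduplication step are hypothetical and do not show how the engine implements either option.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (sort_key, arrival_sequence). Keys "a" and "b" repeat.
rows = [("b", 1), ("a", 2), ("b", 3), ("a", 4), ("c", 5)]
sort_key = itemgetter(0)

# Stable sort: rows with equal keys keep their input order.
stable = sorted(rows, key=sort_key)
# -> [("a", 2), ("a", 4), ("b", 1), ("b", 3), ("c", 5)]

# Unique: keep only the first row in each run of identical key values.
# Because the sort was stable, "first" is the first such row in the input.
unique = [next(group) for _, group in groupby(stable, key=sort_key)]
# -> [("a", 2), ("b", 1), ("c", 5)]
```

Without a stable sort, the row retained for each key could be any of the duplicates, which is why the retained record is only well defined when Stable is also set.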

To specify sort direction, case sensitivity, and collating sequence for each column in the Selected list, select the column, right-click to open the shortcut menu and select the required items.

If national language support is enabled, you can specify the collation locale for the sort operation. The collation locale specifies precedence rules appropriate to the selected locale. Click the properties button to open the Sort Properties window and choose a collation locale from the list.

Use the Available and Selected lists to specify which columns are key columns and which columns should be sorted on.

Available: The Available list contains the input columns for the input link. Key columns are identified by a key icon. If you select a partitioning or collecting method that requires you to select one or more columns, click the required column in the Available list to move it to the Selected list. The Available list is also used to select columns on which to sort.

Selected: The Selected list shows the columns selected for partitioning, collecting, or sorting, and displays information about them. The Usage field indicates whether a particular key column is used for sorting, partitioning, or both. If a column is used for sorting, the following information is displayed:

The sort order, ascending or descending.
The collating sequence, ASCII (the default) or EBCDIC.
Whether an alphanumeric key is case sensitive.
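The effect of the collating sequence, sort order, and case sensitivity settings can be approximated in Python. This is a sketch under assumptions: EBCDIC collation is approximated with Python's `cp037` codec (EBCDIC US/Canada), and the EBCDIC variant the engine actually uses is not stated here; `str.casefold` stands in for a case-insensitive comparison; the key values are hypothetical.

```python
# Hypothetical alphanumeric key values.
keys = ["b", "A", "1", "a", "B"]

# ASCII collating sequence, case sensitive, ascending:
# digits sort before uppercase letters, which sort before lowercase letters.
ascii_order = sorted(keys, key=lambda s: s.encode("ascii"))
# -> ['1', 'A', 'B', 'a', 'b']

# EBCDIC collating sequence (approximated with the cp037 code page):
# lowercase letters sort before uppercase letters, which sort before digits.
ebcdic_order = sorted(keys, key=lambda s: s.encode("cp037"))
# -> ['a', 'b', 'A', 'B', '1']

# Case-insensitive, descending. sorted() stays stable with reverse=True,
# so keys that differ only in case keep their input order.
ci_desc = sorted(keys, key=str.casefold, reverse=True)
# -> ['b', 'B', 'A', 'a', '1']
```

The same key values can therefore come out in a different order depending only on the collating sequence chosen for the column.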

Last updated: 2012-10-8