IBM Support

When running DataStage parallel jobs, records are not evenly distributed across nodes

Troubleshooting


Problem

On a 2-node configuration, the job partitions the data using Round Robin and sorts on the unique key of the data, but sometimes the last record is placed on the same node as the second-to-last record.

Symptom

The job consists of the following stages: <Data Set> --> <Transformer> --> <Filter> --> <Data Set>

The starting number and the count of records in the initial data set are passed into the job as variables. The transformer partitions the data using Round Robin and sorts on the unique key of the data. It uses a transform variable to calculate the Next Number for each record:

((@PARTITIONNUM+(@NUMPARTITIONS*(@INROWNUM-1))+1)+NextBatchNmbr)
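The arithmetic of this derivation can be sketched outside DataStage as follows. This is only an illustration: it assumes that @PARTITIONNUM starts at 0 and @INROWNUM starts at 1, the function name next_number is hypothetical, and the starting value 100 for NextBatchNmbr is made up.

```python
def next_number(partition_num, num_partitions, in_row_num, next_batch_nmbr):
    # Same arithmetic as the transformer derivation:
    # ((@PARTITIONNUM + (@NUMPARTITIONS * (@INROWNUM - 1)) + 1) + NextBatchNmbr)
    return (partition_num + (num_partitions * (in_row_num - 1)) + 1) + next_batch_nmbr


NUM_PARTITIONS = 2
NEXT_BATCH_NMBR = 100  # hypothetical starting number, for illustration only

# Walk the rows as each partition would see them (row numbers are 1-based).
for in_row_num in (1, 2, 3):
    for partition_num in range(NUM_PARTITIONS):
        value = next_number(partition_num, NUM_PARTITIONS, in_row_num, NEXT_BATCH_NMBR)
        print(f"partition {partition_num}, row {in_row_num}: {value}")
```

Under these assumptions, partition 0 produces 101, 103, 105 and partition 1 produces 102, 104, 106. The sequence is contiguous only if the rows alternate strictly between the two partitions.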

The transform also calculates a field with the maximum value allowed for the Next Number by adding the initial Next Number and Record Count. The filter is used to reject any records where the Next Number is larger than it is expected to be.


An instance has been observed where the last record was placed on the same node as the second-to-last record. This causes the calculated number to skip a value, which results in the last record being rejected.
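The following sketch reproduces the symptom with made-up numbers. It assumes 6 records, a NextBatchNmbr of 100, a 0-based @PARTITIONNUM, a 1-based @INROWNUM, and a filter that rejects any Next Number greater than NextBatchNmbr plus the record count. The function names are hypothetical and are not part of DataStage.

```python
def next_number(partition_num, num_partitions, in_row_num, next_batch_nmbr):
    # Same arithmetic as the transformer derivation in this document.
    return (partition_num + (num_partitions * (in_row_num - 1)) + 1) + next_batch_nmbr


def apply_filter(rows_per_partition, next_batch_nmbr):
    """rows_per_partition[p] is the number of rows that landed on partition p."""
    num_partitions = len(rows_per_partition)
    record_count = sum(rows_per_partition)
    max_allowed = next_batch_nmbr + record_count  # assumed form of the maximum-value field
    kept, rejected = [], []
    for partition_num, rows in enumerate(rows_per_partition):
        for in_row_num in range(1, rows + 1):
            n = next_number(partition_num, num_partitions, in_row_num, next_batch_nmbr)
            (kept if n <= max_allowed else rejected).append(n)
    return sorted(kept), sorted(rejected)


# Strict alternation: 3 rows on each node, nothing is rejected.
print(apply_filter([3, 3], 100))  # ([101, 102, 103, 104, 105, 106], [])

# Last record lands on the same node as the second-to-last record:
# node 0 receives 4 rows, node 1 receives 2 rows.
print(apply_filter([4, 2], 100))  # ([101, 102, 103, 104, 105], [107])
```

In the second case the value 106 is never generated, and the fourth row on node 0 is numbered 107, which exceeds the assumed maximum of 106 and is rejected.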

Resolving The Problem

"Round robin partitioning" means that when records arrive at a specific partition, they are redistributed in a round robin manner to the output partitions. However, there are two things that you should be aware of:

  1. If you have multiple partitions on input, the round robin behavior is independent in each partition. If the input partitions hold unequal numbers of records, the framework does not even out the distribution of records across them. Consider this example:

    There are 5 records on node1 and 7 records on node2. Node1's round robin distributes approximately 3 records to node1 and approximately 2 records to node2. Node2's round robin distributes approximately 4 records to node1 and approximately 3 records to node2. The result can therefore be 7 records on node1 and 5 records on node2, depending on how the framework code on each individual node orders the output nodes.


  2. The framework does not guarantee precise round robin behavior; it orders the data only approximately in a round robin manner. Even distribution is not guaranteed with the round robin option. One of the reasons can be seen in the above example.
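The example in item 1 can be modeled with a short sketch. This is a simplified illustration and not the framework's actual algorithm: the function round_robin_counts and the start_offsets parameter (the output node on which each input partition begins its own cycle) are hypothetical.

```python
def round_robin_counts(input_counts, num_outputs, start_offsets):
    """Count the records each output node receives when every input
    partition runs its own, independent round robin."""
    totals = [0] * num_outputs
    for count, start in zip(input_counts, start_offsets):
        for i in range(count):
            # Each input partition cycles through the outputs on its own,
            # with no knowledge of what the other input partitions send.
            totals[(start + i) % num_outputs] += 1
    return totals


# 5 records on node1 and 7 records on node2, two output nodes.
print(round_robin_counts([5, 7], 2, [0, 0]))  # [7, 5]
print(round_robin_counts([5, 7], 2, [0, 1]))  # [6, 6]
```

With both input partitions starting on the same output node, the totals are 7 and 5. With different starting nodes, the totals happen to balance at 6 and 6. The outcome depends on an ordering that the job design does not control.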

If you need a precise ordering and a precise balance of data, you must downgrade your input data set to sequential only, so that the round robin is performed from a single sequential input to the parallel partitions.

However, even then, with an odd number of records over 2 nodes, the resulting partitions will still be unevenly balanced.
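A minimal sketch of a strict sequential-to-parallel round robin shows the imbalance. It assumes an idealized partitioner that deals records to the nodes strictly in turn; the function name sequential_round_robin is hypothetical.

```python
def sequential_round_robin(records, num_partitions):
    """Deal records from one sequential input to the partitions in strict turn."""
    partitions = [[] for _ in range(num_partitions)]
    for i, record in enumerate(records):
        partitions[i % num_partitions].append(record)
    return partitions


# 5 records over 2 nodes: the ordering is exact, the balance is not.
result = sequential_round_robin([1, 2, 3, 4, 5], 2)
print(result)                    # [[1, 3, 5], [2, 4]]
print([len(p) for p in result])  # [3, 2]
```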



In some cases, the only guaranteed solution may be to downgrade to a single node or design the job to run sequentially.

Product: IBM InfoSphere DataStage
Platform: HP-UX, Linux, Solaris, Windows, AIX
Version: 9.1, 8.7, 8.5, 8.1, 8.0

Document Information

Modified date:
16 June 2018

UID

swg21612938