Sorting data and database sources

Where processing stages sort from a source database connector, the InfoSphere® DataStage® Balanced Optimization tool imposes certain rules.

The following rules are imposed:
  • If the source connector SQL contains an ORDER BY clause and the following processing stage does a stable sort, a WARNING is posted in the optimization log, and the sort is treated as unstable.
  • Any ORDER BY clause in the source connector SQL is dropped. If the processing stage does any sorting, stable or unstable, it is converted into a new ORDER BY clause in the new source SQL.
A stable sort preserves the order of rows from any previous sort within the new sort, and so acts like secondary sort keys. An unstable sort does not guarantee to preserve any previous ordering.

The rules are applied every time another attempt is made to push the logic in a processing stage into a source connector. For example, if there is a sequence of unstable sorts (even with intervening processing stages), only sorting of the last such stage is pushed into the source database connector SQL. To take best advantage of available optimizations, design jobs that require data sorted on multiple columns to sort late in the job, and perform all possible processing before sorting the data. You must also avoid sorting data before a join in your job design. Join implicitly inserts unstable sorts on join keys.

InfoSphere DataStageBalanced Optimization does not support the pushing of sorting into database sources in the following circumstances:
  • Where there are different National Language Support code pages or collating sequences in the root job and in the source database.
  • Where EBCDIC sorting has been specified in the job.
  • Where case-insensitive sorting has been specified in the job.
  • Where nulls sort last (rather than first).