Sorting data and database sources
Where processing stages sort from a source database connector, the InfoSphere® DataStage® Balanced Optimization tool imposes certain rules.
- If the source connector SQL contains an ORDER BY clause and the following processing stage does a stable sort, a WARNING is posted in the optimization log, and the sort is treated as unstable.
- Any ORDER BY clause in the source connector SQL is dropped. If the processing stage does any sorting, stable or unstable, it is converted into a new ORDER BY clause in the new source SQL.
The rules are applied every time another attempt is made to push the logic in a processing stage into a source connector. For example, if there is a sequence of unstable sorts (even with intervening processing stages), only sorting of the last such stage is pushed into the source database connector SQL. To take best advantage of available optimizations, design jobs that require data sorted on multiple columns to sort late in the job, and perform all possible processing before sorting the data. You must also avoid sorting data before a join in your job design. Join implicitly inserts unstable sorts on join keys.
- Where there are different National Language Support code pages or collating sequences in the root job and in the source database.
- Where EBCDIC sorting has been specified in the job.
- Where case-insensitive sorting has been specified in the job.
- Where nulls sort last (rather than first).