Sorting environment variables in DataStage

The following environment variables control how DataStage® automatically sorts data.

APT_NO_SORT_INSERTION environment variable in DataStage

Set the APT_NO_SORT_INSERTION environment variable to prevent DataStage from automatically inserting sort components in your job.

DataStage automatically inserts sort components in your job to optimize the performance of the operators in your data flow. Set APT_NO_SORT_INSERTION to prevent this automatic insertion.

APT_SORT_INSERTION_CHECK_ONLY environment variable in DataStage

Set the APT_SORT_INSERTION_CHECK_ONLY environment variable so that sorts inserted automatically by DataStage only check that the data is already in the correct order, rather than actually sorting it.

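Setting this variable is a better alternative to turning off automatic partitioning and sort insertion with APT_NO_PART_INSERTION and APT_NO_SORT_INSERTION, because the inserted sorts still verify that the data is in the correct order.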
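The difference between checking order and sorting can be illustrated with a short sketch. This is a conceptual illustration in Python, not DataStage code: the function name check_sorted and the sample records are hypothetical, and the sketch assumes ascending order on a single key. It passes records through unchanged and fails as soon as one is out of order, which is a single linear pass instead of a full sort.

```python
from operator import itemgetter


def check_sorted(records, key):
    """Yield records unchanged, raising ValueError if they are not in
    ascending order on the given key (hypothetical helper, not a DataStage API)."""
    first = True
    previous = None
    for index, record in enumerate(records):
        current = key(record)
        # Compare each key with the one before it; no reordering is done.
        if not first and current < previous:
            raise ValueError(
                f"record {index} is out of order: {current!r} < {previous!r}"
            )
        first = False
        previous = current
        yield record


# Hypothetical sample data that is already in key order.
rows = [{"id": 1}, {"id": 2}, {"id": 5}]
checked = list(check_sorted(rows, itemgetter("id")))
```

If the input is not already in order, the check fails instead of silently reordering the data, so an incorrect assumption about upstream ordering surfaces as an error.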

APT_TSORT_NO_OPTIMIZE_BOUNDED environment variable in DataStage

Set the APT_TSORT_NO_OPTIMIZE_BOUNDED environment variable to prevent the optimization of bounded length fields by the tsort operator.

By default, the tsort operator optimizes bounded length fields by converting them to variable length before the sort and converting them back to bounded length after the sort. For records with many bounded fields, where the actual size of the data is much smaller than the upper bound, the optimization results in a large reduction of disk I/O.

For records where the data size is close to the upper bound of the bounded length fields, this optimization often causes slower sort performance. In this situation, set the APT_TSORT_NO_OPTIMIZE_BOUNDED environment variable to disable the optimization.

Setting the APT_TSORT_NO_OPTIMIZE_BOUNDED environment variable when APT_OLD_BOUNDED_LENGTH is set has no effect, because APT_OLD_BOUNDED_LENGTH also disables the bounded length optimization.
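The trade-off described above for bounded length fields can be estimated with some rough arithmetic. The following Python sketch is only an illustration under stated assumptions: it assumes a bounded field occupies its full upper bound on disk and that a variable length field occupies its actual length plus a 4-byte length prefix. Neither assumption is taken from the tsort operator's real record format, and the function names are hypothetical. The sketch also ignores the cost of converting fields before and after the sort.

```python
# Assumed size of the length prefix on a variable length field (illustrative only).
LENGTH_PREFIX_BYTES = 4


def bounded_bytes(upper_bounds):
    """Bytes per record if every bounded field is stored at its upper bound."""
    return sum(upper_bounds)


def variable_bytes(actual_lengths):
    """Bytes per record if every field is stored as prefix plus actual data."""
    return sum(LENGTH_PREFIX_BYTES + n for n in actual_lengths)


def estimated_saving(upper_bounds, actual_lengths):
    """Fraction of per-record bytes saved by the variable length form."""
    return 1 - variable_bytes(actual_lengths) / bounded_bytes(upper_bounds)


# Ten fields bounded at 200 bytes that hold only 12 bytes each: large saving.
sparse = estimated_saving([200] * 10, [12] * 10)

# The same fields holding 190 bytes each: almost nothing is saved.
dense = estimated_saving([200] * 10, [190] * 10)
```

Under these assumptions the first case writes 160 bytes instead of 2000 per record, while the second writes 1940 instead of 2000. A saving that small is unlikely to repay the conversion work, which is the situation where disabling the optimization may help.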

APT_TSORT_STRESS_BLOCKSIZE environment variable in DataStage

Set the APT_TSORT_STRESS_BLOCKSIZE environment variable to specify the size of the shared memory block used to pass data between the write, sort, and merge processes.