Specifying sorting keys

Sorting keys specify the criteria used to perform the sort. The psort operator allows you to set a primary sorting key and multiple secondary sorting keys.

The psort operator uses the sorting keys to determine the sorting order of a data set. The psort operator first sorts the records by the primary sorting key. If multiple records have the same primary key value, the psort operator then sorts those records by the secondary keys.
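The tie-breaking behavior described above is an ordinary lexicographic comparison. The following minimal sketch uses the C++ standard library rather than the Orchestrate API. It assumes a hypothetical Record struct: the lName and age fields are borrowed from the example in this section, and the fName field is invented for illustration. The records are sorted by lName as the primary key, and age is consulted only when two lName values are equal.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical record type (not an Orchestrate class). lName and age mirror
// the example fields; fName is an illustrative third field.
struct Record {
    std::string fName;
    std::string lName;
    int age;
};

// Primary key: lName. Secondary key: age.
// std::tie compares element by element, so age decides the order only
// when the lName values of two records are equal.
void sortByLNameThenAge(std::vector<Record>& records) {
    std::stable_sort(records.begin(), records.end(),
                     [](const Record& a, const Record& b) {
                         return std::tie(a.lName, a.age) <
                                std::tie(b.lName, b.age);
                     });
}
```

Adding further secondary keys amounts to appending more fields to both std::tie calls, in priority order.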

You must define a single primary sorting key for the psort operator. You can optionally define as many secondary keys as your job requires. Note, however, that each record field can be used only once as a sorting key. Therefore, the total number of primary and secondary sorting keys must be less than or equal to the total number of fields in the record.
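The rules above can be expressed as a simple validity check. The following sketch is hypothetical: validSortKeys is not part of the psort operator or the APT API, and it assumes that the key list is given as field names, with the primary key first. It rejects an empty key list (no primary key), a key that does not name a schema field, and a field used as a key more than once. Together, these checks guarantee that the number of keys cannot exceed the number of fields.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper illustrating the key rules; not an Orchestrate function.
// keys[0] is treated as the primary key, the rest as secondary keys.
bool validSortKeys(const std::vector<std::string>& keys,
                   const std::vector<std::string>& schemaFields) {
    if (keys.empty()) return false;  // a primary key is required
    std::set<std::string> fields(schemaFields.begin(), schemaFields.end());
    std::set<std::string> seen;
    for (const auto& k : keys) {
        if (fields.count(k) == 0) return false;    // not a field in the record
        if (!seen.insert(k).second) return false;  // field reused as a key
    }
    // Every key is a distinct schema field, so keys.size() <= number of fields.
    return true;
}
```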

The following figure shows four records whose schema contains three fields:

Figure: Records from an example data set sorted on various primary and secondary keys

This figure also shows the results of three sorts that use different combinations of sorting keys. In this figure, the lName field is a string field and the age field is an integer field. By default, the psort operator uses a case-sensitive algorithm for sorting, which means that uppercase strings appear before lowercase strings in a sorted data set. To perform case-insensitive sorting on string fields instead, you can override this default either with an option to the psort operator or with the member function APT_PartitionSortOperator::setKey().
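The difference between case-sensitive and case-insensitive ordering can be illustrated with the C++ standard library. The following sketch assumes plain byte-wise (ASCII) comparison, which may not match the collation that the psort operator actually uses. The function names are hypothetical and do not reflect the setKey() interface. With byte-wise comparison, every uppercase letter (codes 65 to 90) sorts before every lowercase letter (codes 97 to 122). The case-insensitive comparator folds each character to lowercase before comparing.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Case-sensitive: std::string compares by character code, so in ASCII
// "Zoe" sorts before "adams".
void sortCaseSensitive(std::vector<std::string>& v) {
    std::sort(v.begin(), v.end());
}

// Case-insensitive: compare characters after folding them to lowercase.
// Characters are taken as unsigned char so std::tolower receives a valid value.
bool lessIgnoreCase(const std::string& a, const std::string& b) {
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](unsigned char x, unsigned char y) {
            return std::tolower(x) < std::tolower(y);
        });
}

void sortCaseInsensitive(std::vector<std::string>& v) {
    std::stable_sort(v.begin(), v.end(), lessIgnoreCase);
}
```

For the input {"adams", "Zoe", "Baker"}, the case-sensitive sort yields Baker, Zoe, adams, and the case-insensitive sort yields adams, Baker, Zoe.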

By default, the psort operator (APT_PartitionSortOperator) uses ascending sort order, so that smaller values appear before larger values in the sorted data set. You can use an option to the psort operator to select descending sort order.
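Switching between ascending and descending order amounts to reversing the comparison. The following standard-library sketch applies this to an integer field such as age. The function sortAges and its descending flag are hypothetical and are not the psort operator's actual option.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Ascending (the default described above): std::less puts smaller values first.
// Descending: std::greater reverses the comparison, so larger values come first.
void sortAges(std::vector<int>& ages, bool descending) {
    if (descending)
        std::sort(ages.begin(), ages.end(), std::greater<int>());
    else
        std::sort(ages.begin(), ages.end(), std::less<int>());
}
```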