tsort: syntax and options
The syntax and options of the tsort operator.
The syntax for the tsort operator in an osh command is shown below:
tsort
-key field [ci | cs] [-ebcdic] [-nulls first | last] [-asc | -desc]
[-sorted | -clustered] [-param params ]
[-key field [ci | cs] [-ebcdic] [-nulls first | last] [-asc | -desc]
[-sorted | -clustered] [-param params ] ...]
[-collation_sequence locale | collation_file_pathname | OFF]
[-flagKey]
[-flagCluster]
[-memory num_megabytes ]
[-stable | -nonstable]
[-stats]
[-unique]
You must use -key to specify at least one sorting key to the operator.
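As a rough illustration of how several -key specifications combine, the following Python sketch models the resulting order only. It is not osh and does not reflect how tsort is implemented. The function name sort_by_keys, the (field, descending, case_insensitive) tuple layout, the sample records, and the use of the cp037 code page to stand in for the EBCDIC collating sequence are all assumptions made for this example.

```python
def sort_by_keys(records, keys):
    """Order records by several keys, highest-priority key first.

    keys is a list of (field, descending, case_insensitive) tuples; this
    layout is hypothetical and only loosely mirrors the -key suboptions.
    """
    result = list(records)
    # Python's sort is stable, so sorting on the lowest-priority key first
    # and the primary key last produces the combined ordering.
    for field, descending, case_insensitive in reversed(keys):
        def sort_key(rec, field=field, ci=case_insensitive):
            value = rec[field]
            return value.lower() if ci and isinstance(value, str) else value
        result.sort(key=sort_key, reverse=descending)
    return result


records = [
    {"name": "smith", "age": 30},
    {"name": "Jones", "age": 25},
    {"name": "Smith", "age": 41},
    {"name": "jones", "age": 52},
]

# Primary key: name, ascending, case-insensitive.
# Secondary key: age, descending.
ordered = sort_by_keys(records, [("name", False, True), ("age", True, False)])

# Default case-sensitive ordering: uppercase strings precede lowercase ones.
case_sensitive = sorted(r["name"] for r in records)

# EBCDIC-style ordering, approximated with the cp037 code page (an assumption):
# lowercase letters, then uppercase letters, then digits.
ebcdic_order = sorted(["a", "A", "0"], key=lambda s: s.encode("cp037"))
# ASCII ordering: digits, then uppercase letters, then lowercase letters.
ascii_order = sorted(["a", "A", "0"])

print(ordered)
print(case_sensitive)
print(ebcdic_order, ascii_order)
```

In this model, records whose primary key values compare equal (for example "jones" and "Jones" under case-insensitive comparison) are ordered by the secondary key, which corresponds to the role of lower-priority -key specifications described in the table that follows.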
| Option | Use |
|---|---|
| -key | -key field [ci \| cs] [-ebcdic] [-nulls first \| last] [-asc \| -desc] [-sorted \| -clustered] [-param params]<br><br>Specifies a key field for the sort. The first -key defines the primary key field for the sort; lower-priority key fields are supplied on subsequent -key specifications. -key requires that field be a field of the input data set.<br><br>-ci \| -cs are optional arguments for specifying case-insensitive or case-sensitive sorting. By default, the operator does case-sensitive sorting, which means that uppercase strings appear before lowercase strings in a sorted data set. You can override this default to perform case-insensitive sorting on string fields only.<br><br>-asc \| -desc are optional arguments for specifying ascending or descending sorting. By default, the operator uses ascending sort order, so that smaller values appear before larger values in the sorted data set. You can specify descending sort order instead, so that larger values appear before smaller values.<br><br>-ebcdic (string fields only) specifies that the EBCDIC collating sequence is used for string fields. Note that InfoSphere® DataStage® stores strings as ASCII text; this option controls only the collating sequence of the string. For example, in the EBCDIC collating sequence, lowercase letters sort before uppercase letters (unless you specify the -ci option to select case-insensitive sorting), and the digits 0-9 sort after alphabetic characters. In the default ASCII collating sequence used by the operator, digits come first, followed by uppercase letters, then lowercase letters.<br><br>-sorted specifies that input records are already sorted by this field. The operator then sorts on secondary key fields, if any. When your records are already sorted by the primary key field(s), this option can increase the speed of the sort and reduce the amount of temporary disk space used, because only the secondary key field(s) need to be sorted. -sorted is mutually exclusive with -clustered; if any sorting key specifies -sorted, no key can specify -clustered. If you specify -sorted for all sorting key fields, the operator verifies that the input data set is correctly sorted but does not perform any sorting. If the input data set is not correctly sorted by the specified keys, the operator fails.<br><br>-clustered specifies that input records are already grouped by this field, but not sorted. The operator then sorts on any secondary key fields. This option is useful when your records are already grouped by the primary key field(s), but not necessarily sorted, and you want to sort your data only on the secondary key field(s) within each group. -clustered is mutually exclusive with -sorted; if any sorting key specifies -clustered, no key can specify -sorted.<br><br>-nulls specifies whether null values are sorted first or last. The default is first.<br><br>The -param suboption allows you to specify extra parameters for a field. Specify parameters using property=value pairs separated by commas. |
| -collation_sequence | -collation_sequence locale \| collation_file_pathname \| OFF This option determines how your string data is sorted. You can specify a predefined IBM® ICU locale; write your own collation sequence using ICU syntax and supply its collation_file_pathname; or specify OFF so that string comparisons are made using Unicode code-point value order, independent of any locale or custom sequence. By default, InfoSphere DataStage sorts strings using byte-wise comparisons. For more information, see this IBM ICU site: http://oss.software.ibm.com/icu/userguide/Collate_Intro.htm |
| -flagCluster | -flagCluster Tells the operator to create the int8 field clusterKeyChange in each output record. The clusterKeyChange field is set to 1 for the first record in each group, where groups are defined by the -sorted or -clustered argument to -key. Subsequent records in the group have the clusterKeyChange field set to 0. To use -flagCluster, you must specify at least one sorting key field that uses either -sorted or -clustered; otherwise, the operator ignores this option. |
| -flagKey | Optionally specify whether to generate a flag field that identifies the key-value changes in output. |
| -memory | -memory num_megabytes Causes the operator to restrict itself to num_megabytes megabytes of virtual memory on a processing node. -memory requires that 1 < num_megabytes < the amount of virtual memory available on any processing node. We recommend that num_megabytes be smaller than the amount of physical memory on a processing node. |
| -stable | -stable Specifies that this sort is stable. A stable sort guarantees not to rearrange records that are already sorted properly in a data set. The default sorting method is unstable. In an unstable sort, no prior ordering of records is guaranteed to be preserved by the sorting operation. |
| -stats | -stats Configures tsort to generate output statistics about the sorting operation. |
| -unique | -unique Specifies that if multiple records have identical sorting key values, only one record is retained. If -stable is set, the first record is retained. |
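The -flagCluster and -unique descriptions above can be modeled with a short Python sketch. It illustrates the described behavior only, under stated assumptions: the helper names flag_cluster and keep_unique, the dept and id fields, and the sample records are hypothetical, and the input is assumed to be already grouped (for -flagCluster) or already sorted with a stable sort (for -unique).

```python
def flag_cluster(records, group_fields):
    """Return copies of records with a clusterKeyChange field added.

    The field is 1 on the first record of each group and 0 on the rest,
    where a group is a run of records with equal values in group_fields.
    """
    flagged = []
    previous = None  # no group seen yet, so the first record always gets 1
    for rec in records:
        current = tuple(rec[f] for f in group_fields)
        flagged.append({**rec, "clusterKeyChange": 1 if current != previous else 0})
        previous = current
    return flagged


def keep_unique(sorted_records, key_fields):
    """Keep only the first record of each run of identical key values.

    Assumes the input is already sorted on key_fields with a stable sort,
    which is the case where the first record is the one retained.
    """
    kept = []
    previous = None
    for rec in sorted_records:
        current = tuple(rec[f] for f in key_fields)
        if current != previous:
            kept.append(rec)
        previous = current
    return kept


# Hypothetical input, already grouped by dept.
records = [
    {"dept": "sales", "id": 1},
    {"dept": "sales", "id": 2},
    {"dept": "hr", "id": 3},
    {"dept": "hr", "id": 4},
    {"dept": "hr", "id": 5},
]

flags = [r["clusterKeyChange"] for r in flag_cluster(records, ["dept"])]
unique_ids = [r["id"] for r in keep_unique(records, ["dept"])]

print(flags)
print(unique_ids)
```

Both helpers compare each record only with its predecessor, which is why the sketch depends on the input already being grouped or sorted on the fields in question.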