Psort: syntax and options

The syntax for the psort operator in an osh command is shown below. Option values you supply are shown in italic typeface. When your value contains a space or a tab character, you must enclose the value in single or double quotes.


psort 
  -key field_name [-ci | -cs] [-asc | -desc] [-ebcdic] 
  [-key field_name [-ci | -cs] [-asc | -desc] [-ebcdic] ...]
  [-extraOpts  syncsort_options ]
  [-memory  num_megabytes ]
  [-sorter unix | syncsort]
  [-stable]
  [-stats]
  [-unique]
  [-workspace workspace]

You must specify at least one sort key with the -key option. The -part option configures the operator to run in parallel, and the -seq option specifies sequential operation; both are deprecated and retained only for backward compatibility (see Table 1).

If you include the -ebcdic option, you must also include the -sorter option with a value of syncsort. When you do not include the -sorter option, the default sorter is unix, which is incompatible with the EBCDIC collating sequence.

Example usage:


psort -sorter syncsort -key a -ebcdic
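The two rules above (at least one -key, and -ebcdic only together with -sorter syncsort) can be checked before a command is submitted. The following is a minimal sketch in Python, not part of the product: build_psort_args is a hypothetical helper introduced here for illustration, and it only assembles the argument text shown in this section.

```python
def build_psort_args(keys, sorter=None):
    """Assemble a psort argument list (hypothetical helper, for illustration only).

    keys   -- list of (field_name, suboptions) pairs, e.g. [("a", ["-ebcdic"])]
    sorter -- None (operator default, unix) or "unix" / "syncsort"
    """
    # Rule 1: psort needs at least one -key option.
    if not keys:
        raise ValueError("psort requires at least one -key option")

    # Rule 2: -ebcdic is only valid together with -sorter syncsort.
    uses_ebcdic = any("-ebcdic" in subopts for _, subopts in keys)
    if uses_ebcdic and sorter != "syncsort":
        raise ValueError("-ebcdic requires -sorter syncsort")

    args = ["psort"]
    if sorter is not None:
        args += ["-sorter", sorter]
    for field_name, subopts in keys:
        args += ["-key", field_name, *subopts]
    return args


print(" ".join(build_psort_args([("a", ["-ebcdic"])], sorter="syncsort")))
# prints: psort -sorter syncsort -key a -ebcdic
```

Calling build_psort_args([("a", ["-ebcdic"])]) without a sorter raises ValueError, which mirrors the restriction described above.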

Table 1. psort Operator Options
Option Use
-key -key field_name [-ci | -cs] [-asc | -desc] [-ebcdic]

If the -ebcdic suboption is specified, you must also include the -sorter option with a value of syncsort.

The -key option specifies a key field for the sort. The first -key option defines the primary key field for the sort; lower-priority key fields are supplied on subsequent -key specifications.

You must specify this option to psort at least once.

-key requires that field_name be a field of the input data set. The data type of the field must be one of the following data types:

int8 , int16 , int32 , int64 , uint8, uint16, uint32 , uint64

sfloat, dfloat

string[n] , where n is an integer literal specifying the string length

-ci or -cs are optional arguments for specifying case-insensitive or case-sensitive sorting, respectively. By default, the operator uses a case-sensitive algorithm for sorting. This means that uppercase strings appear before lowercase strings in a sorted data set. You can override this default with -ci to perform case-insensitive sorting on string fields.

-asc or -desc are optional arguments for specifying ascending or descending sorting. By default, the operator uses ascending sort order, so that smaller values appear before larger values in the sorted data set. Specify -desc to use descending sort order, so that larger values appear before smaller values in the sorted data set.
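The key priority, case, and direction rules described above can be pictured with a small Python sketch. This is an analogy only, not the operator's implementation: the records and field names are made up, and str.lower() is assumed as a stand-in for whatever case folding -ci applies.

```python
# Hypothetical sample records; "name" and "age" are illustrative field names.
records = [
    {"name": "bob", "age": 30},
    {"name": "Alice", "age": 25},
    {"name": "alice", "age": 41},
    {"name": "Bob", "age": 35},
]

# Default behaviour (-cs -asc): uppercase strings sort before lowercase ones.
case_sensitive = sorted(r["name"] for r in records)
print(case_sensitive)
# prints: ['Alice', 'Bob', 'alice', 'bob']

# Roughly analogous to: -key name -ci -asc -key age -desc
# The first tuple element is the primary key; the second breaks ties.
# Negating the numeric key reverses its direction.
by_name_ci_then_age_desc = sorted(
    records,
    key=lambda r: (r["name"].lower(), -r["age"]),
)
print([(r["name"], r["age"]) for r in by_name_ci_then_age_desc])
# prints: [('alice', 41), ('Alice', 25), ('Bob', 35), ('bob', 30)]
```

The secondary key is consulted only when two records compare equal on the primary key, which is why both "alice" records precede both "bob" records regardless of age.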

-ebcdic (string fields only) specifies that the EBCDIC collating sequence is used for string fields. Note that InfoSphere® DataStage® stores strings as ASCII text; this suboption controls only the collating sequence used to order the strings.

If you include the -ebcdic option, you must also include the -sorter option with a value of syncsort. When you do not include the -sorter option, the default sorter is unix, which is incompatible with the EBCDIC collating sequence.

When you use the EBCDIC collating sequence, lowercase letters sort before uppercase letters (unless you specify the -ci option to select case-insensitive sorting). Also, the digits 0-9 sort after alphabetic characters. In the default ASCII collating sequence used by the operator, digits come first, followed by uppercase letters, then lowercase letters.
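The difference between the two collating sequences can be reproduced with a short Python sketch. It assumes code page cp037 as a representative EBCDIC encoding; this section does not say which EBCDIC code page the sorter uses, so treat the choice as an assumption. As in the operator, the strings themselves stay unchanged and only the comparison key differs.

```python
values = ["b", "A", "1", "a", "B", "2"]

# Default ASCII collation: digits, then uppercase, then lowercase.
ascii_order = sorted(values)
print(ascii_order)
# prints: ['1', '2', 'A', 'B', 'a', 'b']

# EBCDIC collation, approximated by comparing cp037-encoded bytes
# (assumption: cp037 stands in for the sorter's EBCDIC code page).
ebcdic_order = sorted(values, key=lambda s: s.encode("cp037"))
print(ebcdic_order)
# prints: ['a', 'b', 'A', 'B', '1', '2']
```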

-extraOpts -extraOpts syncsort_options

Specifies command-line options passed directly to SyncSort. syncsort_options contains a list of SyncSort options just as you would normally type them on the SyncSort command line.

-memory -memory num_megabytes

Causes the operator to restrict itself to num_megabytes megabytes of virtual memory on a processing node.

-memory requires that 1 < num_megabytes < the amount of virtual memory available on any processing node. We recommend that num_megabytes be smaller than the amount of physical memory on a processing node.

-part -part partitioner

This is a deprecated option. It is included for backward compatibility.

-seq -seq

This is a deprecated option. It is included for backward compatibility.

-sorter -sorter unix | syncsort

Specifies the sorting utility used by the operator. The default is unix, corresponding to the UNIX sort utility.

-stable -stable

Specifies that this sort is stable. A stable sort guarantees that records with identical sorting key values keep the relative order they had in the input data set.

The default sorting method is unstable. In an unstable sort, no prior ordering of records is guaranteed to be preserved by the sorting operation, but processing might be slightly faster.
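As an illustration of what stability means, the sketch below uses Python's built-in sorted(), which is documented to be stable. It is an analogy for the behavior that -stable requests, not a description of how the operator sorts; the records are made up.

```python
# Each record is (key, input_position); the second field records arrival order.
records = [("b", 1), ("a", 2), ("b", 3), ("a", 4)]

# Sort on the key only. Because sorted() is stable, records with equal
# keys keep their original relative order (2 before 4, 1 before 3).
stable_result = sorted(records, key=lambda r: r[0])
print(stable_result)
# prints: [('a', 2), ('a', 4), ('b', 1), ('b', 3)]
```

An unstable sort would be free to return ('a', 4) before ('a', 2); both outputs are correctly ordered on the key.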

-stats -stats

Configures psort to generate output statistics about the sorting operation and to print them to the screen.

-unique -unique

Specifies that if multiple records have identical sorting key values, only one record is retained. If -stable is also specified, the first record is retained.
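The combined effect of -stable and -unique can be sketched in Python as a stable sort followed by keeping the first record of each key group. This is an illustrative model under that assumption, with made-up records, and not the operator's actual algorithm.

```python
from itertools import groupby

# Each record is (key, input_position).
records = [("b", 1), ("a", 2), ("b", 3), ("a", 4)]

# Step 1: stable sort on the key, so equal keys keep input order.
sorted_records = sorted(records, key=lambda r: r[0])

# Step 2: retain only the first record of each run of equal keys.
unique_records = [
    next(group) for _, group in groupby(sorted_records, key=lambda r: r[0])
]
print(unique_records)
# prints: [('a', 2), ('b', 1)]
```

Without a stable sort in step 1, the record retained for each key could be any member of its group.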

-workspace -workspace workspace

Optionally supplies a string that identifies the workspace directory to be used by the sorter.