sort — Start the sort-merge utility

Format

sort [–cmu] [–o outfile] [–t char] [–y[n]] [–zn] [–bdfiMnr] [–k startpos[,endpos]] … [file …]

sort [–cmu] [–o outfile] [–tchar] [–yn] [–zn] [–bdfiMnr] [+startposition [–endposition]] [file …]

Description

sort implements a full sort-and-merge utility. By default, it sorts according to all the information in the record, in the order given in the record.

sort operates on input files containing records that are separated by the newline character. When you do not specify either the –c or –m option, sort sorts the concatenation of all input files and produces the output on standard output. If you do not specify any files, or if you specify - as one of the file names, sort reads from the standard input (stdin).

The following options select particular operations:
–c
Checks input files to ensure that they are correctly ordered according to the key position and sort ordering options specified, but does not modify or output the files. This option affects only the exit code.
–m
Merges files into one sorted output stream. This option assumes that each input file is correctly ordered according to the other options specified on the command line; you can check this with the –c option.
–u
Ensures that output records are unique. If two or more input records have equal sort keys, sort writes only the first record to the output. When you use –u with –c, sort prints a diagnostic message if the input records have any duplicates.
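The merge and uniqueness operations described above can be sketched as follows. This is a minimal illustration, not a definitive recipe: the file names a.txt and b.txt and the scratch directory are hypothetical, both files are assumed to be already sorted, and LC_ALL=C is set only to make the collation order predictable.

```shell
# Create two small, already sorted files in a scratch directory (hypothetical names).
workdir=$(mktemp -d)
printf 'apple\ncherry\n' > "$workdir/a.txt"
printf 'banana\ncherry\ndate\n' > "$workdir/b.txt"

# -m merges the pre-sorted inputs into one sorted stream.
merged=$(LC_ALL=C sort -m "$workdir/a.txt" "$workdir/b.txt")

# Adding -u writes only one record for each set of equal keys.
merged_unique=$(LC_ALL=C sort -m -u "$workdir/a.txt" "$workdir/b.txt")

rm -rf "$workdir"
printf '%s\n' "$merged"
printf '%s\n' "$merged_unique"
```

Here the duplicate record cherry appears twice in the merged output and once when –u is added.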


Options

–o outfile
Writes output to the file outfile. By default, sort writes output to the standard output. The output file can be one of the input files. In this case, sort makes a copy of the data to allow the (potential) overwriting of the input file.
–t char
Indicates that the character char separates input fields. When you do not specify the –t option, sort assumes that any number of white space (blank or tab) characters separate fields.
–y[n]
Restricts the amount of memory available for sorting to n KB (where 1 KB is 1024 bytes). If n is missing, sort chooses a reasonable maximum amount of memory for sorting, dependent on the system configuration. sort needs at least enough memory to hold five records simultaneously; if you request less, sort automatically takes enough. When the input files overflow the amount of memory available, sort automatically switches to a polyphase merge (external sorting) algorithm, which is, of necessity, much slower than internal sorting. The value of n must be at least 2 and at most 1024; the default value is 56.

Because external sorting is so much slower, using the –y option may improve sorting performance substantially for medium to large input files.

–zn
Indicates that the longest input record (including the newline character) is n bytes in length. By default, record length is limited to LINE_MAX.
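As an illustration of the –o behavior described above, where the output file is also an input file, the following sketch sorts a file in place. The file name fruits.txt and the scratch directory are hypothetical, and LC_ALL=C is assumed only to keep the ordering predictable.

```shell
# Build a small unsorted file in a scratch directory (hypothetical name).
workdir=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$workdir/fruits.txt"

# The same path is given as input and as the -o output file;
# sort reads all of its input before it overwrites the file.
LC_ALL=C sort -o "$workdir/fruits.txt" "$workdir/fruits.txt"

in_place=$(cat "$workdir/fruits.txt")
rm -rf "$workdir"
printf '%s\n' "$in_place"
```

Redirecting with >fruits.txt instead would not be equivalent, because the shell truncates the file before sort reads it.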

The following options control how sort compares records to determine the order in which they are written to the output. The ordering options apply globally to all sorting keys except those keys for which you individually specify the ordering option. For more information about sorting keys, see Sorting keys.
–b
Skips, for comparison purposes, any leading white space (blank or tab) in any field (or key specification).
–d
Uses dictionary ordering. With this option, sort examines only blanks, uppercase and lowercase letters, and numbers when making comparisons.
–f
Converts lowercase letters to uppercase for comparison purposes.
–i
Ignores, for comparison purposes, nonprintable characters.
–k startpos[,endpos]
Specifies a sorting key. For more information, see Sorting keys.
–M
Assumes that the field contains a month name for comparison purposes. Any leading white space is ignored. If the field starts with the first three letters of a month name in uppercase or lowercase, the comparisons are in month-in-year order. Anything that is not a recognizable month name compares less than JAN.
–n
Assumes that the field contains an initial numeric value. sort sorts first by numeric value and then by the remaining text in the field according to options.

Numeric fields can contain leading optional blanks or optional minus (-) signs. sort does not recognize the plus (+) sign.

This option treats a field which contains no digits as if it had a value of zero. If more than one line contains no digits, the lines are sorted alphanumerically.

–r
Reverses the order of all comparisons so that sort writes output from largest to smallest rather than smallest to largest.
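
The effect of the ordering options is easiest to see side by side. The following sketch uses made-up sample data and sets LC_ALL=C so that the default comparison is by byte value; results of the default comparison may differ in other locales.

```shell
# Default comparison is character by character, so "100" sorts before "9".
text_order=$(printf '10\n9\n100\n' | LC_ALL=C sort | tr '\n' ' ')

# -n compares by numeric value instead.
numeric_order=$(printf '10\n9\n100\n' | LC_ALL=C sort -n | tr '\n' ' ')

# -r reverses the comparison; combined with -n it lists the largest value first.
reverse_numeric=$(printf '10\n9\n100\n' | LC_ALL=C sort -nr | tr '\n' ' ')

# Without -f, uppercase letters sort before lowercase letters in the C locale.
case_sensitive=$(printf 'B\na\nC\n' | LC_ALL=C sort | tr '\n' ' ')

# -f folds lowercase to uppercase for the comparison only; output is unchanged.
case_folded=$(printf 'B\na\nC\n' | LC_ALL=C sort -f | tr '\n' ' ')

echo "default: $text_order"
echo "-n:      $numeric_order"
echo "-nr:     $reverse_numeric"
echo "no -f:   $case_sensitive"
echo "-f:      $case_folded"
```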

Sorting keys

By default, sort examines entire input records to determine ordering. By specifying sorting keys on the command line, you can tell sort to restrict its attention to one or more parts of each record.

You can indicate the start of a sorting key with:
-k m[.n][options]
where m and the optional n are positive integers. You can choose options from the set bdfiMnr (described previously) to specify the way in which sort does comparisons for that sorting key. Ordering options set for a key override global ordering options. If you do not specify any options for the key, the global ordering options are used.

The number m specifies which field in the input record contains the start of the sorting key. The character given with the –t option separates input fields; if this option is not specified, spaces or tabs separate the fields. The resulting sort key is from the mth field to the end of the record. The number n specifies which character in the mth field marks the start of the sorting key; if you do not specify n, the sorting key starts at the first character of the mth field.

If an ending position for a key is not specified, the sorting key extends from the starting position to the end of the input record. You can also specify an ending position for a key, with:
-k m[.n][options],p[.q][options]
where p and q are positive integers, indicating that the sort key ends with the qth character of the pth field. If you do not specify q or if you specify a value of 0 for q, the sorting key ends at the last character of the pth field. For example:
-k 2.3,4.6
defines a sorting key that extends from the third character of the second field to the sixth character of the fourth field. The b option applies only to the key start or key end for which it is specified.
-k 2
defines a sorting key that extends from the first character of the second field to the end of the record;
-k 2,2
defines a sorting key that extends from the first character of the second field to the last character of the second field.
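The difference between an open-ended key and a key with an ending position can be sketched with a small colon-separated sample. The data is made up for illustration, the –t option is used so that field boundaries are unambiguous, and LC_ALL=C is assumed for a predictable byte-order comparison.

```shell
records='x:b:2
y:a:9
z:b:1'

# -k 2: the key runs from field 2 to the end of the record ("b:2", "a:9", "b:1").
to_end=$(printf '%s\n' "$records" | LC_ALL=C sort -t : -k 2)

# -k 2,2: the key is field 2 only; records with equal keys are then
# ordered by comparing the entire record.
field_only=$(printf '%s\n' "$records" | LC_ALL=C sort -t : -k 2,2)

# -k 1.2,1.2: the key is the second character of the first field.
second_char=$(printf 'ab\nba\n' | LC_ALL=C sort -k 1.2,1.2)

printf '%s\n' "$to_end"
printf '%s\n' "$field_only"
printf '%s\n' "$second_char"
```

With -k 2 the third field takes part in the comparison, so z:b:1 precedes x:b:2; with -k 2,2 it does not, and the two records fall back to whole-record order.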
sort also supports a historical method of defining the sorting key. Using this method, you indicate the start of the sorting key with:
+m[.n][options]
which is equivalent to:
–k m+1[.n+1][options]
You can also indicate the end of a sorting key with:
-p[.q][options]
which when preceded with +m[.n] is equivalent to:
–k m+1[.n+1],p.0[options]
if q is specified and is zero. Otherwise,
–k m+1[.n+1],p+1[.q][options]

For example, +1.2 -3.5 defines a sorting key with a starting position that sort finds by skipping the first field and then the first two characters of the next field, and an ending position that sort finds by skipping the first three fields and then the first five characters of the next field. In other words, the sorting key extends from the third character of the second field to the fifth character of the fourth field, which by the equivalences above is the key -k 2.3,4.5.

With either syntax, if the end of a sorting key is not a valid position after the beginning key position, the sorting key extends to the end of the input record.

You can specify multiple sort key positions by using several –k options or several + and - options. In this case, sort uses the second sorting key only for records where the first sorting key is equal, the third sorting key only when the first two are equal, and so on. If all key positions compare equal, sort determines ordering by using the entire record.

When you specify the –u option to determine the uniqueness of output records, sort looks only at the sorting keys, not the whole record. (Of course, if you specify no sorting keys, sort considers the whole record to be the sorting key.)
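
The interaction of multiple keys and of –u with keys can be sketched as follows. The sample records are made up for illustration, fields are separated by a single blank, and LC_ALL=C is assumed for predictable ordering.

```shell
people='smith 30
jones 25
smith 4'

# The first key is the name (field 1 only); the second key, the numeric
# age in field 2, is consulted only for records whose names are equal.
by_name_then_age=$(printf '%s\n' "$people" | LC_ALL=C sort -k 1,1 -k 2,2n)

# With -u, uniqueness is judged on the key alone, so only one of the
# two "smith" records is written.
unique_names=$(printf '%s\n' "$people" | LC_ALL=C sort -u -k 1,1)

printf '%s\n' "$by_name_then_age"
printf '%s\n' "$unique_names"
```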

Examples

  1. To sort an input file having lines consisting of the day of the month, white space, and the month, as in:
    30 December
    23    MAY
    25 June
    10     June
    use the command:
    sort -k 2M -k 1n
  2. To merge two dictionaries, with one word per line:
    sort -m -dfi dict1 dict2 >newdict
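
As a sketch of the first example, the command can be tried on the sample lines by piping them in on standard input. This assumes a sort that supports the –M extension and a locale with the POSIX month names; LC_ALL=C is set here for that reason.

```shell
# Month (field 2) is the primary key; day of the month (field 1) breaks ties.
by_date=$(printf '30 December\n23    MAY\n25 June\n10     June\n' |
    LC_ALL=C sort -k 2M -k 1n)

printf '%s\n' "$by_date"
```

The May record comes first, the two June records follow in order of day (10, then 25), and the December record is last.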

Environment variables

sort uses the following environment variable:
TMPDIR
Contains the path name of the directory to be used for temporary files.
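
A minimal sketch of redirecting temporary files, assuming a scratch directory created with mktemp (the directory is hypothetical, and a small input like this one may be sorted entirely in memory without creating any temporary file):

```shell
# Point TMPDIR at a scratch directory for this one invocation only.
scratch=$(mktemp -d)
sorted=$(printf 'delta\nalpha\ncharlie\nbravo\n' |
    TMPDIR="$scratch" LC_ALL=C sort)
rm -rf "$scratch"

printf '%s\n' "$sorted"
```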

Files

sort uses the following file:
/tmp/stm*
Temporary files used for merging and for the –o option. You can specify a different directory for temporary files using the TMPDIR environment variable.

Localization

sort uses the following localization environment variables:
  • LANG
  • LC_ALL
  • LC_COLLATE
  • LC_CTYPE
  • LC_MESSAGES
  • LC_NUMERIC
  • LC_TIME
  • NLSPATH

The –M option works only if LC_TIME identifies a locale that contains the same month names as the POSIX locale.

See Localization for more information.
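
Because collation depends on the locale, the same input can sort differently from one environment to another. As a small sketch, setting LC_ALL=C for a single invocation selects the POSIX locale, in which characters compare by byte value, so uppercase letters precede lowercase letters. The sample data is made up for illustration.

```shell
# In the C (POSIX) locale, "A" and "B" sort before "a" and "b".
c_order=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort | tr '\n' ' ')

echo "$c_order"
```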

Exit values

0
Successful completion. Also returned if –c is specified and the file is in correctly sorted order.
1
Returned if you specified –c and the file is not correctly sorted. Also returned to indicate a non-unique record if you specified –cu.
2
Failure due to any of the following:
  • Missing key description after –k
  • More than one –o option
  • Missing file name after –o
  • Missing character after –t
  • More than one character after –t
  • Missing number with –y or –z
  • endposition given before a startposition
  • Badly formed sort key
  • Incorrect command-line option
  • Too many key field positions specified
  • Insufficient memory
  • Inability to open the output file
  • Inability to open the input file
  • Error in writing to the output file
  • Inability to create a temporary file or temporary file name
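
The exit values above lend themselves to a simple check in a script. The following is a sketch only: the function name check_sorted, its messages, and the nonexistent path are hypothetical, and LC_ALL=C is assumed for predictable ordering.

```shell
# Hypothetical helper: run sort -c with any extra arguments and
# translate the exit value into a short description.
check_sorted() {
    status=0
    LC_ALL=C sort -c "$@" 2>/dev/null || status=$?
    case $status in
        0) echo "ordered" ;;
        1) echo "not ordered or duplicate key" ;;
        *) echo "error" ;;
    esac
}

ok=$(printf 'a\nb\nc\n' | check_sorted)         # input is in order
bad=$(printf 'b\na\n' | check_sorted)           # input is out of order
dup=$(printf 'a\na\n' | check_sorted -u)        # -cu also rejects equal records
missing=$(check_sorted /nonexistent/hypothetical-input-file)  # cannot open input

echo "$ok / $bad / $dup / $missing"
```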

Messages

Possible error messages include:
Badly formed sort key position x
The key position was not specified correctly. Check the format and try again.
File filename is binary
sort has determined that filename is binary because it found a NUL ('\0') character in a line.
Insufficient memory for …
This error normally occurs when you specify very large numbers for –y or –z and there is not enough memory available for sort to satisfy the request.
Line too long: limit nn — truncated
Any input lines that are longer than the default number of bytes (LINE_MAX) or the number specified with the –z option are truncated.
Missing key definition after -k
You specified –k, but did not specify a key definition after the –k.
No newline at end of file
Any file not ending in a newline character has one added.
Nonunique key in record …
With the –c and –u options, a non-unique record was found.
Not ordered properly at …
With the –c option, an incorrect ordering was discovered.
Tempfile error on …
The named temporary (intermediate) file could not be created. Make sure that you have a directory named /tmp, and that this directory has space to create files. You can change the directory for temporary files using the TMPDIR environment variable.
Tempnam() error
sort could not generate a name for a temporary working file. This should almost never happen.
Temporary file error (no space) for …
Insufficient space was available for a temporary file. Make sure that you have a directory named /tmp, and that this directory has space to create files. You can change the directory for temporary files using the ROOTDIR and TMPDIR environment variables.
Too many key field positions specified
This implementation of sort has a limit of 64 key field positions.
Write error (no space) on output
Some error occurred in writing the standard output. Barring write-protected media and the like, this typically occurs when there is insufficient disk space to hold all of the intermediate data.

Portability

POSIX.2, X/Open Portability Guide.

Available on all UNIX systems, with only UNIX System V.2 or later having the full utility described here.

The –M, –y, and –z options are extensions of the POSIX standard.

Related information

awk, comm, cut, join, uniq

The sortgen awk script is a useful way to handle complex sorting tasks. It originally appeared in The AWK Programming Language, by Aho, Weinberger, and Kernighan.

The POSIX.2 standard regards the historical syntax for defining sorting keys as obsolete. Therefore, you should use only the –k option in the future.