Format
sort [–cmu] [–o outfile] [–t char] [–y[n]] [–zn] [–bdfiMnr] [–k startpos[,endpos]] … [file …]
sort [–cmu] [–o outfile] [–tchar] [–yn] [–zn] [–bdfiMnr] [+startposition [–endposition]] … [file …]
Description
sort implements a full sort-and-merge utility. By default, it sorts
according to all the information in the record, in the order given in
the record.
sort operates on input files containing records that are separated by
the newline character. When you do not specify either the –c or –m
option, sort sorts the concatenation of all input files and writes the
result to standard output. If you do not specify any files, sort reads
from the standard input (stdin). If you specify - as one of the file
names, sort reads the standard input in place of that file.
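As a small illustration of the - file name, the following sketch sorts one file together with piped input. The file name one.txt and the scratch directory created with mktemp -d are assumptions made for the example, and LC_ALL=C is set only to make the ordering predictable.

```shell
# Scratch directory and sample file (hypothetical names, for illustration only)
tmp=$(mktemp -d)
printf 'delta\nalpha\n' > "$tmp/one.txt"

# "-" stands for standard input, so the piped lines are sorted
# together with the lines of one.txt
combined=$(printf 'charlie\nbravo\n' | LC_ALL=C sort "$tmp/one.txt" -)
printf '%s\n' "$combined"

rm -rf "$tmp"
```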
The following options select particular operations:
- –c
- Checks input files to ensure that they are correctly ordered according
to the key position and sort ordering options specified, but does
not modify or output the files. This option affects only the exit
code.
- –m
- Merges files into one sorted output
stream. This option assumes that each input file is correctly ordered
according to the other options specified on the command line; you
can check this with the –c option.
- –u
- Ensures that output records are unique. If two or more input records
have equal sort keys, sort writes only the
first record to the output. When you use –u with –c, sort prints
a diagnostic message if the input records have any duplicates.
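The three operations above might be combined as in the following sketch. The file names a.txt and b.txt and the use of mktemp -d are assumptions for illustration; LC_ALL=C is set only so that the result does not depend on the local collating sequence.

```shell
# Two small input files that are already sorted (hypothetical names)
tmp=$(mktemp -d)
printf 'apple\ncherry\n' > "$tmp/a.txt"
printf 'banana\ndate\n'  > "$tmp/b.txt"

# Verify each input with -c before merging; -c writes no records
# and reports its result through the exit status only
if LC_ALL=C sort -c "$tmp/a.txt" && LC_ALL=C sort -c "$tmp/b.txt"; then
    merged=$(LC_ALL=C sort -m "$tmp/a.txt" "$tmp/b.txt")
fi
printf '%s\n' "$merged"

# -u keeps one record for each distinct sort key
unique=$(printf 'b\na\nb\na\n' | LC_ALL=C sort -u)
printf '%s\n' "$unique"

rm -rf "$tmp"
```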
Options
- –o outfile
- Writes output to the file outfile. By
default, sort writes output to the standard
output. The output file can be one of the input files. In this case, sort makes
a copy of the data to allow the (potential) overwriting of the input
file.
- –t char
- Indicates that the character char separates
input fields. When you do not specify the –t option, sort assumes
that any number of white space (blank or tab) characters separate
fields.
- –y[n]
- Restricts the amount of memory available for sorting to n KB
(where a KB is 1024 bytes). If n is missing, sort chooses a reasonable
maximum amount of memory for sorting, dependent on the system
configuration. n must be at least 2, has a maximum value of 1024, and
has a default value of 56. sort needs at least enough memory to hold
five records simultaneously; if you request less, sort automatically
takes enough. When the input files overflow the amount of memory
available, sort automatically performs a polyphase merge (external
sort), which is, of necessity, much slower than internal sorting. Using
the –y option may therefore improve sorting performance substantially
for medium to large input files.
- –zn
- Indicates that the longest input record (including the newline
character) is n bytes in length. By default,
record length is limited to LINE_MAX.
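The –t and –o options can be sketched together as follows. The file name people.txt and its contents are invented for the example, and the scratch directory from mktemp -d is an assumption; the –y and –z options are left out because their behavior depends on the implementation described here.

```shell
# Colon-separated sample data (hypothetical file and contents)
tmp=$(mktemp -d)
printf 'carol:30\nalice:25\nbob:35\n' > "$tmp/people.txt"

# -t ':' makes the colon the field separator; -k 2,2n sorts on the
# second field numerically; -o names the input file as the output file
LC_ALL=C sort -t ':' -k 2,2n -o "$tmp/people.txt" "$tmp/people.txt"
sorted_in_place=$(cat "$tmp/people.txt")
printf '%s\n' "$sorted_in_place"

rm -rf "$tmp"
```

Because sort copies the data before writing, naming the same file for input and output is safe here, whereas redirecting with > to the input file would not be.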
The following options control the way in which
sort does
comparisons between records in order to determine the order in which
the records are placed on the output. The ordering options apply globally
to all sorting keys except those keys for which you individually specify
the ordering option. For more information about sorting keys, see
Sorting keys.
- –b
- Skips, for comparison purposes, any leading white space (blank
or tab) in any field (or key specification).
- –d
- Uses dictionary ordering. With this option, sort examines
only blanks, uppercase and lowercase letters, and numbers when making
comparisons.
- –f
- Converts lowercase letters to uppercase for comparison purposes.
- –i
- Ignores, for comparison purposes, nonprintable characters.
- –k startpos[,endpos]
- Specifies a sorting key. For more information, see Sorting keys.
- –M
- Assumes that the field contains a month name for comparison purposes.
Any leading white space is ignored. If the field starts with the first
three letters of a month name in uppercase or lowercase, the comparisons
are in month-in-year order. Anything that is not a recognizable month
name compares less than JAN.
- –n
- Assumes that the field contains an initial numeric value. sort sorts
first by numeric value and then by the remaining text in the field
according to options.
Numeric fields can contain optional leading blanks and an optional
minus (-) sign. sort does not recognize the plus (+) sign.
This option treats a field that contains no digits as if it had a
value of zero. If more than one line contains no digits, the lines
are sorted alphanumerically.
- –r
- Reverses the order of all comparisons so that sort writes
output from largest to smallest rather than smallest to largest.
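A few of the ordering options can be compared side by side, as in this sketch. The sample values are invented, and LC_ALL=C is an assumption used to keep the default text ordering predictable.

```shell
# Default comparison treats the records as text
as_text=$(printf '10\n9\n-5\n' | LC_ALL=C sort)

# -n compares initial numeric values, including a leading minus sign
as_numbers=$(printf '10\n9\n-5\n' | LC_ALL=C sort -n)

# -r reverses the comparison, here combined with -n
descending=$(printf '10\n9\n-5\n' | LC_ALL=C sort -nr)

# -f converts lowercase to uppercase before comparing
folded=$(printf 'cherry\nBanana\napple\n' | LC_ALL=C sort -f)

printf '%s\n' "$as_text" "$as_numbers" "$descending" "$folded"
```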
Sorting keys
By default, sort examines
entire input records to determine ordering. By specifying sorting
keys on the command line, you can tell sort to
restrict its attention to one or more parts of each record.
You can indicate the start of a sorting key with:
-k m[.n][options]
where m and the optional n are positive integers. You can choose
options from the set bdfiMnr (described previously) to specify the way
in which sort does comparisons for that sorting key. Ordering options
set for a key override global ordering options. If you do not specify
any options for the key, the global ordering options are used.
The
number m specifies which field in the input
record contains the start of the sorting key. The character given
with the –t option separates input fields;
if this option is not specified, spaces or tabs separate the fields. The
resulting sort key is from the mth field
to the end of the record. The number n specifies
which character in the mth field marks the
start of the sorting key; if you do not specify n,
the sorting key starts at the first character of the mth
field.
If an ending position for a key is not specified,
the sorting key extends from the starting position to the end of the
input record. You can also specify an ending position for a key,
with:
-k m[.n][options],p[.q][options]
where p and q are positive integers, indicating that the sort key ends
with the qth character of the pth field. If you do not specify q or if
you specify a value of 0 for q, the sorting key ends at the last
character of the pth field. For example:
-k 2.3,4.6
defines a sorting key that extends from the third character of the
second field to the sixth character of the fourth field. The b option
applies only to the key start or key end for which it is specified.
-k 2
defines a sorting key that extends from the first character of the
second field to the end of the record.
-k 2,2
defines a sorting key that extends from the first character of the
second field to the last character of the second field.
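The difference between a key with and without an ending position can be seen in a short sketch. The three records are invented for the example, and LC_ALL=C is an assumption to make the comparisons byte-based.

```shell
# Three whitespace-separated records (sample data)
records='x b 2
y a 9
z b 1'

# -k 2: the key runs from field 2 to the end of the record,
# so the third field takes part in the comparison
to_end=$(printf '%s\n' "$records" | LC_ALL=C sort -k 2)

# -k 2,2: the key is field 2 only; records with equal keys are
# then ordered by comparing the entire record
field_only=$(printf '%s\n' "$records" | LC_ALL=C sort -k 2,2)

printf '%s\n' "$to_end" "$field_only"
```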
sort also
supports a historical method of defining the sorting key. Using this
method, you indicate the start of the sorting key with:
+m[.n][options]
which is equivalent to:
–k m+1[.n+1][options]
You can also indicate the end of a sorting key with:
–p[.q][options]
which, when preceded by +m[.n], is equivalent to:
–k m+1[.n+1],p.0[options]
if q is specified and is zero. Otherwise, it is equivalent to:
–k m+1[.n+1],p+1[.q][options]
For example: +1.2 -3.6 defines a sorting key with a starting position
that sort finds by skipping the first field and then the first two
characters of the next field, and an ending position that sort finds
by skipping the first three fields and then the first six characters
of the next field. In other words, the sorting key extends from the
third character of the second field to the sixth character of the
fourth field. This is the same key as -k 2.3,4.6, described earlier.
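The arithmetic of the equivalences above can be sketched as a small shell function. The name old_key_to_k is hypothetical and is not part of sort; the function takes all four numbers m, n, p, and q explicitly and prints the corresponding –k form without any ordering options.

```shell
# old_key_to_k M N P Q  (hypothetical helper, for illustration only)
# Translates the historical "+M.N -P.Q" key into the -k form,
# following the two equivalences given in the text.
old_key_to_k() {
    m=$1 n=$2 p=$3 q=$4
    start="$((m + 1)).$((n + 1))"
    if [ "$q" -eq 0 ]; then
        # q is zero: the key ends at the last character of field p
        end="$p.0"
    else
        # otherwise the field number is raised by one and q is kept
        end="$((p + 1)).$q"
    fi
    printf '%s\n' "-k $start,$end"
}

old_key_to_k 2 1 4 3
old_key_to_k 1 0 2 0
```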
With either syntax, if the end of a sorting
key is not a valid position after the beginning key position, the
sorting key extends to the end of the input record.
You can
specify multiple sort key positions by using several –k options
or several + and – options.
In this case, sort uses the second sorting
key only for records where the first sorting keys are equal, the third
sorting key only when the first two are equal, and so on. If all key
positions compare equal, sort determines
ordering by using the entire record.
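A two-key sort might look like the following sketch, which orders invented records by the first field and, within equal first fields, by the second field in descending numeric order. LC_ALL=C is an assumption used for predictable text comparison.

```shell
# Department and amount (sample data)
records='sales 300
eng 200
sales 500
eng 400'

# First key: field 1 as text. Second key: field 2, numeric and
# reversed, consulted only when the first keys compare equal.
by_dept_then_amount=$(printf '%s\n' "$records" | LC_ALL=C sort -k 1,1 -k 2,2nr)
printf '%s\n' "$by_dept_then_amount"
```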
When you specify the –u option
to determine the uniqueness of output records, sort looks
only at the sorting keys, not the whole record. (Of course, if you
specify no sorting keys, sort considers
the whole record to be the sorting key.)
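The effect of combining –u with a sorting key can be sketched as follows. The records are invented; two of them share the same first field, so only one of the two is written even though the records differ as a whole.

```shell
# Two records have the key "a" in field 1 (sample data)
records='a 1
b 2
a 3'

# Uniqueness is judged on the key (field 1), not on the whole record
one_per_key=$(printf '%s\n' "$records" | LC_ALL=C sort -u -k 1,1)
printf '%s\n' "$one_per_key"
```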
Examples
- To sort an input file having lines consisting of the day of the
month, white space, and the month, as in:
30 December
23 MAY
25 June
10 June
use the command:
sort -k 2M -k 1n
- To merge two dictionaries, with one word per line:
sort -m -dfi dict1 dict2 >newdict
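The first example can be tried on its four sample lines, as in this sketch. LC_ALL=C is an assumption made so that the month names are those of the POSIX locale.

```shell
# The sample lines from the first example
dates='30 December
23 MAY
25 June
10 June'

# Sort by month name first, then by day of the month
by_month_then_day=$(printf '%s\n' "$dates" | LC_ALL=C sort -k 2M -k 1n)
printf '%s\n' "$by_month_then_day"
```

The two June lines have equal month keys, so the numeric day key decides their order.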
Environment variables
sort uses
the following environment variable:
- TMPDIR
- Contains the
path name of the directory to be used for temporary files.
Files
sort uses
the following file:
- /tmp/stm*
- Temporary files used for merging and for the –o option. You can
specify a different directory for temporary files using the TMPDIR
environment variable.
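Setting TMPDIR for a single invocation might look like the following sketch. The scratch directory comes from mktemp -d, which is an assumption for the example; with input this small, sort may not need any temporary files at all.

```shell
# Directory to hold any temporary files (created only for this example)
scratch=$(mktemp -d)

# TMPDIR is set for this one command only
sorted=$(printf '3\n1\n2\n' | TMPDIR="$scratch" LC_ALL=C sort -n)
printf '%s\n' "$sorted"

rm -rf "$scratch"
```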
Localization
sort uses
the following localization environment variables:
- LANG
- LC_ALL
- LC_COLLATE
- LC_CTYPE
- LC_MESSAGES
- LC_NUMERIC
- LC_TIME
- NLSPATH
The –M option works only if
LC_TIME identifies a locale that contains the same month names as
the POSIX locale.
See Localization for
more information.
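The influence of the locale on ordering can be sketched by forcing the POSIX (C) locale, in which characters collate by their byte values. Other locales may order the same lines differently, so only the C result is shown here.

```shell
# In the C locale, uppercase letters sort before lowercase letters
c_order=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort)
printf '%s\n' "$c_order"
```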
Exit values
- 0
- Successful completion. Also returned if –c is
specified and the file is in correctly sorted order.
- 1
- Returned if you specified –c and the
file is not correctly sorted. Also returned to indicate a non-unique
record if you specified –cu.
- 2
- Failure due to any of the following:
- Missing key description after –k
- More than one –o option
- Missing file name after –o
- Missing character after –t
- More than one character after –t
- Missing number with –y or –z
- endposition given before a startposition
- Badly formed sort key
- Incorrect command-line option
- Too many key field positions specified
- Insufficient memory
- Inability to open the output file
- Inability to open the input file
- Error in writing to the output file
- Inability to create a temporary file or temporary file name
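A script can act on these exit values as in the following sketch. The function name check_order and the file names are hypothetical; diagnostic messages are discarded so that only the exit status is examined.

```shell
# check_order FILE  (hypothetical helper, for illustration only)
# Reports how "sort -c" exited: 0 sorted, 1 not sorted, other values failure.
check_order() {
    status=0
    LC_ALL=C sort -c "$@" 2>/dev/null || status=$?
    case $status in
        0) echo "sorted" ;;
        1) echo "not sorted" ;;
        *) echo "error" ;;
    esac
}

tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/good.txt"
printf 'b\na\n' > "$tmp/bad.txt"

r_good=$(check_order "$tmp/good.txt")
r_bad=$(check_order "$tmp/bad.txt")
r_missing=$(check_order "$tmp/missing.txt")   # this file does not exist
printf '%s\n' "$r_good" "$r_bad" "$r_missing"

rm -rf "$tmp"
```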
Messages
Possible error messages include:
- Badly formed sort key position x
- The key position was not specified correctly. Check the format
and try again.
- File filename is binary
- sort has determined that filename is
binary because it found a NUL ('\0') character in a line.
- Insufficient memory for …
- This error normally occurs when you specify very large numbers
for –y or –z and
there is not enough memory available for sort to
satisfy the request.
- Line too long: limit nn — truncated
- Any input lines that are longer than the default number of bytes
(LINE_MAX) or the number specified with the –z option
are truncated.
- Missing key definition after -k
- You specified –k, but did not specify
a key definition after the –k.
- No newline at end of file
- Any file not ending in a newline character has one added.
- Nonunique key in record …
- With the –c and –u options,
a non-unique record was found.
- Not ordered properly at …
- With the –c option, an incorrect ordering
was discovered.
- Tempfile error on …
- The named temporary (intermediate) file could not be created.
Make sure that you have a directory named /tmp, and that this
directory has space to create files. You can change the directory
for temporary files using the TMPDIR environment variable.
- Tempnam() error
- sort could not generate a name for a
temporary working file. This should almost never happen.
- Temporary file error (no space) for …
- Insufficient space was available for a temporary file. Make sure
that you have a directory named /tmp, and that this directory
has space to create files. You can change the directory for temporary
files using the ROOTDIR and TMPDIR environment variables.
- Too many key field positions specified
- This implementation of sort has a limit
of 64 key field positions.
- Write error (no space) on output
- Some error occurred in writing the standard output. Barring write-protected
media and the like, this typically occurs when there is insufficient
disk space to hold all of the intermediate data.
Portability
POSIX.2, X/Open Portability Guide.
Available
on all UNIX systems,
with only UNIX System
V.2 or later having the full utility described here.
The –M, –y,
and –z options are extensions of the POSIX
standard.
Related information
awk, comm, cut, join, uniq
The sortgen awk script is a useful way to handle complex sorting tasks.
It originally appeared in The AWK Programming Language, by Aho,
Weinberger, and Kernighan.
The POSIX.2 standard regards the historical syntax for defining sorting
keys as obsolete. Therefore, you should use only the –k option in the
future.