join — Join two sorted textual relational databases

Format

  • join [-a n] [-e s] [-o list] [-t c] [-v n] [-1 n] [-2 n] file1 file2
  • join [-a n] [-e s] [-j[n] m] [-o list] [-t c] file1 file2

Description

join joins two databases. It assumes that both file1 and file2 contain textual databases in which each input line is a record, and that the input records are sorted in ascending order on a particular join key field (by default, the first field in each file). If you specify - (a hyphen) in place of file1 or file2, join uses the standard input (stdin) for that file. If you specify - in place of both file1 and file2, the output is undefined.
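As a minimal sketch, assuming a POSIX sh, a join that accepts the POSIX options, and hypothetical sample files, the following sorts one input on its key field and passes it to join as file1 through -. Setting LC_ALL=C is an assumption made here so that sort and join agree on a plain byte-wise collating order.

```shell
# Hypothetical sample data; the file names and contents are illustrative only.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
LC_ALL=C; export LC_ALL   # same collating order for sort and join

printf 'carol 555-0199\nalice 555-0123\nbob 555-0147\n' > "$dir/phones"  # unsorted
printf 'alice sales\nbob support\ncarol sales\n' > "$dir/depts"           # sorted

# Sort phones on the join key (field 1) and read it as file1 from stdin ("-").
out=$(sort -k 1,1 "$dir/phones" | join - "$dir/depts")
printf '%s\n' "$out"
# alice 555-0123 sales
# bob 555-0147 support
# carol 555-0199 sales
```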

Conceptually, join computes the Cartesian product of records from both files. By default, spaces or tabs separate input fields and join discards any leading or trailing white space. (There can be no white-space-delimited empty input fields.) It then generates output for those combined records in which the join key field (the first field by default) matches in each file. The default output for join is the common join key field, followed by all the other fields in file1, and then all the other fields in file2. The other fields from each file appear in the same order they appeared in the original file. The default output field separator is a space character.
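The following sketch, using hypothetical employee and department files, illustrates the default behavior: key 10 appears twice in file1, so both records pair with the single key-10 record in file2; key 30 has no partner and produces no output; and tabs or runs of blanks in the input become single spaces in the output.

```shell
# Hypothetical sample data; the file names and contents are illustrative only.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
LC_ALL=C; export LC_ALL

# file1: employees keyed by department number (two share department 10).
printf '10 alice\n10 bob\n20 carol\n' > "$dir/emp"
# file2: departments; tabs and runs of spaces both separate fields.
printf '10\tsales\teast\n20   support west\n30 legal north\n' > "$dir/dept"

# Default output: key, remaining file1 fields, remaining file2 fields.
out=$(join "$dir/emp" "$dir/dept")
printf '%s\n' "$out"
# 10 alice sales east
# 10 bob sales east
# 20 carol support west
```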

Restriction: Lines in files that are used with the join command are limited to a length of 2048.

Options

-a n
In addition to the output for each pair of matching records, produces an output line for each unpaired record in file n, where n is 1 or 2. To produce unpaired records from both files, specify both -a 1 and -a 2.
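A minimal sketch with two hypothetical files, in which key a appears only in file1 and key d only in file2:

```shell
# Hypothetical sample data; the file names and contents are illustrative only.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
LC_ALL=C; export LC_ALL

printf 'a 1\nb 2\nc 3\n' > "$dir/f1"   # key "a" is only in file1
printf 'b x\nc y\nd z\n' > "$dir/f2"   # key "d" is only in file2

left=$(join -a 1 "$dir/f1" "$dir/f2")        # pairs plus unpaired file1 records
both=$(join -a 1 -a 2 "$dir/f1" "$dir/f2")   # pairs plus unpaired records of both
printf '%s\n\n%s\n' "$left" "$both"
```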
-e string
Replaces an empty field with string on output. In a double-byte locale, string can contain double-byte characters.
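A sketch of -e, assuming (as POSIX specifies) that the replacement string applies to fields named in an -o list. The unpaired file1 record for key a has no 2.2 field, so NONE is written in its place. The file contents are hypothetical.

```shell
# Hypothetical sample data; the file names and contents are illustrative only.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
LC_ALL=C; export LC_ALL

printf 'a 1\nb 2\nc 3\n' > "$dir/f1"
printf 'b x\nc y\nd z\n' > "$dir/f2"

# Keep unpaired file1 records and fill their missing file2 field with NONE.
out=$(join -a 1 -e NONE -o 0,1.2,2.2 "$dir/f1" "$dir/f2")
printf '%s\n' "$out"
# a 1 NONE
# b 2 x
# c 3 y
```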
-j[n] m
Uses field number m as the join key field. By default, the join key field is the first field in each input line. As with the -a option, if n (1 or 2) is present, this option sets the key field for that file only; otherwise, it sets it for both files.
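A sketch of -j with hypothetical files keyed on their second field, compared with the POSIX form that uses -1 and -2 (POSIX considers -j obsolete):

```shell
# Hypothetical sample data; the file names and contents are illustrative only.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
LC_ALL=C; export LC_ALL

printf 'x 1\ny 2\n' > "$dir/f1"   # sorted on field 2
printf 'p 1\nq 2\n' > "$dir/f2"   # sorted on field 2

old=$(join -j 2 "$dir/f1" "$dir/f2")         # historical form: field 2 in both files
new=$(join -1 2 -2 2 "$dir/f1" "$dir/f2")    # POSIX equivalent
printf '%s\n' "$new"
# 1 x p
# 2 y q
```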
-o list
Specifies the fields to be output. You can specify each element in list as either n.m, where n is a file number (1 or 2) and m is a field number, or as 0 (zero), which represents the join field. You can specify any number of output fields by separating them with blanks or commas. The POSIX-compatible version of this command (first form in the syntax) requires multiple output fields to be specified as a single argument; therefore, shell quoting might be necessary. join outputs the fields in the order you list them.
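A sketch with hypothetical files showing that join writes the -o fields in the listed order, and that a comma-separated list and a quoted, blank-separated list select the same fields:

```shell
# Hypothetical sample data; the file names and contents are illustrative only.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
LC_ALL=C; export LC_ALL

printf 'alice 30 paris\nbob 25 rome\n' > "$dir/f1"
printf 'alice engineer\nbob chef\n' > "$dir/f2"

# Field 2 of file2, then the join field, then field 3 of file1.
comma=$(join -o 2.2,0,1.3 "$dir/f1" "$dir/f2")
blank=$(join -o '2.2 0 1.3' "$dir/f1" "$dir/f2")   # one quoted argument
printf '%s\n' "$comma"
# engineer alice paris
# chef bob rome
```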
-t c
Sets the field separator to the character c. Each instance of c introduces a new field, making empty fields possible. In a double-byte locale, c can be a double-byte character.
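A sketch with hypothetical comma-separated files. The second record of file1 has an empty second field; with -t each comma starts a new field, so the empty field is kept in position, and the output also uses the comma as its separator.

```shell
# Hypothetical sample data; the file names and contents are illustrative only.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
LC_ALL=C; export LC_ALL

printf 'k1,a,b\nk2,,c\n' > "$dir/f1"   # "k2" has an empty second field
printf 'k1,X\nk2,Y\n' > "$dir/f2"

out=$(join -t ',' "$dir/f1" "$dir/f2")
printf '%s\n' "$out"
# k1,a,b,X
# k2,,c,Y
```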
-v n
Suppresses the output for matching records and produces only the unpaired records from file n, where n is 1 or 2. If you specify both -v 1 and -v 2, join produces unpaired records from both files. Lines that are produced by using the -a option are not suppressed.
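A sketch with two hypothetical files in which key a appears only in file1 and key d only in file2; -v writes only those unpaired records:

```shell
# Hypothetical sample data; the file names and contents are illustrative only.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
LC_ALL=C; export LC_ALL

printf 'a 1\nb 2\nc 3\n' > "$dir/f1"
printf 'b x\nc y\nd z\n' > "$dir/f2"

only1=$(join -v 1 "$dir/f1" "$dir/f2")         # unpaired file1 records
only2=$(join -v 2 "$dir/f1" "$dir/f2")         # unpaired file2 records
either=$(join -v 1 -v 2 "$dir/f1" "$dir/f2")   # unpaired records of both
printf '%s\n' "$either"
# a 1
# d z
```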
-1 n
Uses the nth field of file1 as the join key field.
-2 n
Uses the nth field of file2 as the join key field.
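A sketch with hypothetical files in which file1 carries the key in its second field and file2 in its first. Each file must be sorted on its own key field; join writes the key first, then the remaining fields of file1, then those of file2.

```shell
# Hypothetical sample data; the file names and contents are illustrative only.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
LC_ALL=C; export LC_ALL

printf 'widget 101\ngadget 102\n' > "$dir/f1"   # key in field 2
printf '101 alice\n102 bob\n' > "$dir/f2"       # key in field 1

out=$(join -1 2 -2 1 "$dir/f1" "$dir/f2")
printf '%s\n' "$out"
# 101 widget alice
# 102 gadget bob
```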

Examples

  1. The following script produces a report about the files in the working directory, containing each file's name, its file mode, and an estimate of what the file contains:
    file * | tr -s ':' ';' >temp1
    ls -l | tr -s ' ' ';' >temp2
    join -t';' -j2 9 -o 1.1 2.1 1.2 -- temp1 temp2
    rm temp[12]
  2. The preceding example uses the historical syntax of the join command (the -j2 option and a blank-separated -o list). In a POSIX-compatible script, the third line could be:
    join -t';' -2 9 -o 1.1,2.1,1.2 -- temp1 temp2

Localization

join uses the following localization environment variables:

  • LANG
  • LC_ALL
  • LC_COLLATE
  • LC_CTYPE
  • LC_MESSAGES
  • NLSPATH

See Localization for more information.

Exit values

0
Successful completion
1
Failure due to any of the following situations:
  • Incorrect syntax
  • The wrong number of command-line arguments
  • Inability to open the input file
  • Badly constructed output list
  • Too many -o options on the command line
2
Failure due to an incorrect command-line argument

Messages

Most diagnostics deal with argument syntax and are self-explanatory. For example:
Badly constructed output list at list
Indicates that the list for a -o option did not have the proper syntax.

Portability

POSIX.2, X/Open Portability Guide, UNIX systems.

POSIX considers the -j option to be obsolete.

Related information

awk, comm, cut, paste, sort