Manipulating records and fields
When GAWK reads in a record, it stores all the fields of that record in variables.
You can access each field by a $ followed by the field
number -- so $1 references the first field,
$2 references the second, and so on, all the way to the
last field of the record.
Figure 4 shows the sample text divided into the default records and fields.
Figure 4. Sample file broken down into AWK records and fields
As described in Elements of an input file, you can reference
an entire record with $0, which includes all fields
and field separators. $0 is also the default argument for many statements. For example,
a bare print, as you've used before, is the same as
print $0: both print the entire current
record.
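If you want to confirm the equivalence yourself, the following sketch recreates the sample file and compares the output of both forms. The file's three lines are inferred from the outputs shown in this section, so the original may differ slightly; the names with_print.txt and with_print0.txt are arbitrary.

```shell
# Recreate the sample file (contents inferred from the outputs in this section)
printf '%s\n' \
  'Heigh-ho! sing, heigh-ho! unto the green holly:' \
  'Most friendship is feigning, most loving mere folly:' \
  'Then, heigh-ho, the holly!' > sample

# A bare print and print $0 both write the entire current record
awk '{ print }' sample > with_print.txt
awk '{ print $0 }' sample > with_print0.txt

# cmp -s prints nothing and succeeds only when the files are byte-for-byte identical
cmp -s with_print.txt with_print0.txt && echo "same output"
```

Both files are also identical to sample itself, because each record is printed back followed by a newline.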
To output a particular field, pass that field as an argument to
print. Try printing the first field of every record
in the sample file:
$ awk ' { print $1 } ' sample
Heigh-ho!
Most
Then,
$
You can give multiple fields in a print statement,
and they can be in any order:
$ awk ' { print $7, $3, $1 } ' sample
holly: heigh-ho! Heigh-ho!
mere is Most
the Then,
$
Notice that the last record doesn't have a seventh field; a field that doesn't exist evaluates to the empty string, so nothing is printed in its place.
When fields are separated by commas, print outputs them with a single space between each. To concatenate them instead, omit the comma. To print the seventh and eighth fields concatenated together, use:
$ awk ' { print $7 $8 } ' sample
holly:
merefolly:
$
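To see exactly what a missing field contributes, wrap the output in brackets. This sketch assumes the same reconstructed sample file; the brackets are there only to make empty strings and spaces visible:

```shell
# Recreate the sample file (contents inferred from the outputs in this section)
printf '%s\n' \
  'Heigh-ho! sing, heigh-ho! unto the green holly:' \
  'Most friendship is feigning, most loving mere folly:' \
  'Then, heigh-ho, the holly!' > sample

# Record 3 has only four fields, so $7 is the empty string and prints as []
awk '{ print "[" $7 "]" }' sample

# The comma still inserts a space after the empty $7
awk '{ print "[" $7, $3, $1 "]" }' sample
```

The first command prints [holly:], [mere], and []. In the second, the last line is [ the Then,]: the empty seventh field is followed by the space that the comma inserts.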
You can combine quoted text with fields. Try this:
$ awk ' { print "Field 2: " $2 } ' sample
Field 2: sing,
Field 2: friendship
Field 2: heigh-ho,
$
You should already be getting an idea of the power of working on fields and records -- with AWK, tabular data is easy to parse, manipulate, and reformat using just a few simple commands. You can use shell redirection to direct the reformatted output to a new file, or pass it down a pipeline.
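As a minimal sketch of both ideas, assuming the reconstructed sample file and an arbitrary output file name (second_fields.txt), the first command saves the second fields to a file and the second hands the first fields to sort:

```shell
# Recreate the sample file (contents inferred from the outputs in this section)
printf '%s\n' \
  'Heigh-ho! sing, heigh-ho! unto the green holly:' \
  'Most friendship is feigning, most loving mere folly:' \
  'Then, heigh-ho, the holly!' > sample

# Redirection: write the second field of every record to a new file
awk '{ print $2 }' sample > second_fields.txt

# Pipeline: pass the first field of every record on to sort, in reverse order
awk '{ print $1 }' sample | sort -r
```

second_fields.txt ends up containing sing, then friendship then heigh-ho, (one per line), and the pipeline prints Then, then Most then Heigh-ho!.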
Because AWK works as a filter, its field handling is especially useful in combination with other commands. For example, this command reformats the default output of the date command so that it prints in a day month, year format:
$ date|awk '{print $3 " " $2 ", " $6}'
29 Nov, 2006
$
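Because the output of date changes with the current time and can vary with the locale and system, the following variation feeds awk a fixed line instead. It assumes the English, C-locale layout used above (weekday, month, day, time, time zone, year); in other locales the field positions can differ.

```shell
# A fixed line in the layout of date's default output
# Fields: $1=Wed $2=Nov $3=29 $4=14:00:00 $5=EST $6=2006
echo 'Wed Nov 29 14:00:00 EST 2006' | awk '{ print $3 " " $2 ", " $6 }'

# Extra blanks (here, a space-padded single-digit day) don't shift the field numbers
echo 'Wed Nov  1 14:00:00 EST 2006' | awk '{ print $3 " " $2 ", " $6 }'
```

The two commands print 29 Nov, 2006 and 1 Nov, 2006.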
The fields in the examples so far have been separated by spaces.
That's the default behavior: any run of spaces or tab characters separates fields,
and leading and trailing blanks are ignored. You can change this. The value of the field separator is contained in the
FS variable. Like any variable in AWK, it can be
redefined at any time in a program; a new value takes effect with the next record read. To use a different field separator for
the entire file, redefine it in a BEGIN rule.
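Before redefining FS, it helps to see the default splitting on input with irregular spacing. The second command in this sketch uses a BEGIN rule to switch to tab-separated fields; the input strings are made up for illustration:

```shell
# Default FS: leading and trailing blanks are ignored, and any run of spaces
# and tabs (printf turns each \t into a tab) counts as a single separator
printf '   alpha \t  beta\tgamma  \n' | awk '{ print "[" $1 "][" $2 "][" $3 "]" }'

# Redefined in a BEGIN rule, FS = "\t" splits on tabs only, so spaces stay inside fields
printf 'Jane Doe\tSales Dept\n' | awk 'BEGIN { FS = "\t" } { print "[" $1 "][" $2 "]" }'
```

The first command prints [alpha][beta][gamma]; the second prints [Jane Doe][Sales Dept].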
Print the first field of the sample data with a field separator of an exclamation point (!):
$ awk ' BEGIN { FS = "!" } { print $1 } ' sample
Heigh-ho
Most friendship is feigning, most loving mere folly:
Then, heigh-ho, the holly
$
Note the differences when printing the second and third fields:
$ awk ' BEGIN { FS = "!" } { print $2 } ' sample
sing, heigh-ho
$ awk ' BEGIN { FS = "!" } { print $3 } ' sample
unto the green holly:
$
Try comparing the fields in the output you get with the fields listed in Figure 4.
But the field separator doesn't have to be a single character. Try using a phrase:
$ awk ' BEGIN { FS = "Heigh-ho" } { print $2 } ' sample
! sing, heigh-ho! unto the green holly:
$
In GAWK, the field separator can be any regular expression. To make each input
character its own field, give FS the null (empty) string, "", which is a GAWK extension.
Capitalization counts. The "Heigh-ho" example above matches only one separator in the entire file, but the following example matches the phrase whether it begins with an uppercase or a lowercase h:
$ awk ' BEGIN { FS = "[Hh]eigh-ho" } { print $2 } ' sample
! sing,
, the holly!
$
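Here are a few more separators, each set in a BEGIN rule and run on made-up one-line inputs. The second command shows that a regular expression matching a single space does not collapse runs the way the default FS does; the third uses the empty FS described above, which GAWK supports but POSIX leaves unspecified:

```shell
# A bracket expression: split on either a comma or a colon
echo 'root,x:0' | awk 'BEGIN { FS = "[,:]" } { print $1, $2, $3 }'

# Exactly one space per separator, so two adjacent spaces yield an empty field
printf 'a  b\n' | awk 'BEGIN { FS = "[ ]" } { print "[" $1 "][" $2 "][" $3 "]" }'

# Empty FS: every character becomes its own field (a GAWK extension)
echo 'awk' | awk 'BEGIN { FS = "" } { print $1, $2, $3 }'
```

The three commands print root x 0, then [a][][b], then a w k.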
You can also change the field separator from the command line by specifying it
as a quoted argument to the -F option:
$ awk -F "," ' { print $2 } ' sample
heigh-ho! unto the green holly:
most loving mere folly:
heigh-ho
$
This functionality makes it easy to write one-liners that parse files such as /etc/passwd, where fields are delimited by a colon (:) character. For example, you can easily pull out the users' full names, which are usually kept in the fifth field:
$ awk -F ":" ' { print $5 } ' /etc/passwd
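Since the contents of /etc/passwd differ from system to system, this sketch builds a two-line stand-in file (passwd.sample, with a made-up user jdoe) in the same colon-separated format and prints each login name next to its full name:

```shell
# Two hypothetical lines in /etc/passwd format:
# login:password:UID:GID:full name:home directory:shell
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'jdoe:x:1000:1000:Jane Doe:/home/jdoe:/bin/bash' > passwd.sample

# Pair each login name (field 1) with its full name (field 5)
awk -F ":" '{ print $1 ": " $5 }' passwd.sample
```

This prints root: root and jdoe: Jane Doe.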
As with the field separator, you can change the record separator from its
default -- a newline character -- to anything you'd like. Its current value is
kept in the RS variable.
Change the record separator to a comma, and try it on the sample file:
$ awk ' BEGIN { RS = "," } //' sample
Heigh-ho! sing
heigh-ho! unto the green holly:
Most friendship is feigning
most loving mere folly:
Then
heigh-ho
the holly!
$
AWK output, like AWK input, is divided into fields and records, and the
output stream has its own separators. The output field separator, inserted
between values in print statements in which they are separated by commas,
is a single space by default; you can change it by redefining the OFS
variable. The output record separator, written after each print statement,
is a newline character by default; you can change it by redefining the ORS
variable.
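The comma is what triggers OFS, so concatenated values never receive it. This sketch assumes the reconstructed sample file and an arbitrary OFS of space, bar, space:

```shell
# Recreate the sample file (contents inferred from the outputs in this section)
printf '%s\n' \
  'Heigh-ho! sing, heigh-ho! unto the green holly:' \
  'Most friendship is feigning, most loving mere folly:' \
  'Then, heigh-ho, the holly!' > sample

# A comma in print inserts OFS between the values...
awk 'BEGIN { OFS = " | " } { print $1, $2 }' sample

# ...but plain concatenation ignores OFS entirely
awk 'BEGIN { OFS = " | " } { print $1 $2 }' sample
```

The first command prints Heigh-ho! | sing, and so on for each record; the second prints Heigh-ho!sing, with no separator at all.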
To strip all newlines from a file and place all the file's text on a single line, which is useful for certain kinds of textual analysis and filtering, set the output record separator to the empty (null) string.
Try it on the sample file:
$ awk 'BEGIN {ORS=""} //' sample
Heigh-ho! sing, heigh-ho! unto the green holly:Most friendship\
is feigning, most loving mere folly:Then, heigh-ho, the holly!$
Every newline is stripped out, including the last. The returning shell prompt
is on the same line as the output data. To add a final newline, put it in an
END rule:
$ awk 'BEGIN {ORS=""} // { print } END {print "\n"}' sample
Heigh-ho! sing, heigh-ho! unto the green holly:Most friendship\
is feigning, most loving mere folly:Then, heigh-ho, the holly!
$
The NF variable contains the number of fields in the
current record. Using NF references its numeric
value, while using $NF references the contents of
the actual field itself. So if a record has 100 fields, print NF
outputs the integer 100, while print $100 outputs
the same thing as print $NF -- the contents of the
last field in the record.
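Here is the same distinction on a short, made-up line of four words. The second command shows that any expression can follow $, as long as you parenthesize it:

```shell
# NF is the field count; $NF is the contents of the field with that number
echo 'one two three four' | awk '{ print NF, $NF }'

# Parentheses are required: $NF-1 would be parsed as ($NF) - 1, a subtraction
echo 'one two three four' | awk '{ print $(NF-1) }'
```

The commands print 4 four and three. Without the parentheses, $NF-1 converts the string four to the number 0 and prints -1.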
The NR variable, in turn, contains the number of the
current record. When the first record is being read, its value is 1; when the
second record is being read, it increments to 2, and so on. With the default
record separator each record is one line, so you can use NR in an
END rule to output the number of lines in the input:
$ awk 'END { print "Input contains " NR " lines." }' sample
Input contains 3 lines.
$
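Because NR holds the current record number, a comparison on it also works as a pattern. This sketch assumes the reconstructed sample file:

```shell
# Recreate the sample file (contents inferred from the outputs in this section)
printf '%s\n' \
  'Heigh-ho! sing, heigh-ho! unto the green holly:' \
  'Most friendship is feigning, most loving mere folly:' \
  'Then, heigh-ho, the holly!' > sample

# A pattern with no action prints matching records: here, only the second one
awk 'NR == 2' sample

# Prefix every record with its record number
awk '{ print NR ": " $0 }' sample
```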
Note: If the print statement above had been
placed in a BEGIN pattern, the program would have
reported that its input contained 0 lines, because the value for NR
at the time of execution would be 0, as no records of input would have been
read yet.
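You can see both values in a single run by printing NR from a BEGIN rule and from an END rule (again assuming the reconstructed sample file):

```shell
# Recreate the sample file (contents inferred from the outputs in this section)
printf '%s\n' \
  'Heigh-ho! sing, heigh-ho! unto the green holly:' \
  'Most friendship is feigning, most loving mere folly:' \
  'Then, heigh-ho, the holly!' > sample

# BEGIN runs before any input is read; END runs after the last record
awk 'BEGIN { print "BEGIN sees NR = " NR } END { print "END sees NR = " NR }' sample
```

This prints BEGIN sees NR = 0 followed by END sees NR = 3.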
Use $NR to print the field whose number matches the current
record number:
$ awk '{ print NR, $NR }' sample
1 Heigh-ho!
2 friendship
3 the
$
Take a look again at Figure 4, and see the values for Field 1 in the first record, Field 2 in the second record, and Field 3 in the last record. Compare this with your program output.
Try listing the number of fields for each record and the value of the last field:
$ awk ' { print "Record " NR " has " NF " fields and ends with " $NF}' sample
Record 1 has 7 fields and ends with holly:
Record 2 has 8 fields and ends with folly:
Record 3 has 4 fields and ends with holly!
There are a handful of special GAWK variables that are frequently used. Table 2 lists them and describes their meaning.
Table 2. Common GAWK variables
| Variable | Description |
|---|---|
| NF | This variable contains the number of fields in the current record. |
| NR | This variable contains the number of the current record. |
| FS | This variable is the input field separator. |
| RS | This variable is the input record separator. |
| OFS | This variable is the output field separator. |
| ORS | This variable is the output record separator. |
| FILENAME | This variable contains the name of the input file currently being read. |
| IGNORECASE | When IGNORECASE is set to a nonzero or non-null value, GAWK ignores case in regular expression matching and string comparisons. |
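FILENAME is easiest to see with more than one input file. This sketch assumes the reconstructed sample file plus a second, hypothetical one-line file named other:

```shell
# Recreate the sample file (contents inferred from the outputs in this section)
printf '%s\n' \
  'Heigh-ho! sing, heigh-ho! unto the green holly:' \
  'Most friendship is feigning, most loving mere folly:' \
  'Then, heigh-ho, the holly!' > sample
printf 'just one line\n' > other

# FILENAME changes as awk moves from one input file to the next
awk '{ print FILENAME ": " $1 }' sample other
```

The first three lines begin with sample:, and the last line is other: just.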



