z/OS UNIX System Services User's Guide


Regular expressions

SA23-2279-00

A regular expression is a way of telling awk to select records that contain certain strings of characters. For example, the instruction:
/ri/ { print }
tells awk to print all records that contain the string ri. A regular expression written directly in an awk program is enclosed in slashes, as in the instruction just discussed. For a discussion of regular expressions beyond their use in awk, see Appendix C. Regular Expressions (regexp) in z/OS UNIX System Services Command Reference.
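As a minimal sketch of the instruction above, the following shell commands pass three records to awk. The sample records are made up for illustration and do not come from this guide.

```shell
# Hypothetical sample records; awk prints only those that contain "ri".
out=$(printf '%s\n' 'Lori painting' 'Jim reading' 'Brian cooking' |
      awk '/ri/ { print }')
printf '%s\n' "$out"
```

Here the records for Lori and Brian are printed, because both contain ri somewhere in the record; the record for Jim is not.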
The following characters have special meanings when you use them in regular expressions:
Character
Meaning
^
Stands for the beginning of the record or field being matched. For example:
$2 ~ /^b/ { print }
prints any record whose second field begins with b.
$
Stands for the end of the record or field being matched. For example:
$2 ~ /g$/ { print }
prints any record with a second field that ends with g.
.
Matches any single character (except the newline). For example:
$2 ~ /i.g/ { print }
selects the records whose second field contains an i and a g separated by any one character. This includes second fields containing ing, and also bridge (idg).
|
Means or. For example:
/Linda|Lori/
is a regular expression that matches either of the strings Linda or Lori.
*
Indicates zero or more repetitions of a character. For example:
/ab*c/
matches abc, abbc, abbbc, and so on. It also matches ac (zero repetitions of b). Since . matches any character except the newline, .* matches an arbitrary string of zero or more characters. For example:
$2 ~ /^r.*g$/ { print }
prints any record with a second field that begins with r, ends in g, and has any set of characters between (for example, reading; role playing also qualifies when the field separator allows a space within a field).
+
Is similar to *, but stands for one or more repetitions of a character. For example:
/ab+c/
matches abc, abbc, and so on, but does not match ac.
\{m,n\}
Indicates m to n repetitions of a character (where m and n are both integers). For example:
/ab\{2,4\}c/
matches abbc, abbbc, and abbbbc, and nothing else.
?
Is similar to *, but stands for zero or one occurrence of a character. For example:
/ab?c/
matches ac and abc, but not abbc, and so on.
[X]
Matches any one of the set of characters X given inside the square brackets. For example:
$1 ~ /^[LJ]/ { print }
prints any record whose first field begins with either L or J. As a special case, the following class names can appear inside the square brackets: [:lower:] stands for any lowercase letter, [:upper:] for any uppercase letter, [:alpha:] for any letter, and [:digit:] for any digit.
Thus:
/[[:digit:][:alpha:]]/
matches any single digit or letter.
[^X]
Matches any one character that is not in the set X. For example:
$1 ~ /^[^LJ]/ { print }
prints any record with a first field that does not begin with L or J. Similarly:
$1 ~ /^[^[:digit:]]/ { print }
prints any record with a first field that does not begin with a digit.
(X)
Matches anything that the regular expression X does. You can use parentheses to control how other special characters behave. For example, * normally applies to the single character immediately preceding it. This means that:
/abc*d/
matches abd, abcd, abccd, and so on. However:
/a(bc)*d/
matches ad, abcd, abcbcd, abcbcbcd, and so on.
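The following sketch runs several of the patterns from the table against a small set of records, so that the effect of each special character can be compared side by side. The sample records, the select_names helper, and the variable names are assumptions made for illustration; they are not part of awk or of this guide. The \{m,n\} form and the [:class:] names are left out because their handling can differ between awk implementations.

```shell
# Hypothetical sample records: a name followed by a hobby.
records='Linda bridge
Lori reading
Jim biking
Sam golf'

# select_names PROGRAM: run an awk program against the sample records.
# (Illustrative helper, not a standard command.)
select_names() { printf '%s\n' "$records" | awk "$1"; }

begins_b=$(select_names '$2 ~ /^b/ { print $1 }')        # second field begins with b
ends_g=$(select_names '$2 ~ /g$/ { print $1 }')          # second field ends with g
i_any_g=$(select_names '$2 ~ /i.g/ { print $1 }')        # i, any one character, g
either=$(select_names '/Linda|Lori/ { print $1 }')       # Linda or Lori
r_to_g=$(select_names '$2 ~ /^r.*g$/ { print $1 }')      # begins with r, ends with g
in_set=$(select_names '$1 ~ /^[LJ]/ { print $1 }')       # name begins with L or J
not_in_set=$(select_names '$1 ~ /^[^LJ]/ { print $1 }')  # name begins with anything else

# Repetition and grouping, tested against short literal strings.
star=$(printf '%s\n' ac abc abbc adc | awk '/ab*c/ { print }')
plus=$(printf '%s\n' ac abc abbc adc | awk '/ab+c/ { print }')
optional=$(printf '%s\n' ac abc abbc adc | awk '/ab?c/ { print }')
group=$(printf '%s\n' ad abcd abcbcd abd | awk '/a(bc)*d/ { print }')

printf '%s\n' "$i_any_g" "$group"
```

With these records, i_any_g holds Linda, Lori, and Jim (bridge, reading, and biking each contain an i and a g separated by one character), and group holds ad, abcd, and abcbcd but not abd.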
The characters with special meanings are:
^   $   .   *   +   ?   [   ]   (   )   |
These are known as metacharacters.
When a metacharacter appears in a regular expression, it usually has its special meaning. If you want to use one of these characters literally (without its special meaning), put a backslash in front of the character. For example:
/\$1/ { print }
prints all records that contain a dollar sign $ followed by a 1. If you simply entered:
/$1/ { print }
awk would search for records where the end of the record was followed by a 1, which is impossible.

Because the backslash has this special meaning, \ is also considered a metacharacter. To create a regular expression that matches a literal backslash, you must therefore use two backslashes (\\).
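A short sketch of both escapes follows. The sample records are hypothetical, and the awk programs are placed in single quotes so that the shell passes the backslashes and dollar signs to awk unchanged.

```shell
# Hypothetical sample records containing a dollar sign and a backslash.
lines='cost $1
cost 1
path a\b'

# \$ matches a literal dollar sign, so only "cost $1" is selected.
dollar=$(printf '%s\n' "$lines" | awk '/\$1/ { print }')

# \\ matches a single literal backslash, so only "path a\b" is selected.
backslash=$(printf '%s\n' "$lines" | awk '/\\/ { print }')

printf '%s\n' "$dollar" "$backslash"
```

The record cost 1 is selected by neither pattern, because it contains no dollar sign and no backslash.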





Copyright IBM Corporation 1990, 2014