A character class is written as a number of characters inside square brackets,
as in:
[0123456789]
This is a regular expression that stands for any one of the characters inside
the brackets. This character class stands for any digit character.
[0123456789][0123456789][0123456789]
stands for any three digits in a row.
The digit character class can be written more simply as:
[0-9]
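As a quick check that the two spellings above agree, here is a minimal sketch. It uses Python's re module rather than lex itself (an assumption made purely for illustration; for these simple classes the bracket syntax behaves the same way). The names LONG_FORM, SHORT_FORM, is_three_digits and same_result are hypothetical.

```python
import re

# Explicit list of digits versus the equivalent range form.
LONG_FORM = re.compile(r"[0123456789][0123456789][0123456789]")
SHORT_FORM = re.compile(r"[0-9][0-9][0-9]")

def is_three_digits(text):
    """Return True if text consists of exactly three digits."""
    return SHORT_FORM.fullmatch(text) is not None

def same_result(text):
    """Return True if both spellings agree on whether text matches."""
    return bool(LONG_FORM.fullmatch(text)) == bool(SHORT_FORM.fullmatch(text))
```

Calling same_result on any string should return True, since the range form is only a shorthand for the explicit list.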
The - stands for all the characters that come between the two characters on
either side, including those two characters themselves. Thus:
[a-z]
stands for all characters between a and z, whereas:
[a-zA-Z]
stands for all characters in either the range a to z or the range A to Z.
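The two ranges can be compared with a short sketch, again using Python's re module as a stand-in for lex (an assumption for illustration only). The function classify and the pattern names are hypothetical.

```python
import re

LOWER = re.compile(r"[a-z]")      # one range
LETTER = re.compile(r"[a-zA-Z]")  # two ranges in a single class

def classify(ch):
    """Classify a single character using the two classes from the text."""
    if LOWER.fullmatch(ch):
        return "lowercase letter"
    if LETTER.fullmatch(ch):
        # In [a-zA-Z] but not in [a-z], so it came from the A-Z range.
        return "uppercase letter"
    return "not a letter"
```

Each range contributes its own characters to the class, so a character matches [a-zA-Z] if it falls in either range.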
Note: - is not treated as a range indicator when it appears at the beginning
or end of a character class; in those positions it stands for itself.
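A small sketch of the rule in the note, written with Python's re module instead of lex (an assumption for illustration; Python follows the same convention for a leading or trailing hyphen). The pattern names are hypothetical.

```python
import re

SIGN = re.compile(r"[-+]")              # hyphen first: a literal '-'
DIGIT_OR_HYPHEN = re.compile(r"[0-9-]")  # hyphen last: a literal '-'
PLUS_TO_NINE = re.compile(r"[+-9]")      # hyphen in the middle: range '+' to '9'

def in_class(pattern, ch):
    """Return True if the single character ch belongs to the class."""
    return pattern.fullmatch(ch) is not None
```

In ASCII the comma lies between + and 9, so in_class(PLUS_TO_NINE, ",") is true while in_class(SIGN, ",") is false; only the position of the hyphen differs.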
If the first character after the [ is a circumflex (^), the character class
stands for all characters that are not listed in the brackets. For example:
[^0-9]
stands for all characters that are not digits. Similarly:
[^a-zA-Z0-9]
stands for all characters that are neither alphabetic nor numeric.
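Negated classes are often used to discard unwanted characters. The following is a minimal sketch in Python's re module, not lex (an assumption for illustration); the function strip_non_alnum is hypothetical.

```python
import re

NON_DIGIT = re.compile(r"[^0-9]")
NON_ALNUM = re.compile(r"[^a-zA-Z0-9]")

def strip_non_alnum(text):
    """Delete every character that is not a letter or a digit."""
    return NON_ALNUM.sub("", text)
```

For example, strip_non_alnum("a-b_c 42!") removes the hyphen, underscore, space and exclamation mark and keeps the rest.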
There is a special character class, written as . (a period), that matches
any character except newline. The pattern:
p.x
matches any 3-character sequence starting with p and ending with x.
Note: A newline is never matched except when explicitly specified as \n, or
in a range. In particular, a . never matches newline.
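The behavior of . around newlines can be checked with a short sketch. It uses Python's re module in its default mode, where . also excludes newline; this is an assumption for illustration and not lex itself. The function matches is hypothetical.

```python
import re

PATTERN = re.compile(r"p.x")

def matches(text):
    """Return True if the whole of text is p, any one non-newline character, x."""
    return PATTERN.fullmatch(text) is not None
```

Here matches("pax") and matches("p x") succeed, while matches("p\nx") fails because the middle character is a newline.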
New character class symbols have been introduced by POSIX. These are provided
as special sequences that are valid only within character class definitions.
The sequences are:
[.coll.]         the collating symbol coll
[=equiv=]        the equivalence class of the character equiv
[:char-class:]   any of the characters from char-class
lex accepts only the POSIX locale for these definitions. In particular,
multicharacter collation symbols are not supported. You can still use, for
example, the character class:
[[.a.]-[.z.]]
which is equivalent to:
[a-z]
for the POSIX locale.
lex accepts the following POSIX-defined character classes:
[:alnum:] [:cntrl:] [:lower:] [:space:]
[:alpha:] [:digit:] [:print:] [:upper:]
[:blank:] [:graph:] [:punct:] [:xdigit:]
It is more portable (and more obvious) to use these new expressions than to
spell out the equivalent ranges.
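To make the meaning of the named classes concrete, the sketch below expands a few of them into explicit ranges as they would read in the POSIX locale. It is written in Python, whose re module does not itself accept the [:name:] notation, so the table POSIX_LOCALE_EQUIVALENTS and the helper expand_posix_classes are hypothetical and cover only some of the classes listed above.

```python
import re

# Explicit equivalents in the POSIX locale for a subset of the classes.
# cntrl, graph, print and punct are deliberately left out of this sketch.
POSIX_LOCALE_EQUIVALENTS = {
    "alnum": "a-zA-Z0-9",
    "alpha": "a-zA-Z",
    "blank": " \\t",
    "digit": "0-9",
    "lower": "a-z",
    "space": " \\t\\n\\r\\f\\v",
    "upper": "A-Z",
    "xdigit": "0-9A-Fa-f",
}

def expand_posix_classes(pattern):
    """Replace each [:name:] in pattern with its explicit POSIX-locale form.

    Raises KeyError for a class name that is not in the table above.
    """
    def replace(match):
        return POSIX_LOCALE_EQUIVALENTS[match.group(1)]
    return re.sub(r"\[:([a-z]+):\]", replace, pattern)

# A C-style identifier written with named classes, then expanded.
IDENTIFIER = expand_posix_classes("[[:alpha:]_][[:alnum:]_]*")
```

The expanded pattern is longer and depends on the locale's character ranges, which is the reason the named form is the more portable spelling.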