Character Class
A character class is used to represent a set of characters. The following combinations are allowed in describing a character class:
x: (where x is not one of the magic characters ^$()%.[]*+-?) represents the character x itself.
.: (a dot) represents all characters.
%a: represents all letters.
%c: represents all control characters.
%d: represents all digits.
%l: represents all lowercase letters.
%p: represents all punctuation characters.
%s: represents all space characters.
%u: represents all uppercase letters.
%w: represents all alphanumeric characters.
%x: represents all hexadecimal digits.
%z: represents the character with representation 0.
%x: (where x is any non-alphanumeric character, not the literal letter x of the hexadecimal class above) represents the character x. This is the standard way to escape the magic characters. Any punctuation character (even a non-magic one) can be preceded by a '%' when used to represent itself in a pattern.
[set]: represents the class which is the union of all characters in set. A range of characters can be specified by separating the end characters of the range with a '-'. All classes %x described above can also be used as components in set. All other characters in set represent themselves. For example, [%w_] (or [_%w]) represents all alphanumeric characters plus the underscore, [0-7] represents the octal digits, and [0-7%l%-] represents the octal digits plus the lowercase letters plus the '-' character.
The interaction between ranges and classes is not defined. Therefore, patterns like [%a-z] or [a-%%] have no meaning.
[^set]: represents the complement of set, where set is interpreted as above.
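
The class forms listed above can be tried out with the standard string library. The following is a minimal sketch, assuming the default "C" locale; the sample strings and variable names are invented for illustration.

```lua
-- Single-letter classes: four consecutive digits.
local year = string.match("released in 2006", "%d%d%d%d")

-- Escaping a magic character: "%." matches only a literal dot,
-- whereas an unescaped "." matches any character.
local literal_dot = string.find("a.b", "%.")   -- position of the dot
local any_char    = string.find("a.b", ".")    -- position of the first character

-- Sets: alphanumerics plus underscore, and the octal digits.
local ident = string.match("my_var1 = 10", "[%w_]+")
local octal = string.match("mode 755 set", "[0-7]+")

-- A set mixing a range, a class, and an escaped '-'.
local mixed = string.match("x-17!", "[0-7%l%-]+")

-- Complemented set: the first run of characters that are not letters.
local non_letters = string.match("abc123", "[^%a]+")

print(year, literal_dot, any_char, ident, octal, mixed, non_letters)
```

Note that string.find returns both the start and end positions of the match; assigning to a single variable keeps only the start.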
For all classes represented by single letters (%a, %c, etc.), the corresponding uppercase letter represents the complement of the class. For instance, %S represents all non-space characters.
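
As an illustration of the complemented classes, the sketch below uses %S to split a string on whitespace and %D to strip everything that is not a digit. The input strings are made up for the example, and the "C" locale is assumed.

```lua
-- %S+ matches maximal runs of non-space characters.
local words = {}
for w in string.gmatch("one  two\tthree", "%S+") do
  words[#words + 1] = w
end

-- %D is the complement of %d: delete every non-digit character.
-- The extra parentheses discard gsub's second result (the substitution count).
local digits = (string.gsub("+1 (555) 010-9999", "%D", ""))

print(table.concat(words, ","), digits)
```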
The definitions of letter, space, and other character groups depend on the current locale. In particular, the class [a-z] may not be equivalent to %l.
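
One way to observe the locale dependence is to compare two classes over every single-byte character. The helper below, disagreements, is a hypothetical name introduced for this sketch; it assumes an ASCII-based platform. In the "C" locale the two classes should agree, while under other locales (if any are installed on the system) the count may be nonzero.

```lua
-- Count the single-byte characters on which two classes disagree.
local function disagreements(class_a, class_b)
  local n = 0
  for byte = 0, 255 do
    local ch = string.char(byte)
    local in_a = string.find(ch, class_a) ~= nil
    local in_b = string.find(ch, class_b) ~= nil
    if in_a ~= in_b then n = n + 1 end
  end
  return n
end

-- Select the "C" locale for all categories, then compare.
os.setlocale("C")
local diff_in_c = disagreements("[a-z]", "%l")
print(diff_in_c)
```

Calling os.setlocale with another locale name before the comparison shows whether that locale classifies additional bytes as lowercase letters; os.setlocale returns nil when the requested locale cannot be honored.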