Regular expression syntax

Table 1 describes the syntax of the regular expression tokens supported by Netcool/SSM.

Table 1. Regular expression syntax

Token

Matches

.

Any character.

^

The start of a line (a zero-length string).

$

The end of a line: either a newline character or the end of the search buffer.

\<

The start of a word (where a word is a string of alphanumeric characters).

\>

The end of a word (the zero-length string between an alphanumeric character and a non-alphanumeric character).

\b

Any word boundary (equivalent to (\<|\>) ).
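The difference between the three word-boundary tokens can be sketched in Python. This is only an approximation: Python's `re` module stands in for the Netcool/SSM engine, it does not support `\<` or `\>`, so they are emulated here with lookarounds, and Python's `\w` also counts the underscore as a word character.

```python
import re

text = "cat concat cats"

# \bcat\b: "cat" with a word boundary on both sides.
whole_words = [m.start() for m in re.finditer(r"\bcat\b", text)]

# Emulates \<cat (start of word): no word character directly before.
starts = [m.start() for m in re.finditer(r"(?<!\w)cat", text)]

# Emulates cat\> (end of word): no word character directly after.
ends = [m.start() for m in re.finditer(r"cat(?!\w)", text)]

print(whole_words)  # [0]
print(starts)       # [0, 11]
print(ends)         # [0, 7]
```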

\d

A digit character.

\D

Any non-digit character.

\w

A word character (alphanumeric or underscore).

\W

Any character that is not a word character (that is, neither alphanumeric nor an underscore).

\s

A whitespace character.

\S

Any non-whitespace character.
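As a rough illustration of the shorthand classes, the following sketch uses Python's `re` module as a stand-in for the Netcool/SSM engine. The sample line is invented, and the `re.ASCII` flag is an assumption made to keep `\d`, `\w` and `\s` limited to ASCII characters.

```python
import re

line = "disk_1 used 87 MB"  # hypothetical sample input

# \d+ : runs of digit characters
digits = re.findall(r"\d+", line, flags=re.ASCII)

# \w+ : runs of word characters (alphanumeric or underscore)
words = re.findall(r"\w+", line, flags=re.ASCII)

# \s+ : runs of whitespace, used here as a field separator
fields = re.split(r"\s+", line, flags=re.ASCII)

# \S+ : runs of non-whitespace characters
tokens = re.findall(r"\S+", line, flags=re.ASCII)

print(digits)  # ['1', '87']
print(words)   # ['disk_1', 'used', '87', 'MB']
```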

\c

Special characters and escaping. The following characters are interpreted according to the C language conventions: \0, \a, \f, \n, \r, \t, \v. To specify a character in hexadecimal, use the \xNN syntax. For example, \x41 is the ASCII character A.

\

All characters apart from those described above may be escaped using the backslash prefix. For example, to specify a plain left-bracket use \[.
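The escaping rules can be tried out with a short Python sketch. Python's `re` module happens to follow the same C-style conventions for these escapes, but it is only a stand-in here, and the Netcool/SSM engine may differ in details.

```python
import re

# \x41 is the hexadecimal escape for the ASCII character "A".
hex_escape = re.fullmatch(r"\x41", "A") is not None

# \t is interpreted as a TAB character, as in C.
tab_escape = re.search(r"a\tb", "a\tb") is not None

# \[ and \] match literal brackets instead of opening a character set.
literal_bracket = re.fullmatch(r"\[ok\]", "[ok]") is not None

# An escaped dot matches only a dot, not "any character".
dot_rejected = re.fullmatch(r"1\.5", "1x5") is None

print(hex_escape, tab_escape, literal_bracket, dot_rejected)
```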

[]

Any one of the characters in the specified set. A set may list characters explicitly, as in [aeiou], or use character ranges, as in [0-9A-Fa-f], which matches any hexadecimal digit. The dash (-) loses its special meaning when it is escaped, as in [A\-Z], or when it is the first or last character in the set, as in [-xyz0-9].

All of the above backslash-escaping rules may be used within []. For example, the expression [\x41-\x45] is equivalent to [A-E] in ASCII. To use a closing bracket in a set, either escape it using [\]] or use it as the first character in the set, such as []xyz].

POSIX-style character classes are also allowed inside a character set. The syntax for character classes is [:class:]. The supported character classes are:

  • [:alnum:] - alphanumeric characters.
  • [:alpha:] - alphabetic characters.
  • [:blank:] - space and TAB characters.
  • [:cntrl:] - control characters.
  • [:digit:] - numeric characters.
  • [:graph:] - characters that are both printable and visible.
  • [:lower:] - lowercase alphabetic characters.
  • [:print:] - printable characters (characters that are not control characters).
  • [:punct:] - punctuation characters (characters that are not letters, digits, control characters, or spaces).
  • [:space:] - space characters (such as space, TAB and form feed).
  • [:upper:] - uppercase alphabetic characters.
  • [:xdigit:] - characters that are hexadecimal digits.

Brackets are permitted within the set's brackets. For example, [a-z0-9!] is equivalent to [[:lower:][:digit:]!] in the C locale.
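The set rules above can be checked with a small Python sketch. Python's `re` module is used only as a stand-in and does not understand POSIX classes, so the hypothetical helper `expand_posix` and its `POSIX_CLASSES` table (both assumptions, covering only some of the classes, with C-locale ASCII ranges) rewrite them into explicit ranges first.

```python
import re

# Assumed C-locale (ASCII) equivalents for a subset of the POSIX classes.
POSIX_CLASSES = {
    "alnum": "0-9A-Za-z",
    "alpha": "A-Za-z",
    "blank": r" \t",
    "digit": "0-9",
    "lower": "a-z",
    "upper": "A-Z",
    "xdigit": "0-9A-Fa-f",
}


def expand_posix(pattern: str) -> str:
    """Replace each [:class:] with explicit ranges (hypothetical helper).

    Raises KeyError for a class that is not in POSIX_CLASSES.
    """
    return re.sub(
        r"\[:([a-z]+):\]",
        lambda m: POSIX_CLASSES[m.group(1)],
        pattern,
    )


expanded = expand_posix("[[:lower:][:digit:]!]")
print(expanded)  # [a-z0-9!]

# \x41 is "A" and \x45 is "E", so the range includes both endpoints.
range_has_e = re.fullmatch(r"[\x41-\x45]", "E") is not None

# A closing bracket placed first in the set is taken literally.
bracket_first = re.fullmatch(r"[]xyz]", "]") is not None

# A dash placed first in the set, or escaped, is taken literally.
dash_first = re.fullmatch(r"[-xyz0-9]", "-") is not None
dash_escaped = re.fullmatch(r"[A\-Z]", "M") is None
```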

[^]

Inverts the behavior of a character set [] as described above. For example, [^[:alpha:]] matches any character that is not alphabetic. The caret (^) has this special meaning only when it is the first character in a bracket set.

{n}

Exactly n occurrences of the previous expression, where 0 <= n <= 255. For example, a{3} matches aaa.

{n,m}

Between n and m occurrences of the previous expression, where 0 <= n <= m <= 255. For example, a 32-bit hexadecimal number can be described as 0x[[:xdigit:]]{1,8}.

{n,}

At least n occurrences of the previous expression, with no upper limit.

*

Zero or more of the previous expression.

+

One or more of the previous expression.

?

Zero or one of the previous expression.
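The repetition operators can be sketched in Python as follows. Python's `re` module is a stand-in for the Netcool/SSM engine, so the POSIX class in the 32-bit hexadecimal example is written out as an explicit range, and the 255 limit on n and m is not enforced here.

```python
import re

# {n,m}: between one and eight hexadecimal digits after the 0x prefix.
hex32 = re.compile(r"0x[0-9A-Fa-f]{1,8}")
fits = hex32.fullmatch("0xDEADBEEF") is not None
too_long = hex32.fullmatch("0x123456789") is None  # nine digits

# {n}: exactly three occurrences.
exactly_three = re.fullmatch(r"a{3}", "aaa") is not None
only_two = re.fullmatch(r"a{3}", "aa") is None

# {n,}: at least two occurrences, no upper limit.
at_least_two = re.fullmatch(r"a{2,}", "aaaaa") is not None

# *, + and ? applied to the preceding "b".
star_allows_zero = re.fullmatch(r"ab*", "a") is not None
plus_needs_one = re.fullmatch(r"ab+", "a") is None
optional_b = re.fullmatch(r"ab?", "ab") is not None
```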

(exp)

Grouping. Any series of expressions may be grouped in parentheses so that a postfix operator or the bar (|) operator applies to the whole group. For example:

  • ab+ matches all of abbb
  • (ab)+ matches all of ababab
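The two cases above can be verified with a brief Python sketch, with Python's `re` module assumed as a stand-in for the Netcool/SSM engine.

```python
import re

# Without grouping, + applies only to the preceding "b".
plain_matches_abbb = re.fullmatch(r"ab+", "abbb") is not None
plain_rejects_ababab = re.fullmatch(r"ab+", "ababab") is None

# With grouping, + applies to the whole "ab" sequence.
group_matches_ababab = re.fullmatch(r"(ab)+", "ababab") is not None
group_rejects_abbb = re.fullmatch(r"(ab)+", "abbb") is None
```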

|

Alternate expressions (logical OR). The vertical bar (|) has the lowest precedence of all tokens in the regular expression language. This means that ab|cd matches all of ab or all of cd, but does not match all of abd. To match abd or acd, use a(b|c)d.
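The precedence of the bar operator can be illustrated with a minimal Python sketch. Python's `re` module is assumed as a stand-in for the Netcool/SSM engine, and it writes the operator as the ASCII vertical bar.

```python
import re

# The bar splits the whole expression: either "ab" or "cd".
matches_cd = re.fullmatch(r"ab|cd", "cd") is not None
matches_ab = re.fullmatch(r"ab|cd", "ab") is not None
rejects_abd = re.fullmatch(r"ab|cd", "abd") is None

# Parentheses limit the alternation to the middle character.
grouped_abd = re.fullmatch(r"a(b|c)d", "abd") is not None
grouped_acd = re.fullmatch(r"a(b|c)d", "acd") is not None
```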

Tip: When defining regular expressions to match multi-byte characters, enclose each multi-byte character in parentheses ().