Message Sets: Regular expression syntax

Regular expression syntax elements and example syntax rules.

A regular expression allows you to specify the conditions that a string must satisfy. For example, you might use a regular expression to specify that a string must contain eight characters and start with an alphabetic character. Use the syntax in the following tables to write regular expressions to specify the sets of strings that are permitted. A regular expression can be made up of one or more branches (choices), each of which can be a string made up of characters, character classes, or parenthesized expressions with modifiers to specify repetition rules.
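
As a rough illustration of the eight-character example above, the following sketch uses Python's standard `re` module as a stand-in for the message set's own regular expression engine. Python's dialect is not the XML Schema subset described in this topic, so treat the result as an approximation only. The pattern and the helper name `is_valid_code` are hypothetical, and `re.fullmatch` is used on the assumption that the whole string must satisfy the expression.

```python
import re

# Hypothetical pattern: one ASCII letter followed by any seven characters,
# which gives eight characters in total.
PATTERN = re.compile(r"[a-zA-Z].{7}")


def is_valid_code(value: str) -> bool:
    """Return True when the whole string satisfies PATTERN."""
    return PATTERN.fullmatch(value) is not None


print(is_valid_code("a1234567"))  # True: a letter plus seven more characters
print(is_valid_code("12345678"))  # False: does not start with a letter
print(is_valid_code("a123"))      # False: fewer than eight characters
```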

The regular expression syntax that is supported is a subset of XML Schema regular expressions, with the addition of the \xNN hexadecimal syntax. For the full syntax, see Appendix F in XML Schema Part 2: Datatypes that can be found on the World Wide Web Consortium (W3C) Web site.

The following table lists the supported regular expression syntax elements:

Metacharacter    Meaning
\                escape
.                any single character
*                preceding character 0 or more times
+                preceding character 1 or more times
?                preceding character 0 or 1 time
{...}            occurrences of preceding (note 1)
[...]            match one of the class contained
[^...]           match one of the class not contained (note 1)
(...)            group the expressions (note 1)
|                match either preceding or following
Escape sequence  Meaning
\n               new line
\r               carriage return
\t               tab
\e               escape
Class code       Meaning
\d               digit [0-9]
\D               non-digit [^0-9] (note 2)
\s               white space [ \t\n\r]
\S               non-whitespace character [^ \t\n\r] (note 2)
\p{L}            all letters (note 3)
\p{N}            all numbers, similar to \d (note 4)
[\p{N}\p{L}]     all numbers and all letters, similar to \w (note 4)
\P{L}            not letters, equivalent to [^\p{L}]
\P{N}            not numbers, equivalent to [^\p{N}]
\xNN             character with hexadecimal code NN, where each N is a hexadecimal digit in the range 0 to F (\x00 is not supported)
Range            Meaning
{n}              exactly n times
{n,}             at least n times
{n,m}            at least n, but no more than m, times
{0,m}            zero to m times
Notes:
  1. The ellipsis (...) stands for any content inside the { }, [ ], or ( ) characters.
  2. The caret (^) means "not" when inside the [ ] characters.
  3. Consult Appendix F of the document XML Schema Part 2: Datatypes for other characters that can be used in place of L and N.
  4. Consult Appendix F of the document XML Schema Part 2: Datatypes for the precise differences.
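
A few of the class codes, ranges, and the \xNN syntax listed above can be tried out with the short sketch below. It again uses Python's standard `re` module purely as a stand-in, and the helper `matches` is a hypothetical name. Python's `re` does not accept `\p{L}` or `\p{N}`, so those class codes are left out, and its `\d` and `\s` are broader than the `[0-9]` and `[ \t\n\r]` definitions in the table.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True when the whole of text satisfies pattern."""
    return re.fullmatch(pattern, text) is not None


# \d with an {n} range: exactly four digits.
print(matches(r"\d{4}", "2024"))            # True

# \S and \s: two runs of non-whitespace separated by one white space.
print(matches(r"\S+\s\S+", "hello world"))  # True

# \xNN: \x41 is the character "A"; {2,3} allows two or three of them.
print(matches(r"\x41{2,3}", "AA"))          # True
print(matches(r"\x41{2,3}", "AAAA"))        # False

# {0,m}: zero to two digits, so the empty string is accepted.
print(matches(r"\d{0,2}", ""))              # True
```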

The following table gives some examples of the regular expression syntax rules. See Message Sets: Using regular expressions to parse data elements for some examples of their use.

Regular expression data pattern  Meaning
a                                Match character "a"
.                                Match any one character
a+                               Match a string of one or more "a"
a*                               Match a string of zero or more "a"
a?                               Match zero or one "a"
a{3}                             Match a string of exactly three "a", that is "aaa"
a{3,}                            Match a string of three or more "a"
a{2,4}                           Match a string with a minimum of two and a maximum of four occurrences of "a"
[abc]                            Match any one of the characters "a", "b", or "c"
[a-zA-Z]                         Match any one character in the range "a" to "z", or in the range "A" to "Z". The range of characters matched is based on the Unicode code points of the characters specified.
[^abc]                           Match any one character except "a", "b", or "c"
(ab)+                            Match one or more repetitions of the string "ab"
(ab)|(cd)                        Match either of the strings "ab" or "cd"
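
The examples in the table can be checked mechanically. The sketch below pairs each pattern with one string that should match and one that should not, and reports any pattern that behaves differently. It uses Python's standard `re` module as an approximation of the supported syntax; the sample strings and the name `check_examples` are illustrative assumptions, not part of the product.

```python
import re

# (pattern, string that should match, string that should not match)
EXAMPLES = [
    ("a", "a", "b"),
    (".", "x", "xy"),
    ("a+", "aaa", ""),
    ("a*", "", "b"),
    ("a?", "", "aa"),
    ("a{3}", "aaa", "aa"),
    ("a{3,}", "aaaa", "aa"),
    ("a{2,4}", "aaa", "aaaaa"),
    ("[abc]", "b", "d"),
    ("[a-zA-Z]", "Q", "7"),
    ("[^abc]", "d", "a"),
    ("(ab)+", "abab", "aba"),
    ("(ab)|(cd)", "cd", "ac"),
]


def check_examples(examples):
    """Return the patterns whose behaviour differs from the table."""
    failures = []
    for pattern, good, bad in examples:
        accepts_good = re.fullmatch(pattern, good) is not None
        rejects_bad = re.fullmatch(pattern, bad) is None
        if not (accepts_good and rejects_bad):
            failures.append(pattern)
    return failures


print(check_examples(EXAMPLES))  # prints []
```

Whole-string matching is used throughout because a pattern such as a{3} is meant to describe the entire permitted value, not a substring of it.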