Message Sets: Regular expression syntax
Regular expression syntax elements and example syntax rules.
A regular expression allows you to specify the conditions that a string must satisfy. For example, you might use a regular expression to specify that a string must contain eight characters and start with an alphabetic character. Use the syntax in the following tables to write regular expressions to specify the sets of strings that are permitted. A regular expression can be made up of one or more branches (choices), each of which can be a string made up of characters, character classes, or parenthesized expressions with modifiers to specify repetition rules.
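As a minimal sketch of the example above, the following code checks that a string contains exactly eight characters and starts with an alphabetic character. It uses Python's `re` module as a stand-in, which is an assumption: Python's dialect is not the XML Schema subset described in this topic, but the constructs used here (a character class, `.`, and `{n}`) are common to both. The names `EIGHT_CHARS_ALPHA_FIRST` and `is_valid` are hypothetical. `re.fullmatch` is used so that the whole string has to satisfy the pattern.

```python
import re

# Hypothetical pattern: one ASCII letter followed by exactly seven more characters.
EIGHT_CHARS_ALPHA_FIRST = re.compile(r"[a-zA-Z].{7}")


def is_valid(value: str) -> bool:
    """Return True if the whole string satisfies the pattern."""
    return EIGHT_CHARS_ALPHA_FIRST.fullmatch(value) is not None


print(is_valid("A1234567"))  # True: eight characters, starts with a letter
print(is_valid("12345678"))  # False: starts with a digit
print(is_valid("Abc"))       # False: too short
```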
The supported regular expression syntax is a subset of XML Schema regular expressions, with the addition of the \xNN hexadecimal syntax. For the full syntax, see Appendix F of XML Schema Part 2: Datatypes, which is available on the World Wide Web Consortium (W3C) website.
The following tables list the supported regular expression syntax elements:
| Metacharacter | Meaning |
|---|---|
| \ | escape |
| . | any single character |
| * | preceding character 0 or more times |
| + | preceding character 1 or more times |
| ? | preceding character 0 or 1 time |
| {...} | occurrences of preceding (note 1) |
| [...] | match one of the class contained |
| [^...] | match one of the class not contained (note 1) |
| (...) | group the expressions (note 1) |
| \| | match either preceding or following |

| Escape sequence | Meaning |
|---|---|
| \n | new line |
| \r | carriage return |
| \t | tab |
| \e | escape |

| Class code | Meaning |
|---|---|
| \d | digit [0-9] |
| \D | non-digit [^0-9] (note 2) |
| \s | white space [ \t\n\r] |
| \S | non-whitespace character [^ \t\n\r] (note 2) |
| \p{L} | all letters (note 3) |
| \p{N} | all numbers, similar to \d (note 4) |
| [\p{N}\p{L}] | all numbers and all letters, similar to \w (note 4) |
| \P{L} | not letters, equivalent to [^\p{L}] |
| \P{N} | not numbers, equivalent to [^\p{N}] |
| \xNN | the character with hexadecimal code NN, where each N is a hexadecimal digit in the range 0 to F (\x00 not supported) |

| Range | Meaning |
|---|---|
| {n} | exactly n times |
| {n,} | at least n times |
| {n,m} | at least n, but no more than m, times |
| {0,m} | zero to m times |

Notes:
1. The ellipsis (...) is used to indicate anything inside the { }, or [ ], or ( ) characters.
2. The caret (^) means "not" when inside the [ ] characters.
3. Consult Appendix F of the document XML Schema Part 2: Datatypes for other characters that can be used in place of L and N.
4. Consult Appendix F of the document XML Schema Part 2: Datatypes for the precise differences.
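
The ranges, class codes, and \xNN syntax listed above can be tried out with a short sketch like the following. It again uses Python's `re` module, which is an assumption and not the product's own engine: Python does not support \p{L} or \p{N}, and its \s covers a few more whitespace characters than the set listed in the table, so only the overlapping constructs are shown. The `re.ASCII` flag is passed so that \d is limited to [0-9], as in the table. The helper name `matches` is hypothetical.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the whole of text matches pattern (ASCII class codes)."""
    return re.fullmatch(pattern, text, flags=re.ASCII) is not None


# Ranges: {n}, {n,}, and {n,m}
print(matches(r"\d{4}", "2024"))     # True: exactly four digits
print(matches(r"\d{2,}", "7"))       # False: fewer than two digits
print(matches(r"\d{2,4}", "12345"))  # False: more than four digits

# Class codes and their negated forms
print(matches(r"\S+\s\D+", "id42 abc"))  # True

# \xNN: hexadecimal 41 and 42 are the characters "A" and "B"
print(matches(r"\x41\x42", "AB"))    # True
```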
The following table gives some examples of regular expression syntax rules. See Message Sets: Using regular expressions to parse data elements for some examples of their use.
| Regular expression data pattern | Meaning |
|---|---|
| a | Match character "a" |
| . | Match any one character |
| a+ | Match a string of one or more "a" |
| a* | Match a string of zero or more "a" |
| a? | Match zero or one "a" |
| a{3} | Match a string of exactly three "a", that is "aaa" |
| a{3,} | Match a string of three or more "a" |
| a{2,4} | Match a string with a minimum of two and a maximum of four occurrences of "a" |
| [abc] | Match any one of the characters "a", "b", or "c" |
| [a-zA-Z] | Match any one character in the range "a" to "z", or in the range "A" to "Z". Note that the range of characters matched is based on the Unicode code points of the characters specified. |
| [^abc] | Match any character except one of "a", "b", or "c" |
| (ab)+ | Match one or more repetitions of the string "ab" |
| (ab)\|(cd) | Match either of the strings "ab" or "cd" |
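
The patterns in the table above can be exercised with a small sketch such as the following. It assumes Python's `re` module as a stand-in engine, which handles these particular patterns in the same way but is not the XML Schema subset itself. The sample strings and the names `EXAMPLES`, `check`, and `results` are hypothetical. Each pattern is paired with one string that should match as a whole and one that should not.

```python
import re

# (pattern, string expected to match, string expected not to match)
EXAMPLES = [
    ("a", "a", "b"),
    (".", "x", "xy"),
    ("a+", "aaa", ""),
    ("a*", "", "b"),
    ("a?", "", "aa"),
    ("a{3}", "aaa", "aa"),
    ("a{3,}", "aaaa", "aa"),
    ("a{2,4}", "aaa", "aaaaa"),
    ("[abc]", "b", "d"),
    ("[a-zA-Z]", "Q", "5"),
    ("[^abc]", "d", "a"),
    ("(ab)+", "abab", "aba"),
    ("(ab)|(cd)", "cd", "ad"),
]


def check(pattern: str, good: str, bad: str) -> bool:
    """True if pattern matches all of good and does not match all of bad."""
    return (re.fullmatch(pattern, good) is not None
            and re.fullmatch(pattern, bad) is None)


results = {pattern: check(pattern, good, bad) for pattern, good, bad in EXAMPLES}
for pattern, ok in results.items():
    print(f"{pattern!r}: {'ok' if ok else 'unexpected'}")
```

Whole-string matching is used here so that, for example, "a{2,4}" rejects "aaaaa" instead of matching its first four characters.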