Regular Expressions

Regular expressions can be used to search traffic for complex patterns in the data.

The IBM Guardium implementation of regular expressions conforms to POSIX 1003.2. For more detailed information, see the Open Group web site: www.opengroup.org. See Policies for examples.

This help topic provides instructions for using the Build Regular Expression Tool, and several tables of commonly used special characters and constructs. It does not provide a comprehensive description of how regular expressions are constructed or used. See the Open Group web site for more detailed information.

The important point to keep in mind about pattern matching or XML matching with regular expressions is that the search for a match starts at the beginning of a string and stops when the first sequence matching the expression is found. The same or different regular expressions can be used for pattern matching and XML matching at the same time.
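The first-match behavior described above can be tried outside Guardium. The following is a minimal sketch that assumes Python's re module as a stand-in; Python's engine is not the POSIX engine that Guardium uses, and the sample string is hypothetical.

```python
import re

# Hypothetical traffic string containing two candidate matches.
text = "id=123-45-6789 alt=987-65-4321"
pattern = re.compile(r"[0-9]{3}-[0-9]{2}-[0-9]{4}")

# search() scans from the beginning of the string and stops at the
# first sequence that satisfies the expression.
first = pattern.search(text)
first_text = first.group(0)
first_start = first.start()
print(first_text, first_start)
```

Only the first of the two candidate sequences is reported; the second is never examined by this call.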

Note: IBM Guardium does not support regular expressions for non-English languages.

Using the Build Regular Expression Tool

When an input field requires a regular expression, you can use the Build Regular Expression tool to code and test a regular expression. The Build Regular Expression icon is located in Policy Builder under Add Rule.

To open the Build Regular Expression tool, click the Build Regular Expression icon next to the field that will contain the regular expression. If you have already entered anything in the field, it is copied to the Regular Expression box in the Build Regular Expression panel.

  1. Select a category of regular expressions from the drop-down list.
  2. Select a pattern from the drop-down list.
  3. Enter or modify the expression in the Regular Expression box.
  4. To test the expression, enter text in the Text To Match Against box, and then click the Test button:
    • If the expression contains an error (a missing closing brace, for example), you will be informed with a Syntax Error message.
    • The Match Found message indicates that your regular expression has found a match in the text that you have entered.
    • If no match is found, the No Match Found message is displayed.
  5. We suggest that you repeat the previous step several times with different text, to verify that your regular expression matches, and fails to match, as expected for your purpose.
  6. To enter a special character at the end of your expression, you can select it from the Select element list. To enter a special character anywhere else, you must type it or copy it there.
  7. When you are done making changes and testing, click Accept to close the Build Regular Expression panel and copy the regular expression to the definition panel.
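The test step in the procedure above can be approximated in a script when the tool is not at hand. This is a rough sketch, not the tool's implementation: check_expression is a hypothetical helper, and it assumes Python's re module, whose syntax checks differ from the POSIX engine (for example, Python reports an unbalanced parenthesis or bracket as an error, but treats an unclosed brace as literal text).

```python
import re

def check_expression(expression: str, text: str) -> str:
    """Return a message similar to the three outcomes listed above."""
    try:
        compiled = re.compile(expression)
    except re.error:
        # For example, an unbalanced parenthesis or bracket.
        return "Syntax Error"
    if compiled.search(text):
        return "Match Found"
    return "No Match Found"

# Try one expression against matching and non-matching text,
# then try an expression that does not compile.
for expr, sample in [("Ca+n", "Caan"), ("Ca+n", "Cn"), ("(Ca", "Can")]:
    print(expr, sample, check_expression(expr, sample))
```

Running the same expression against both matching and non-matching text, as in the loop, mirrors the repeated testing that the procedure suggests.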

Special Characters and Constructs

The following table provides a summary of the more commonly used special characters and constructs.

Table 1. Special Characters and Constructs
Each entry gives the character or construct and what it matches, followed by an example expression, strings that it matches, and strings that it does not match.

  • literal - Match an exact sequence of characters (case sensitive), except for the special characters described below.
    Example: can    Matches: can    No match: Can, cab, caN
  • . (dot) - Match any character, including carriage return or newline (\n) characters.
    Example: ca.    Matches: can, cab    No match: c, cb
  • * - Match zero or more instances of the preceding character(s).
    Example: Ca*n    Matches: Cn, Can, Caan    No match: Cb, Cabn
  • ^ - Match a string beginning with the following character(s).
    Example: ^C.    Matches: Ca    No match: ca, a
  • $ - Match a string ending with the preceding character(s).
    Example: C.n$    Matches: Can    No match: Cn, Cab
  • + - Match one or more instances of the preceding character(s).
    Example: ^Ca+n    Matches: Can, Caan    No match: Cn
  • ? - Match either zero or one instance of the preceding character(s).
    Example: Ca?n    Matches: Cn, Can    No match: Caan
  • | - Match either the preceding or the following pattern.
    Example: Can|cab    Matches: Can, cab    No match: Cab
  • (x ...) - Match the sequence enclosed in parentheses.
    Example: (Ca)+n    Matches: Can, XaCan    No match: Cn, CCnn
  • {n} - Match exactly n instances of the preceding character(s).
    Example: Ca{3}n    Matches: Caaan    No match: Caan, Caaaan
  • {n,} - Match n or more instances of the preceding character(s).
    Example: Ca{2,}n    Matches: Caan, Caaaan    No match: Can, Cn
  • {n,m} - Match from n to m instances of the preceding character(s).
    Example: Ca{2,3}n    Matches: Caan, Caaan    No match: Can, Caaaan
  • [a-ce] - Match a single character in the set, where the dash indicates a contiguous sequence; for example, [0-9] matches any digit.
    Example: [C-FL]an    Matches: Can, Dan, Lan    No match: Ban
  • [^a-ce] - Match any character that is NOT in the specified set.
    Example: [^C-FL]an    Matches: aan, Ban    No match: Can, Dan
  • [[.char.]] - Match the enclosed character or the named character from the Named Characters Table.
    Example: [[.~.]]an or [[.tilde.]]an    Matches: ~an    No match: @an
  • [[:class:]] - Match any character in the specified character class, from the Character Classes Table.
    Example: ^[[:alpha:]]+$    Matches: abc    No match: ab3
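Several of the examples in the table can be checked mechanically. The sketch below assumes Python's re module rather than the Guardium engine, so it covers only rows whose syntax the two share (no [[.char.]] or [[:class:]] constructs); row_is_consistent is a hypothetical helper name.

```python
import re

# (expression, strings expected to match, strings expected not to match)
rows = [
    ("Ca*n", ["Cn", "Can", "Caan"], ["Cb", "Cabn"]),
    ("^C.", ["Ca"], ["ca", "a"]),
    ("C.n$", ["Can"], ["Cn", "Cab"]),
    ("Ca?n", ["Cn", "Can"], ["Caan"]),
    ("Can|cab", ["Can", "cab"], ["Cab"]),
    ("Ca{2,3}n", ["Caan", "Caaan"], ["Can", "Caaaan"]),
    ("[^C-FL]an", ["aan", "Ban"], ["Can", "Dan"]),
]

def row_is_consistent(expression, matches, non_matches):
    """True if every 'match' string matches and no 'no match' string does."""
    compiled = re.compile(expression)
    return (all(compiled.search(s) for s in matches)
            and not any(compiled.search(s) for s in non_matches))

results = {expr: row_is_consistent(expr, yes, no) for expr, yes, no in rows}
for expr, ok in results.items():
    print(expr, "consistent" if ok else "inconsistent")
```

Because search() is unanchored, a string counts as a match if any part of it satisfies the expression; add ^ and $ when the whole string must match.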

Named Characters Table (English)

The following table describes the standard character names that can be used within regular expression bracket pairs ([[.char.]]). Character names are locale specific, so non-English versions of Guardium® may use a different set of character names.

  • NUL \0
  • SOH \001
  • STX \002
  • ETX \003
  • EOT \004
  • ENQ \005
  • ACK \006
  • BEL \007
  • alert \007
  • BS \010
  • backspace \b
  • HT \011
  • tab \t
  • LF \012
  • newline \n
  • VT \013
  • vertical-tab \v
  • FF \014
  • form-feed \f
  • CR \015
  • carriage-return \r
  • SO \016
  • SI \017
  • DLE \020
  • DC1 \021
  • DC2 \022
  • DC3 \023
  • DC4 \024
  • NAK \025
  • SYN \026
  • ETB \027
  • CAN \030
  • EM \031
  • SUB \032
  • ESC \033
  • IS4 \034
  • FS \034
  • IS3 \035
  • GS \035
  • IS2 \036
  • RS \036
  • IS1 \037
  • US \037
  • space ' '
  • exclamation-mark !
  • quotation-mark "
  • number-sign #
  • dollar-sign $
  • percent-sign %
  • ampersand &
  • apostrophe \'
  • left-parenthesis (
  • right-parenthesis )
  • asterisk *
  • plus-sign +
  • comma ,
  • hyphen -
  • period .
  • full-stop .
  • slash /
  • solidus /
  • zero 0
  • one 1
  • two 2
  • three 3
  • four 4
  • five 5
  • six 6
  • seven 7
  • eight 8
  • nine 9
  • colon :
  • semicolon ;
  • less-than-sign <
  • equals-sign =
  • greater-than-sign >
  • question-mark ?
  • commercial-at @
  • left-square-bracket [
  • right-square-bracket ]
  • backslash \
  • reverse-solidus \
  • circumflex ^
  • circumflex-accent ^
  • underscore _
  • low-line _
  • grave-accent `
  • left-brace {
  • left-curly-bracket {
  • right-brace }
  • right-curly-bracket }
  • vertical-line |
  • tilde ~
  • DEL \177
  • NULL 0
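Engines that are not POSIX based often lack the [[.name.]] form. As an illustration only, the sketch below rewrites that form into an escaped literal so the expression can be tried with Python's re module. The lookup table is a small hand-picked subset of the names above, expand_named_characters is a hypothetical helper, and the rewrite handles only a bracket pair that contains a single named character.

```python
import re

# A small, hand-picked subset of the named characters listed above.
NAMED_CHARACTERS = {
    "tilde": "~",
    "hyphen": "-",
    "period": ".",
    "commercial-at": "@",
    "tab": "\t",
}

def expand_named_characters(expression: str) -> str:
    """Replace each [[.name.]] with an escaped literal character."""
    def replace(match):
        name = match.group(1)
        # A single character such as [[.~.]] stands for itself.
        char = NAMED_CHARACTERS.get(name, name)
        if len(char) != 1:
            raise ValueError(f"unknown character name: {name}")
        return re.escape(char)
    return re.sub(r"\[\[\.(.+?)\.\]\]", replace, expression)

by_name = expand_named_characters("[[.tilde.]]an")
by_char = expand_named_characters("[[.~.]]an")
print(bool(re.search(by_name, "~an")), bool(re.search(by_name, "@an")))
```

Both spellings expand to the same expression, which matches ~an but not @an.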

Named Character Class Table (English)

The following table describes the standard character classes that you can reference within regular expression bracket pairs ([[:class:]]). Note that character classes are locale specific, so non-English versions of Guardium may use a different set of character classes.

  • alnum - Alphanumeric (a-z, A-Z, 0-9)
  • alpha - Alphabetic (a-z, A-Z)
  • blank - Space and tab
  • cntrl - Control characters
  • digit - 0-9
  • graph - Graphic characters (printable characters except space)
  • lower - Lowercase alphabetic (a-z)
  • print - Printable characters
  • punct - Punctuation characters
  • space - Space, tab, newline, vertical tab, form feed, and carriage return
  • upper - Uppercase alphabetic (A-Z)
  • xdigit - Hexadecimal digit (0-9, a-f, A-F)
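Python's re module does not support the [:class:] notation, so an expression that uses it needs translating before it can be tried there. The following sketch shows one possible translation for a few of the classes above; the ASCII ranges are an assumption that holds for the English locale only, and expand_classes is a hypothetical helper.

```python
import re

# ASCII equivalents for a few classes (assumed English locale).
CLASS_SETS = {
    "alnum": "a-zA-Z0-9",
    "alpha": "a-zA-Z",
    "digit": "0-9",
    "lower": "a-z",
    "upper": "A-Z",
}

def expand_classes(expression: str) -> str:
    """Rewrite [:class:] inside a bracket expression as explicit ranges.

    A class name that is not in CLASS_SETS raises KeyError.
    """
    return re.sub(r"\[:([a-z]+):\]",
                  lambda m: CLASS_SETS[m.group(1)],
                  expression)

one_class = expand_classes("[[:alpha:]]+")
two_classes = expand_classes("[[:upper:][:digit:]]{2}")
print(one_class, two_classes)
```

The outer brackets are kept, so several classes can share one bracket pair, as in the second call.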

Regular Expression Examples

You can copy and paste any of the expressions into a field requiring a regular expression. When using any of these examples, we strongly suggest that you experiment by using it in the Build Regular Expression tool, entering a variety of matching and non-matching values, so that you understand exactly what is being matched by the expression.

Social Security Number (must have hyphens): [0-9]{3}-[0-9]{2}-[0-9]{4}

Phone Number (North America; matches 3334445555, 333.444.5555, 333-444-5555, 333 444 5555, (333) 444 5555, and all combinations thereof): \(?[0-9]{3}\)?[-. ]?[0-9]{3}[-. ]?[0-9]{4}

Postal Code (Canada): [ABCEGHJKLMNPRSTVXY][0-9][A-Z] [0-9][A-Z][0-9]

Postal Code (UK): [A-Z]{1,2}[0-9][A-Z0-9]? [0-9][ABD-HJLNP-UW-Z]{2}

Zip Code (US; 5 digits required, hyphen followed by four digits optional): [0-9]{5}(-[0-9]{4})?

Credit Card Numbers: [0-9]{4}[-, ]?[0-9]{4}[-, ]?[0-9]{4}[-, ]?[0-9]{4}
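Before relying on one of these expressions, it helps to see exactly what it matches. The sketch below assumes Python's re module as a stand-in for the Build Regular Expression tool and uses made-up sample values. It confirms the phone number formats listed above, and shows that an unanchored search can also match inside a longer value.

```python
import re

ssn = re.compile(r"[0-9]{3}-[0-9]{2}-[0-9]{4}")
phone = re.compile(r"\(?[0-9]{3}\)?[-. ]?[0-9]{3}[-. ]?[0-9]{4}")
card = re.compile(r"[0-9]{4}[-, ]?[0-9]{4}[-, ]?[0-9]{4}[-, ]?[0-9]{4}")

# The formats listed for the phone number expression.
phone_samples = ["3334445555", "333.444.5555", "333-444-5555",
                 "333 444 5555", "(333) 444 5555"]
phone_ok = all(phone.search(s) for s in phone_samples)

# Made-up card number with hyphen separators.
card_ok = card.search("1234-5678-9012-3456") is not None

# The search is unanchored, so the expression also matches a
# sequence embedded in a longer run of digits.
embedded = ssn.search("9123-45-67890")
embedded_text = embedded.group(0)
print(phone_ok, card_ok, embedded_text)
```

If embedded matches like the last one are not wanted, the expression needs additional context around it, such as anchors or surrounding non-digit characters.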