Using regular expressions

A regular expression is a pattern that is used to match characters in a string. There are many online resources that explain the syntax rules of regular expressions.

Regular expression examples

The following examples show some common regular expressions:
Table 1. Common regular expressions
Regular expression    Results
Account               Finds the characters "Account". By default, searches are case sensitive.
[A-Z]                 Finds one uppercase letter.
[A-Z]{3}              Finds three consecutive uppercase letters.
[0-9]{5}              Finds five consecutive digits.
[0-9]+                Finds one or more consecutive digits.
[^a-z]                Finds one character that is not a lowercase letter from a to z.
\s                    Finds one whitespace character (space, tab, and so on).
\S                    Finds one character that is not whitespace.
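
The patterns in Table 1 can be tried outside the indexer. The following is a minimal sketch that uses Python's re module purely for illustration; the sample strings and variable names are made up for this example, and the indexer's own regular expression engine might differ in details that are not covered here.

```python
import re

# Hypothetical sample text, chosen only to exercise the patterns in Table 1.
sample = "Account ABC 12345 x9"

# Literal text: matching is case sensitive unless a flag says otherwise.
found_exact_case = re.search(r"Account", sample) is not None
found_wrong_case = re.search(r"account", sample) is not None

# Character classes and repetition counts.
three_upper = re.findall(r"[A-Z]{3}", sample)    # runs of exactly three uppercase letters
five_digits = re.findall(r"[0-9]{5}", sample)    # runs of exactly five digits
digit_runs = re.findall(r"[0-9]+", sample)       # runs of one or more digits

# Negated class: the first character that is not a lowercase letter.
first_non_lower = re.search(r"[^a-z]", "abcXdef").group()

# Whitespace and non-whitespace, one character per match.
whitespace_chars = re.findall(r"\s", "a b\tc")
non_whitespace_chars = re.findall(r"\S", "a b")

print(found_exact_case, found_wrong_case)
print(three_upper, five_digits, digit_runs)
print(first_non_lower, whitespace_chars, non_whitespace_chars)
```

Note that each bracket expression or \s and \S matches a single character; the counts in braces and the plus sign control how many consecutive characters are matched.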

The 400 indexer can use a regular expression in the TRIGGER and FIELD parameters. In the TRIGGER parameter, the regular expression specifies the pattern to search for. In the FIELD parameter, the regular expression is applied to the characters that have been extracted from the field, in a way similar to using a mask.

The regular expression must be specified in the code page given by the CPGID parameter. The regular expression can be specified as text, for example:
CPGID=37
TRIGGER1=*,*,'PAGE',(TYPE=GROUP)
TRIGGER2=*,25,REGEX='[A-Z]{3}-[A-Z]{6}',(TYPE=FLOAT)
FIELD1=0,9,2,(TRIGGER=1,BASE=TRIGGER)
FIELD2=0,38,10,(TRIGGER=2,BASE=0,REGEX='[A-Z] [0-9]{3}-\S+')
INDEX1='Page',FIELD1,(TYPE=GROUP,BREAK=YES)
INDEX2='Sub-Source',FIELD2

In this example, TRIGGER2 uses a regular expression that specifies a pattern of three uppercase letters, followed by a hyphen, followed by six uppercase letters. The text "SUB-SOURCE" matches the pattern.

FIELD2 uses a regular expression that specifies one uppercase letter, followed by a space, followed by three digits, followed by a hyphen, followed by one or more non-whitespace characters. The characters "Q 010-1", "I 000-RS", or "L 133-1B" match this regular expression.
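
The two patterns from the example can be checked against the sample values outside the indexer. This is a sketch only: it uses Python's re module rather than the indexer, the helper name matches is hypothetical, and it assumes that a pattern may match anywhere in the text, which might not be how the indexer applies it to a trigger or field.

```python
import re

# Patterns copied from TRIGGER2 and FIELD2 in the example above.
TRIGGER2_REGEX = r"[A-Z]{3}-[A-Z]{6}"
FIELD2_REGEX = r"[A-Z] [0-9]{3}-\S+"


def matches(pattern, text):
    """Return True if the pattern is found anywhere in text (assumed semantics)."""
    return re.search(pattern, text) is not None


# The trigger text from the example, plus a value with too few letters.
trigger_results = {
    "SUB-SOURCE": matches(TRIGGER2_REGEX, "SUB-SOURCE"),
    "SUB-SRC": matches(TRIGGER2_REGEX, "SUB-SRC"),
}

# The field values from the example, plus one that lacks the required space.
field_results = {
    value: matches(FIELD2_REGEX, value)
    for value in ["Q 010-1", "I 000-RS", "L 133-1B", "Q010-1"]
}

print(trigger_results)
print(field_results)
```

The two added values ("SUB-SRC" and "Q010-1") are not from the original example; they are included only to show what a non-matching value looks like.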

The regular expression can also be specified in hexadecimal in the code page given by the CPGID parameter, for example:
CPGID=500
TRIGGER1=*,1,REGEX=X'4AF060F95AC0F3D0'  /* [0-9]{3} */
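
The hexadecimal form depends on the code page, because characters such as the square brackets have different code points in different EBCDIC code pages. The following sketch uses Python's built-in cp500 and cp037 codecs to convert between text and hexadecimal. It assumes that these codecs correspond to CPGID=500 and CPGID=37, and the helper names are hypothetical.

```python
def regex_to_hex(regex_text, codec):
    """Encode a regular expression as uppercase hexadecimal in the given code page."""
    return regex_text.encode(codec).hex().upper()


def hex_to_regex(hex_text, codec):
    """Decode a hexadecimal string in the given code page back to text."""
    return bytes.fromhex(hex_text).decode(codec)


# The value from the TRIGGER1 example, interpreted in code page 500.
decoded = hex_to_regex("4AF060F95AC0F3D0", "cp500")
encoded_500 = regex_to_hex("[0-9]{3}", "cp500")

# The same pattern encoded in code page 37 yields different bytes,
# so a hexadecimal value written for one code page is not valid for the other.
encoded_037 = regex_to_hex("[0-9]{3}", "cp037")

print(decoded)
print(encoded_500)
print(encoded_500 != encoded_037)
```

Decoding a hexadecimal value this way before placing it in the parameter file is one way to confirm that it spells the intended pattern in the code page named by CPGID.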

Performance

All text to which the regular expression is applied is converted to UTF-16. Processing might be slower when you use a regular expression; where a plain text string is sufficient, using it can be faster.