Using regular expressions

A regular expression is a pattern that is used to match sequences of characters in a string.

Many online resources explain the syntax rules of regular expressions. The following examples show some of the most common patterns:

A literal character string, for example "Account", looks for exactly the characters "Account". By default, searches are case sensitive.

[A-Z] Look for one uppercase letter.
[A-Z]{3} Look for three consecutive uppercase letters.
[0-9]{5} Look for five consecutive digits.
[0-9]+ Look for one or more digits.
[^a-z] Look for one character that is not a lowercase letter a to z.
\s (Lowercase s) Look for one whitespace character (space, tab, and so on).
\S (Uppercase S) Look for one character that is not whitespace.
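
The patterns above can be tried out before they are placed in an indexer parameter. The following is a minimal sketch that uses Python's standard re module, not the PDF indexer itself. It assumes that the indexer's regular expression engine treats these basic constructs the same way, which holds for most common engines but is not confirmed here. The helper name first_match and the sample strings are illustrative only.

```python
import re

def first_match(pattern, text):
    """Return the first substring of text that matches pattern, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# (pattern, sample text, expected first match). Sample texts are made up.
EXAMPLES = [
    (r"Account",  "Account 42", "Account"),
    (r"Account",  "account 42", None),      # case sensitive by default
    (r"[A-Z]",    "abCd",       "C"),       # one uppercase letter
    (r"[A-Z]{3}", "abCDEf",     "CDE"),     # three consecutive uppercase letters
    (r"[0-9]{5}", "id 123456",  "12345"),   # exactly five digits
    (r"[0-9]+",   "id 123456",  "123456"),  # one or more digits
    (r"[^a-z]",   "abc-def",    "-"),       # one character that is not a-z
    (r"\s",       "a b",        " "),       # one whitespace character
    (r"\S",       "  x ",       "x"),       # one non-whitespace character
]

for pattern, text, expected in EXAMPLES:
    assert first_match(pattern, text) == expected
```

Each pattern matches a fixed number of characters unless it carries a quantifier such as + or {5}, which is why [^a-z] and \S return a single character rather than the rest of the string.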

The PDF indexer can use a regular expression in the TRIGGER and FIELD parameters. In a TRIGGER, the regular expression specifies the pattern to search for. In a FIELD, the regular expression is applied to the characters that have been extracted from the field, in a way similar to using a mask.

Here is an example:

TRIGGER1=UL(1.00,3.89),LR(2.52,4.17),*,REGEX='PAGE 1'
TRIGGER2=UL(1.02,4.60),LR(2.11,4.95),0,REGEX='[0-9]{5} [a-z]{4}'
FIELD1=UL(1.44,0.00),LR(2.75,0.30),0,(TRIGGER=2,BASE=TRIGGER,
REGEX='[A-Z]+ [A-Z] [A-Z]+')
INDEX1='Name',FIELD1,(TYPE=GROUP)

In this example, TRIGGER1 uses a regular expression specified as an ordinary text string. TRIGGER2 uses a regular expression that specifies a pattern of five digits, followed by a space, followed by four lowercase letters. The text "12345 acct" would match the pattern.

FIELD1 uses a regular expression that specifies one or more uppercase letters, followed by a space, followed by a single uppercase letter, followed by a space, followed by one or more uppercase letters. The character strings "MARY R SMITH", "W A DOE", and "LARRY G W" would each match this regular expression.
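
The TRIGGER2 and FIELD1 patterns can be checked against sample text in the same way. This sketch again uses Python's re module as a stand-in for the indexer. The helper names matches and extract are hypothetical, and the extract step is only an assumption about what "similar to using a mask" means (keeping the part of the extracted text that fits the pattern). The indexer's actual behavior may differ.

```python
import re

# Patterns copied from the TRIGGER2 and FIELD1 parameters in the example.
TRIGGER2_PATTERN = r"[0-9]{5} [a-z]{4}"
FIELD1_PATTERN = r"[A-Z]+ [A-Z] [A-Z]+"

def matches(pattern, text):
    """True if pattern is found anywhere in text."""
    return re.search(pattern, text) is not None

def extract(pattern, text):
    """Return the part of text that fits pattern, or None (assumed mask-like use)."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# TRIGGER2: five digits, a space, four lowercase letters.
assert matches(TRIGGER2_PATTERN, "12345 acct")
assert not matches(TRIGGER2_PATTERN, "12345 ACCT")   # uppercase letters do not match
assert not matches(TRIGGER2_PATTERN, "1234 acct")    # only four digits

# FIELD1: the three strings named in the text all match.
for name in ("MARY R SMITH", "W A DOE", "LARRY G W"):
    assert matches(FIELD1_PATTERN, name)
assert not matches(FIELD1_PATTERN, "MARY SMITH")     # no middle initial

# Hypothetical field text with extra characters around the name.
assert extract(FIELD1_PATTERN, "Name: MARY R SMITH  ") == "MARY R SMITH"
```

Because re.search looks for the pattern anywhere in the text, a longer string that merely contains a match also passes. Whether the indexer requires the whole field to match is not stated in the text above.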