Using regular expressions

A regular expression is a pattern that is used to match characters in a string. There are many online resources that explain the syntax rules of regular expressions.

Regular expression examples

Restriction: Regular expressions are not available on z/OS® systems.
The following examples show some common regular expressions:
Table 1. Common regular expressions
Regular expression   Result
Account              Finds the characters "Account". By default, searches are case-sensitive.
[A-Z]                Finds one uppercase letter.
[A-Z]{3}             Finds three consecutive uppercase letters.
[0-9]{5}             Finds five consecutive digits.
[0-9]+               Finds one or more consecutive digits.
[^a-z]               Finds one character that is not a lowercase letter a to z.
\s                   Finds one whitespace character (space, tab, and so on).
\S                   Finds one character that is not whitespace.
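To experiment with the patterns in Table 1 outside ACIF, you can run them through a general-purpose regular expression engine. The following sketch uses Python's `re` module as a stand-in. ACIF's own engine might differ in advanced syntax, although these basic constructs are widely shared. The sample strings are invented for illustration, and `first_match` is a hypothetical helper.

```python
import re
from typing import Optional

def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the first substring of `text` matched by `pattern`, or None."""
    match = re.search(pattern, text)
    return match.group(0) if match else None

# Patterns from Table 1 paired with invented sample text.
samples = [
    ("Account", "Account 12345"),
    ("Account", "account 12345"),  # no match: case-sensitive by default
    ("[A-Z]", "abc Def"),
    ("[A-Z]{3}", "abc DEFG"),
    ("[0-9]{5}", "ZIP 123456"),
    ("[0-9]+", "Page 42 of 100"),
    ("[^a-z]", "abcX1"),
    (r"\s", "a\tb"),
    (r"\S", "   x"),
]

for pattern, text in samples:
    print(f"{pattern:10} {text!r:18} -> {first_match(pattern, text)!r}")
```

Note that `[0-9]{5}` finds the first five digits of "123456". The pattern does not require the digits to stand alone.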

ACIF can use a regular expression in the TRIGGER and FIELD parameters. In a TRIGGER, the regular expression specifies the pattern to search for. In a FIELD, the regular expression is applied to the characters that have been extracted from the field, in a way similar to using a mask.
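The difference between the two uses can be modeled roughly as follows. This is a hypothetical sketch in Python, not ACIF's implementation: `trigger_matches` and `field_value` are illustrative names, columns are treated as 1-based, and the rule that a FIELD yields the matched text is an assumption. What it shows is that a TRIGGER pattern is searched for in the record, whereas a FIELD pattern sees only the characters extracted from the field columns.

```python
import re
from typing import Optional

def trigger_matches(record: str, pattern: str, column: int = 1) -> bool:
    """Hypothetical TRIGGER model: search the record for the pattern,
    starting at a 1-based column (an assumption for illustration)."""
    return re.search(pattern, record[column - 1:]) is not None

def field_value(record: str, column: int, length: int, pattern: str) -> Optional[str]:
    """Hypothetical FIELD model: extract `length` characters starting at a
    1-based column, then apply the pattern to only those characters.
    Returning the matched text is an assumption."""
    extracted = record[column - 1:column - 1 + length]
    match = re.search(pattern, extracted)
    return match.group(0) if match else None

# Invented record: "SUB-SOURCE" starts in column 25, "Q 010-1" in column 38.
record = " " * 24 + "SUB-SOURCE" + " " * 3 + "Q 010-1   "

print(trigger_matches(record, r"[A-Z]{3}-[A-Z]{6}", column=25))  # True
print(field_value(record, 38, 10, r"[A-Z] [0-9]{3}-\S+"))  # Q 010-1
print(field_value(record, 38, 10, "SUB"))  # None: "SUB" is outside the field
```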

The regular expression must be specified in the code page given by the CPGID parameter. If you are running on an ASCII platform and the CPGID of the document is ASCII, you can specify the regular expression as text, for example:
CPGID=819
TRIGGER1=*,*,'PAGE',(TYPE=GROUP)
TRIGGER2=*,25,REGEX='[A-Z]{3}-[A-Z]{6}',(TYPE=FLOAT)
FIELD1=0,9,2,(TRIGGER=1,BASE=TRIGGER)
FIELD2=0,38,10,(TRIGGER=2,BASE=0,REGEX='[A-Z] [0-9]{3}-\S+')
INDEX1='Page',FIELD1,(TYPE=GROUP,BREAK=YES)
INDEX2='Sub-Source',FIELD2

In this example, TRIGGER2 uses a regular expression that specifies a pattern of three uppercase letters, followed by a hyphen, followed by six uppercase letters. The text "SUB-SOURCE" matches this pattern.

FIELD2 uses a regular expression that specifies one uppercase letter, followed by a space, followed by three digits, followed by a hyphen, followed by one or more non-whitespace characters. The characters "Q 010-1", "I 000-RS", and "L 133-1B" each match this regular expression.
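One way to check sample strings against both patterns before running ACIF is to test them with another regular expression engine. This sketch uses Python's `re.fullmatch`, which requires the whole string to match; that is a stricter check than a search, and it assumes these simple patterns behave the same way in Python as in ACIF. The constant names are illustrative, and the non-matching strings are invented counterexamples.

```python
import re

TRIGGER2_PATTERN = r"[A-Z]{3}-[A-Z]{6}"
FIELD2_PATTERN = r"[A-Z] [0-9]{3}-\S+"

def matches_whole(pattern: str, text: str) -> bool:
    """True if the entire string matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# Strings from the description above, which should match.
print("SUB-SOURCE", matches_whole(TRIGGER2_PATTERN, "SUB-SOURCE"))
for text in ["Q 010-1", "I 000-RS", "L 133-1B"]:
    print(text, matches_whole(FIELD2_PATTERN, text))

# Invented counterexamples, which should not match.
print("Sub-Source", matches_whole(TRIGGER2_PATTERN, "Sub-Source"))
for text in ["q 010-1", "Q 01-1", "Q 010-", "Q 010- X"]:
    print(text, matches_whole(FIELD2_PATTERN, text))
```

The counterexamples fail for different reasons: a lowercase letter, only two digits, nothing after the hyphen, and a space where a non-whitespace character is required.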

If you are running on an ASCII platform and the CPGID parameter of the document is not ASCII, you must specify the regular expression in hexadecimal, encoded in the code page given by the CPGID parameter, for example:
CPGID=500
TRIGGER1=*,1,REGEX=X'4AF060F95AC0F3D0'  /* [0-9]{3} */
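Rather than looking up each byte by hand, you can generate the hexadecimal string with a script. This is a minimal sketch in Python. It assumes that Python's built-in `cp500` codec matches the code page identified by CPGID=500 (both are EBCDIC code page 500). Other CPGID values need the matching codec, which Python might not provide. The function name `regex_to_hex` is illustrative.

```python
def regex_to_hex(pattern: str, codec: str = "cp500") -> str:
    """Encode a regular expression in the given code page and return it as
    uppercase hexadecimal digits for use in REGEX=X'...'."""
    return pattern.encode(codec).hex().upper()

hex_value = regex_to_hex("[0-9]{3}")
print(f"REGEX=X'{hex_value}'")  # REGEX=X'4AF060F95AC0F3D0'

# Decoding the bytes again is a quick way to check a value you typed.
print(bytes.fromhex("4AF060F95AC0F3D0").decode("cp500"))  # [0-9]{3}
```

If the pattern contains a character that the code page cannot represent, `encode` raises `UnicodeEncodeError`.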

Performance

All text to which the regular expression is applied is converted to UTF-16.

  • Processing with a regular expression might be slower than processing with a plain text string. If a text string is enough to identify the data, using it can be faster.
  • If the CPGID value is incorrect, the conversion might fail with error message APK2080.

If the regular expression is invalid, ACIF will fail with error message APK484.
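To catch syntax errors before a run fails with APK484, you might check patterns with another engine first. This sketch uses Python's `re.compile`. Because Python's syntax is not guaranteed to match ACIF's, passing this check does not prove that ACIF will accept the pattern, but it catches obvious mistakes such as an unclosed bracket or parenthesis. `check_pattern` is a hypothetical helper.

```python
import re

def check_pattern(pattern: str) -> bool:
    """Return True if Python's re module can compile the pattern."""
    try:
        re.compile(pattern)
    except re.error as exc:
        print(f"Invalid pattern {pattern!r}: {exc}")
        return False
    return True

for pattern in [r"[A-Z]{3}-[A-Z]{6}", r"[A-Z] [0-9]{3}-\S+", "[A-Z", "(abc"]:
    print(pattern, check_pattern(pattern))
```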