Using regular expressions
A regular expression is a pattern that is used to match characters in a string. There are many online resources that explain the syntax rules of regular expressions.
Regular expression examples
Regular expression | Results |
---|---|
Account | Finds the characters "Account". By default, searches are case sensitive. |
[A-Z] | Finds one uppercase letter. |
[A-Z]{3} | Finds three consecutive uppercase letters. |
[0-9]{5} | Finds five consecutive digits. |
[0-9]+ | Finds one or more digits. |
[^a-z] | Finds one character that is not a lowercase letter a to z. |
\s | Finds one whitespace character (space, tab, and so on). |
\S | Finds one character that is not whitespace. |
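The patterns in the table can be tried out before they are placed in an indexer parameter. The following is a minimal sketch that uses Python's `re` module as a stand-in; the indexer's own regular expression engine might differ in details, and the helper name `first_match` and the sample text are assumptions made for illustration.

```python
import re

def first_match(pattern, text):
    """Return the first substring of text that matches pattern, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# Illustrative sample text, not taken from a real report.
text = "Account 00123 ABC x"

print(first_match(r"Account", text))     # literal characters, case sensitive
print(first_match(r"account", text))     # None, because the case differs
print(first_match(r"[A-Z]{3}", text))    # three consecutive uppercase letters
print(first_match(r"[0-9]{5}", text))    # five consecutive digits
print(first_match(r"[0-9]+", text))      # one or more digits
print(first_match(r"[^a-z]", text))      # one character that is not a to z
print(repr(first_match(r"\s", text)))    # one whitespace character
print(first_match(r"\S", text))          # one non-whitespace character
```

Each class such as `[A-Z]` or `\s` matches a single character; the quantifiers `{3}`, `{5}`, and `+` are what extend the match to several characters.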
The 400 indexer can use a regular expression in the TRIGGER and FIELD parameters. In the TRIGGER parameter, the regular expression specifies the pattern to search for. In the FIELD parameter, the regular expression is applied to the characters that have been extracted from the field, in a way similar to using a mask.
CPGID=37
TRIGGER1=*,*,'PAGE',(TYPE=GROUP)
TRIGGER2=*,25,REGEX='[A-Z]{3}-[A-Z]{6}',(TYPE=FLOAT)
FIELD1=0,9,2,(TRIGGER=1,BASE=TRIGGER)
FIELD2=0,38,10,(TRIGGER=2,BASE=0,REGEX='[A-Z] [0-9]{3}-\S+')
INDEX1='Page',FIELD1,(TYPE=GROUP,BREAK=YES)
INDEX2='Sub-Source',FIELD2
In this example, TRIGGER2 uses a regular expression that specifies a pattern of three uppercase letters, followed by a hyphen, followed by six uppercase letters. The text "SUB-SOURCE" would match the pattern.
FIELD2 uses a regular expression that specifies one uppercase letter, followed by a space, followed by three digits, followed by a hyphen, followed by one or more non-whitespace characters. The characters "Q 010-1", "I 000-RS", or "L 133-1B" would match this regular expression.
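The two patterns from TRIGGER2 and FIELD2 can be checked against the sample values given above. This is a sketch only: it uses Python's `re` module rather than the indexer, the helper name `matches` is hypothetical, and it assumes the pattern may match anywhere in the text (whether the indexer anchors the match is not stated here).

```python
import re

# Patterns copied from the TRIGGER2 and FIELD2 parameters in the example.
TRIGGER2_REGEX = r"[A-Z]{3}-[A-Z]{6}"
FIELD2_REGEX = r"[A-Z] [0-9]{3}-\S+"

def matches(pattern, text):
    """Return True if pattern is found anywhere in text (assumed semantics)."""
    return re.search(pattern, text) is not None

# Trigger text: three uppercase letters, a hyphen, six uppercase letters.
for candidate in ["SUB-SOURCE", "sub-source", "SUB-SRC"]:
    print(f"{candidate!r}: {matches(TRIGGER2_REGEX, candidate)}")

# Field values: the three documented matches, then two that should fail.
for candidate in ["Q 010-1", "I 000-RS", "L 133-1B", "Q010-1", "Q 01-1"]:
    print(f"{candidate!r}: {matches(FIELD2_REGEX, candidate)}")
```

The lowercase and shortened trigger strings fail because the pattern is case sensitive and requires exactly six uppercase letters after the hyphen; the last two field values fail because one lacks the space and the other has only two digits.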
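In the following example, the regular expression is specified as a hexadecimal string. The bytes are the code page 500 encoding (the CPGID value) of the pattern shown in the comment.
CPGID=500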
TRIGGER1=*,1,REGEX=X'4AF060F95AC0F3D0' /* [0-9]{3} */
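The hexadecimal string can be checked by decoding it with the code page named in CPGID. The sketch below assumes that Python's `cp500` codec corresponds to code page 500; the helper names `decode_pattern` and `encode_pattern` are hypothetical and are not part of the indexer.

```python
HEX_PATTERN = "4AF060F95AC0F3D0"  # value of REGEX=X'...' in the example

def decode_pattern(hex_string, codec="cp500"):
    """Turn a hexadecimal EBCDIC string into readable pattern text."""
    return bytes.fromhex(hex_string).decode(codec)

def encode_pattern(pattern, codec="cp500"):
    """Turn pattern text into an uppercase hexadecimal EBCDIC string."""
    return pattern.encode(codec).hex().upper()

print(decode_pattern(HEX_PATTERN))            # [0-9]{3}
print(encode_pattern("[0-9]{3}"))             # 4AF060F95AC0F3D0
# The same pattern encodes differently in code page 37, because the
# square brackets sit at different code points in that code page.
print(encode_pattern("[0-9]{3}", "cp037"))
```

Characters such as the square brackets occupy different code points in different EBCDIC code pages, which is one reason a hexadecimal form tied to a specific CPGID can be useful.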
Performance
- All text to which the regular expression is applied is converted to UTF-16.
- Performance might not be as fast when you use a regular expression. Using a text string can be faster.