REGEX

REGEX returns a FIXED BIN(31) value that indicates whether a specified regular expression (pattern) was successfully matched against a string.

>>-REGEX(i,j,p,x-+-----------+-)-------------------------------><
                 '-,n-+----+-'     
                      '-,c-'       

i
A reference. i must be ASSIGNABLE and must be REAL FIXED BIN with scale factor 0. If a match for the pattern is found, i is assigned the index in x at which the first match for the regular expression p begins.
j
A reference. j must be ASSIGNABLE and must be REAL FIXED BIN with scale factor 0. If a match for the pattern is found, j is assigned the length of the substring of x that is the first match for the regular expression p.
p
A string holding a regular expression. The pattern p must have CHARACTER type.

The pattern p must conform to the POSIX standard for Extended Regular Expressions (EREs) (and not to the POSIX standard for Basic Regular Expressions). Wikipedia and other web sites contain good descriptions of regular expressions.

x
A string. x is to be searched for a match with the regular expression p. The string x must have CHARACTER type.
n
An expression. n specifies the location within x at which to begin searching. n must have a computational type and is converted to FIXED BINARY(31,0). If omitted, it defaults to 1.
c
A restricted expression. c specifies the code page of p and x. If omitted, it defaults to the value in the CODEPAGE compiler option. If not omitted, a value for n must be specified.

The code page must have a computational type and is converted to FIXED BINARY (31,0). The code page must specify a valid, supported code page.

The characters [, ], {, }, |, ^, and $ occur often in regular expressions and have varying code points in different encoded character sets. The (implicit or explicit) code page value must correctly match the code page of p and x. If not, the pattern might be deemed to be invalid or a match might not be found.

The processing of the REGEX built-in function proceeds in these steps:
  1. If n is less than 1 or if n is greater than 1 + length(x), the STRINGRANGE condition will be raised if enabled, and REGEX will return the value 1.
  2. If there is no locale matching the code page c, then REGEX will return the value -1.
  3. If the string p does not specify a valid regular expression, then REGEX will return a value greater than 1.
  4. If there is no match in the string x for the regular expression p, then REGEX will return the value 1 and set the index i and the length j to 0. Otherwise, REGEX will return the value 0 and set the index i and the length j corresponding to the substring in x that is the first match for the regular expression p.
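
The return values described in the steps above can be told apart with a SELECT group. The following is a minimal sketch rather than a definitive implementation: the procedure name rcdemo, the variable names, and the message texts are illustrative assumptions, and only the return values documented above are distinguished.

```pli
 rcdemo: proc options(main);
   dcl (i, j, rc) fixed bin(31);
   dcl p char(9)  init("All(a|e)n");
   dcl x char(27) init("12Allan3Allen4Alan5Allan678");

   rc = regex( i, j, p, x );      /* n omitted, so the search starts at 1 */

   select;
     when ( rc = 0 )              /* match found: i and j describe it     */
       put skip list( "match: " || substr( x, i, j ) );
     when ( rc = 1 )              /* no match: i and j have been set to 0 */
       put skip list( "no match" );
     when ( rc = -1 )             /* no locale for the code page          */
       put skip list( "no locale matching the code page" );
     otherwise                    /* rc > 1: p is not a valid expression  */
       put skip list( "invalid regular expression, rc =", rc );
   end;
 end rcdemo;
```

Testing rc before using i and j matters because both are set to 0 when no match is found, and substr( x, 0, 0 ) would then refer to a position outside x.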

The search for a match to the regular expression is case sensitive.
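
Because matching is case sensitive, a pattern that must accept either case has to say so explicitly, for example with an ERE bracket expression. The sketch below is illustrative only: the procedure name casedemo and the sample strings are assumptions, and the results noted in the comments follow from the rules stated above rather than from a recorded run.

```pli
 casedemo: proc options(main);
   dcl (i, j, rc) fixed bin(31);
   dcl x char(8) init("12Allan3");

   /* lowercase "a" does not match uppercase "A": expect rc = 1 */
   rc = regex( i, j, "allan", x );
   put skip list( rc, i, j );

   /* [Aa] accepts either case: expect rc = 0, i = 3, j = 5     */
   rc = regex( i, j, "[Aa]llan", x );
   put skip list( rc, i, j );
 end casedemo;
```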

Examples

If p = "All(a|e)n" and x = "12Allan3Allen4Alan5Allan678", then
  • regex( i, j, p, x ) will return 0 and set i to 3 and j to 5 (because it has found the match for the first "Allan").
  • regex( i, j, p, x, 4 ) will return 0 and set i to 9 and j to 5 (because it has found the match for "Allen").
  • regex( i, j, p, x, 10 ) will return 0 and set i to 20 and j to 5 (because it has found the match for the second "Allan").
  • regex( i, j, p, x, 21 ) will return 1 (because there are no more matches).
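The preceding set of matches could also have been found via the following loop, which uses the optional fifth parameter to walk through the string x:
n = 1;
do loop;
  rc = regex( i, j, p, x, n );          /* search x starting at position n    */
  if rc <> 0 then leave;                /* 1 = no more matches; else an error */
  put skip list( substr( x, i, j ) );   /* print the matched substring        */
  n = i + j;                            /* resume just past the current match */
end;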
If p = "[hc]+at" and x = "the cat in the hat", then regex( i, j, p, x, n ) will find the match for "cat" or "hat" depending on the value of n. However, suppose p = "63"x || "hc" || "fc"x || "+at". Under code page 1141 this pattern would display as "[hc]+at", but the result depends on the code page in effect:
  • Under the default code page 1140, regex( i, j, p, x, n ) would find no match, because under code page 1140 the hex values for [ and ] are "ba"x and "bb"x respectively.
  • However, regex( i, j, p, x, n, 1141 ) would find the match for "cat" or "hat" depending on the value of n.





Published: 23 December 2018