REGEX returns a FIXED BIN(31) that indicates the success
of matching a specified regular expression or pattern against a string.
>>-REGEX(i,j,p,x-+-----------+-)-------------------------------><
                 '-,n-+----+-'
                      '-,c-'
- i
- A reference. i must be ASSIGNABLE and must be REAL FIXED BIN
with scale factor 0. If a match for the pattern is found, i is
assigned the index in x of the first substring that matches the
regular expression p.
- j
- A reference. j must be ASSIGNABLE and must be REAL FIXED BIN
with scale factor 0. If a match for the pattern is found, j is
assigned the length of the first substring in x that matches the
regular expression p.
- p
- A string holding a regular expression. The pattern p must
have CHARACTER type.
The pattern p must conform to the POSIX
standard for Extended Regular Expressions (EREs) (and not to the POSIX
standard for Basic Regular Expressions). Wikipedia and other web sites
contain good descriptions of regular expressions.
- x
- A string. x is to be searched for a match with the regular
expression p. The string x must have CHARACTER type.
- n
- An expression. n specifies the location within x at
which to begin searching. n must have a computational type
and is converted to FIXED BINARY(31,0). If omitted, it defaults to
1.
- c
- A restricted expression. c specifies the code page of p and x.
If omitted, it defaults to the value in the CODEPAGE compiler option.
If c is specified, a value for n must also be specified.
c must have a computational type and is converted to FIXED BINARY(31,0).
c must specify a valid, supported code page.
The characters [, ], {, }, |, ^, and $ occur often in regular expressions
and have varying code points in different encoded character sets.
The (implicit or explicit) code page value must correctly match the
code page of p and x. If not, the pattern might be deemed
to be invalid or a match might not be found.
The processing of the REGEX built-in function proceeds in these
steps:
- If n is less than 1 or if n is greater than 1 +
length(x), the STRINGRANGE condition will be raised if enabled, and
REGEX will return the value 1.
- If there is no locale matching the code page c, then REGEX
will return the value -1.
- If the string p does not specify a valid regular expression,
then REGEX will return a value greater than 1.
- If there is no match in the string x for the regular expression p,
then REGEX will return the value 1 and set the index i and
the length j to 0. Otherwise, REGEX will return the value 0
and set the index i and the length j corresponding
to the substring in x that is the first match for the regular
expression p.
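The return values described in the steps above can be tested as in
the following minimal sketch. The procedure name rcdemo and the
message texts are illustrative assumptions; only the return values
themselves come from the description above. The strings are declared
VARYING so that no trailing blanks become part of the pattern.

```pli
 rcdemo: procedure options(main);

   dcl (i, j, rc) fixed bin(31);
   dcl p char(20) varying init("All(a|e)n");
   dcl x char(40) varying init("12Allan3Allen4Alan5Allan678");

   rc = regex( i, j, p, x );

   select;
     when ( rc = 0 )         /* match found; i and j describe it   */
       put skip list( "match:", substr( x, i, j ) );
     when ( rc = 1 )         /* no match; i and j have been set to 0 */
       put skip list( "no match" );
     when ( rc = -1 )        /* no locale matching the code page   */
       put skip list( "unsupported code page" );
     otherwise               /* includes rc > 1: invalid pattern   */
       put skip list( "REGEX failed, rc =", rc );
   end;

 end rcdemo;
```

Testing rc before using i and j matters because both are set to 0 when
no match is found, and SUBSTR with those values would not return a
useful substring.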
The search for a match to the regular expression is case sensitive.
Examples
If p = "All(a|e)n" and x = "12Allan3Allen4Alan5Allan678", then
- regex( i, j, p, x ) will return 0 and set i to
3 and j to 5 (because it has found the match for the first
"Allan").
- regex( i, j, p, x, 4 ) will return 0 and set i to
9 and j to 5 (because it has found the match for "Allen").
- regex( i, j, p, x, 10 ) will return 0 and set i to
20 and j to 5 (because it has found the match for the second
"Allan").
- regex( i, j, p, x , 21 ) will return 1 (because
there are no more matches).
The preceding set of matches could also have been found
via the following loop, which uses the optional fifth parameter to
walk through the string x:
  n = 1;
  do loop;
    rc = regex( i, j, p, x, n );
    if rc <> 0 then leave;
    put skip list( substr( x, i, j ) );
    n = i + j;
  end;
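The same walk can be wrapped in a function that counts the matches.
The following is a sketch only: the name count_matches is
hypothetical, and the guard against zero-length matches assumes that
REGEX could report a match with j = 0 (for example, for a pattern such
as "a*"), which the description above does not state either way.

```pli
 count_matches: procedure( p, x ) returns( fixed bin(31) );

   dcl p char(*);                 /* POSIX ERE pattern            */
   dcl x char(*);                 /* string to be searched        */
   dcl (i, j, n, rc, count) fixed bin(31);

   count = 0;
   n = 1;
   do loop;
     rc = regex( i, j, p, x, n );
     if rc <> 0 then leave;       /* no more matches, or an error */
     count = count + 1;
     /* Resume after the match. Advance by at least one character */
     /* so that an assumed zero-length match cannot stall the loop. */
     n = i + max( j, 1 );
     /* Keep n within the range 1 through 1 + length(x).          */
     if n > length( x ) + 1 then leave;
   end;
   return( count );

 end count_matches;
```

With the p and x of the preceding examples, the loop body runs for the
matches at positions 3, 9, and 20. A nonzero return code from REGEX
ends the loop whether it means "no match" or an error, so a caller that
needs to distinguish an invalid pattern should test rc separately.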
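If p = "[hc]+at" and x = "the cat in the hat", then
regex( i, j, p, x, n ) will find the match for "cat" or "hat"
depending on the value of n: for n from 1 to 5 it finds "cat"
(i = 5, j = 3), and for n from 6 to 16 it finds "hat"
(i = 16, j = 3).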
But if p = "63"x || "hc" || "fc"x || "+at", then the pattern would
display as "[hc]+at" under code page 1141, and the result depends on
the code page in effect:
- Under the default code page 1140, regex( i, j, p, x, n )
would find no match, because under code page 1140 the hex
values for [ and ] are "ba"x and "bb"x respectively.
- However, regex( i, j, p, x, n, 1141) would find
the match for "cat" or "hat" depending on the value of n.