REGEX

REGEX returns a FIXED BIN(31) that indicates the success of matching a specified regular expression or pattern against a string.

REGEX( i, j, p, x, n, c )
i
A reference. i must be ASSIGNABLE. If a match for the pattern is found, i is assigned the index in x of the first substring that matches the regular expression p. i must be REAL FIXED BIN with scale factor 0. i must be either a scalar or a one-dimensional array of scalars.
j
A reference. j must be ASSIGNABLE. If a match for the pattern is found, j is assigned the length of the first substring in x that matches the regular expression p. j must be REAL FIXED BIN with scale factor 0. j must be either a scalar or a one-dimensional array of scalars.
p
A string holding a regular expression. The pattern p must have CHARACTER type.

The pattern p must conform to the POSIX standard for Extended Regular Expressions (EREs) (and not to the POSIX standard for Basic Regular Expressions). Wikipedia and other web sites contain good descriptions of regular expressions.

x
A string. x is to be searched for a match with the regular expression p. The string x must have CHARACTER type.
n
An expression. n specifies the location within x at which to begin searching. n must have a computational type and is converted to FIXED BINARY(31,0). If omitted, it defaults to 1.
c
A restricted expression. c specifies the code page of p and x. If omitted, it defaults to the value in the CODEPAGE compiler option. If not omitted, a value for n must be specified.

The code page must have a computational type and is converted to FIXED BINARY (31,0). The code page must specify a valid, supported code page.

If either i or j is an array, then
  • both must be arrays with matching bounds and NATIVE type size_t
  • the first element of each array will be assigned the index and length of the matching expression (if any).
  • the second and subsequent elements of each array will be assigned the index and length of the corresponding matching subexpression (if any).

The characters [, ], {, }, |, ^, and $ occur often in regular expressions and have varying code points in different encoded character sets. The (implicit or explicit) code page value must correctly match the code page of p and x. If not, the pattern might be deemed to be invalid or a match might not be found.

The processing of the REGEX built-in function proceeds in these steps:
  1. If n is less than 1 or if n is greater than 1 + length(x), the STRINGRANGE condition will be raised if enabled, and REGEX will return the value 1.
  2. If there is no locale matching the code page c, then REGEX will return the value -1.
  3. If the string p does not specify a valid regular expression, then REGEX will return a value greater than 1.
  4. If there is no match in the string x for the regular expression p, then REGEX will return the value 1 and set the index i and the length j to 0. Otherwise, REGEX will return the value 0 and set the index i and the length j corresponding to the substring in x that is the first match for the regular expression p.
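
The steps above suggest how a caller might interpret the return value. The following is a minimal sketch rather than part of the original reference: the procedure name check_rc and the variable declarations are hypothetical, and the pattern and string are borrowed from Example 1. Because the value 1 is returned both when n is out of range and when no match is found, the sketch tests n itself to tell the two cases apart.

  check_rc: proc options(main);
    dcl (i, j, rc) fixed bin(31);
    dcl n          fixed bin(31) init(1);
    dcl p          char(10) var init('All(a|e)n');
    dcl x          char(40) var init('12Allan3Allen4Alan5Allan678');

    rc = regex( i, j, p, x, n );

    select;
      when( rc = 0 )            /* step 4: a match was found        */
        put skip list( 'match:', substr( x, i, j ) );
      when( rc = -1 )           /* step 2: no locale for code page  */
        put skip list( 'unsupported code page' );
      when( rc = 1 )            /* step 1 or step 4                 */
        do;
          if n < 1 | n > 1 + length(x)
            then put skip list( 'n is out of range' );
            else put skip list( 'no match' );
        end;
      otherwise                 /* step 3: rc > 1, invalid pattern  */
        put skip list( 'invalid pattern, rc =', rc );
    end;
  end check_rc;

If STRINGRANGE is enabled, an out-of-range n raises that condition first, so the range test in the sketch matters mainly when the condition is disabled or handled.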

The search for a match to the regular expression is case sensitive.

Examples

Example 1

If p = "All(a|e)n" and x = "12Allan3Allen4Alan5Allan678", then
  • regex( i, j, p, x ) will return 0 and set i to 3 and j to 5 (because it has found the match for the first "Allan").
  • regex( i, j, p, x, 4 ) will return 0 and set i to 9 and j to 5 (because it has found the match for "Allen").
  • regex( i, j, p, x, 10 ) will return 0 and set i to 20 and j to 5 (because it has found the match for the second "Allan").
  • regex( i, j, p, x , 21 ) will return 1 (because there are no more matches).
The preceding set of matches could also have been found via the following loop, which uses the optional fifth parameter, n, to walk through the string x:
n = 1;
do loop;
  rc = regex( i, j, p, x, n );
  if rc <> 0 then leave;
  put skip list( substr( x, i, j ) );
  n = i + j;
end;

Example 2

If p = "[hc]+at" and x = "the cat in the hat", then regex( i, j, p, x, n ) will find the match for "cat" or "hat" depending on the value of n. However, suppose that p = "63"x || "hc" || "fc"x || "+at". Under code page 1141, this pattern would display as "[hc]+at", and:
  • Under the default code page 1140, regex( i, j, p, x, n ) would find no match, because under code page 1140 the hex values for [ and ] are "ba"x and "bb"x respectively.
  • However, regex( i, j, p, x, n, 1141) would find the match for "cat" or "hat" depending on the value of n.

Example 3

Given the following:


  pattern = '([a-zA-Z]+) * ([a-zA-Z]+) * ((([a-zA-Z1-9]+)\.){0,1}([a-zA-Z1-9]+))';
  string = ' CREATE DATABASE TESTDB;';
  rc = regex( a_index, a_length, pattern, string );

Then


  a_index(2) and a_length(2) will give the index and length for CREATE
  a_index(3) and a_length(3) will give the index and length for DATABASE
  a_index(4) and a_length(4) will give the index and length for TESTDB
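
Example 3 omits its declarations. The following is one way they might look, offered as a sketch under stated assumptions rather than as the documented form: it assumes a 32-bit (LP(32)) compilation in which size_t corresponds to FIXED BIN(31), it sizes the arrays to hold the whole match plus the six parenthesized subexpressions in the pattern, and the procedure name example3 and the loop variable k are hypothetical.

  example3: proc options(main);
    /* Assumption: NATIVE FIXED BIN(31) stands in for size_t here. */
    dcl a_index(7)  fixed bin(31) native;
    dcl a_length(7) fixed bin(31) native;
    dcl (rc, k)     fixed bin(31);
    dcl pattern     char(80) var;
    dcl string      char(40) var;

    /* Same pattern as above, split only to keep the line short. */
    pattern = '([a-zA-Z]+) * ([a-zA-Z]+) * '
           || '((([a-zA-Z1-9]+)\.){0,1}([a-zA-Z1-9]+))';
    string  = ' CREATE DATABASE TESTDB;';

    rc = regex( a_index, a_length, pattern, string );

    /* Elements 2 through 4 hold the first three subexpressions. */
    if rc = 0 then
      do k = 2 to 4;
        put skip list( k, substr( string, a_index(k), a_length(k) ) );
      end;
  end example3;

Element 1 of each array describes the entire match, which is why the loop starts at element 2.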