REGEXP_INSTR scalar function

The REGEXP_INSTR scalar function returns the starting or ending position of the matched substring, depending on the value of the return_option argument.

Read syntax diagramSkip visual syntax diagramREGEXP_INSTR(source-string ,pattern-expression ,start,occurrence,return_option,flags,group ,CODEUNITS32,CODEUNITS16OCTETS )

The schema is SYSIBM.

source-string
An expression that specifies the string in which the search is to take place. This expression must return a built-in character string, graphic string, numeric value, or datetime value. A numeric or datetime value is implicitly cast to VARCHAR before the function is evaluated. A character string cannot specify the FOR BIT DATA attribute (SQLSTATE 42815).
pattern-expression
An expression that specifies the regular expression string that is the pattern for the search. This expression must return a built-in character string, graphic string, numeric value, or datetime value. A numeric or datetime value is implicitly cast to VARCHAR before the function is evaluated. The length of a CLOB or DBCLOB expression must not be greater than the maximum length of a VARCHAR or VARGRAPHIC data type. A character string cannot specify the FOR BIT DATA attribute (SQLSTATE 42815).
start
An expression that specifies the position within source-string at which the search is to start. The expression must return a built-in character string, graphic string, or numeric value. If the value is not of type INTEGER, it is implicitly cast to INTEGER before the function is evaluated. The value of the integer must be greater than or equal to 1. If OCTETS is specified and the source string is graphic data, the value of the integer must be odd (SQLSTATE 428GC). The default start value is 1. See parameter description for CODEUNITS16, CODEUNITS32, or OCTETS for the string unit that applies to the start position.
occurrence
An expression that specifies which occurrence of the pattern expression within the source string to search for. The expression must return a built-in character string, graphic string, or numeric value. If the value is not of type INTEGER, it is implicitly cast to INTEGER before the function is evaluated. This value must be greater than or equal 1. The default occurrence value is 1, which indicates that only the first occurrence of the pattern expression is considered.
return-option
An expression that specifies what is returned relative to the occurrence. The expression must return a built-in character string, graphic string, or numeric value. If the value is not of type INTEGER, it is implicitly cast to INTEGER before the function is evaluated. This value must be 0 or 1 (SQLSTATE 22546):
  • A value of 0 returns the position of the first string unit of the occurrence.
  • A value of 1 returns the position of the string unit that follows the occurrence.
The default return option value is 0.
flags
An expression that specifies flags that controls aspects of the pattern matching. The expression must return a built-in character string that does not specify the FOR BIT DATA attribute (SQLSTATE 42815). The string can include one or more valid flag values and the combination of flag values must be valid (SQLSTATE 2201T). An empty string is the same as the value 'c'. The default flag value is 'c'.
Table 1. Supported flag values
Flag value Description
c Specifies that matching is case-sensitive. This flag is the default value if 'c' or 'i' is not specified. This value must not be specified with a value of 'i'.
i Specifies that matching is case insensitive. This value must not be specified with a value of 'c'.
m Specifies that the input data can contain more than one line. By default, the '^' in a pattern matches only the start of the input string; the '$' in a pattern matches only the end of the input string. If this flag is set, "^" and "$" also matches at the start and end of each line within the input string.
n Specifies that the '.' character in a pattern matches a line terminator in the input string. By default, the '.' character in a pattern does not match a line terminator. A carriage-return and line-feed pair in the input string behaves as a single-line terminator, and matches a single "." in a pattern.
s Specifies that the '.' character in a pattern matches a line terminator in the input string. This value is a synonym for the 'n' value.
x Specifies that white space characters in a pattern are ignored, unless escaped.
group
An expression that specifies which capture group of the pattern expression is used to determine the position within source string to return. The expression must return a built-in character string, graphic string, or numeric value. If the value is not of type INTEGER, it is implicitly cast to INTEGER before the function is evaluated. This value must be greater than or equal to 0 and must not be greater than the number of capture groups in the pattern expression (SQLSTATE 22546). The default group value is 0, which indicates that the position is based on the string that matches the entire pattern.
CODEUNITS16, CODEUNITS32, or OCTETS
Specifies the string unit of both the start value and the result:
  • CODEUNITS16 specifies that the start value and result are expressed in 16-bit UTF-16 code units.
  • CODEUNITS32 specifies that the start value and result are expressed in 32-bit UTF-32 code units. This is the default.
  • OCTETS specifies that the start value and result are expressed in bytes.

If the string unit is specified as CODEUNITS16 or OCTETS, and the string unit of the source string is CODEUNITS32, an error is returned (SQLSTATE 428GC).

For more information, see "String units in built-in functions" in Character strings.

Result

The result of the function is a large integer. If the pattern expression is found, the result is a number from 1 to n, where n is the actual length of the source string plus 1. The result value represents the position expressed in the string units used to process the function. If the pattern expression is not found and no argument is null, the result is 0.

If any argument of the REGEXP_INSTR function can be null, the result can be null. If any argument is null, the result is the null value.

Notes

  • The regular expression processing is done by using the International Components for Unicode (ICU) regular expression interface.
  • Considerations for non-Unicode databases:
    • A regular expression pattern supports only half-width control characters; use a character string data type for the pattern expression argument. A character string data type can be used for the pattern expression argument even when a graphic string data type is used for the source string argument.
    • The source string argument must be a graphic string data type if the pattern expression argument is a graphic string data type.

Examples

  1. Find the first occurrence of a 'o' that has a character that is preceding it.
    SELECT REGEXP_INSTR('hello to you', '.o',1,1)
       FROM sysibm.sysdummy1
    The result is 4, which is the position of the second 'l' character.
  2. Find the second occurrence of a 'o' that has a character that is preceding it.
    SELECT REGEXP_INSTR('hello to you', '.o',1,2) 
       FROM sysibm.sysdummy1
    The result is 7, which is the position of the character 't'.
  3. Find the position after the third occurrence of the first capture group of the regular expression '(.o).' using case insensitive matching.
    SELECT REGEXP_INSTR('hello TO you', '(.o).', 1,3,1,'i',1) 
       FROM sysibm.sysdummy1
    The result is 12, which is the position of the character 'u' at the end of the string.