Start of change

REGEXP_SUBSTR

The REGEXP_SUBSTR function returns one occurrence of a substring of a string that matches the regular expression pattern.

Read syntax diagramSkip visual syntax diagramREGEXP_SUBSTR(source-string ,pattern-expression ,start,occurrence,flags,group )
source-string
An expression that specifies the string in which the search is to take place. The expression must return a value that is a built-in character string, graphic string, numeric, or datetime data type. If the value is not a UTF-16 DBCLOB, it is implicitly cast to a UTF-16 DBCLOB before searching for the regular expression pattern. A character string with the FOR BIT DATA attribute or a binary string is not supported. The length of a string must not be greater than 1 gigabyte.
pattern-expression
An expression that specifies the regular expression string that is the pattern for the search. The expression must return a value that is a built-in character string, graphic string, numeric, or datetime data type. If the value is not a UTF-16 DBCLOB, it is implicitly cast to a UTF-16 DBCLOB before searching for the regular expression pattern. A character string with the FOR BIT DATA attribute or a binary string is not supported. The length of the string must not be greater than 32K.

A valid pattern-expression consists of a set of characters and control characters that describe the pattern of the search. For a description of the valid control characters, see Regular expression control characters.

start
An expression that specifies the position within source-string at which the search is to start. The expression must return a value of any built-in numeric, character-string, or graphic-string data type. The argument is cast to INTEGER before evaluating the function. For more information about converting to INTEGER, see INTEGER or INT. The value of the integer must be greater than or equal to 1. If the value of the integer is greater than the actual length of the source-string, the result is the null value.
occurrence
An expression that specifies which occurrence of the pattern-expression within source-string to search for. The expression must return a value of any built-in numeric, character-string, or graphic-string data type. The argument is cast to INTEGER before evaluating the function. For more information about converting to INTEGER, see INTEGER or INT. The value of the integer must be greater than or equal to 1. If occurrence is not specified, the default value is 1 which indicates that only the first occurrence of pattern-expression is considered.
flags
An expression that specifies flags that control aspects of the pattern matching. The expression must return a value that is a built-in character string or graphic string data type. A character string with the FOR BIT DATA attribute or a binary string is not supported. The string can include one or more valid flag values and the combination of flag values must be valid. An empty string is the same as the value 'c'.

For a description of the valid flag characters, see Regular expression flag values.

group
An expression that specifies which capture group of the pattern-expression within source-string to return. The expression must return a value of any built-in numeric, character-string, or graphic-string data type. The argument is cast to INTEGER before evaluating the function. For more information about converting to INTEGER, see INTEGER or INT. The value of the integer must be greater than or equal to 0 and must not be greater than the number of capture groups in the pattern-expression. If group is not specified, the default is 0 which indicates the entire string that matches the entire pattern is returned.

The result of the function is a string. The data type of the result depends on the data type of source-string:

Data type of source-string Data Type of the Result for REGEXP_SUBSTR
CHAR or VARCHAR or numeric or datetime VARCHAR
CLOB CLOB
GRAPHIC or VARGRAPHIC VARGRAPHIC
DBCLOB DBCLOB

The length attribute of the result data type is same as the length attribute of the source-string. The actual length of the result is the length of the occurrence in the string that matches the pattern-expression. If the pattern-expression is not found, the result is the null value.

The CCSID of the result is the same as the source-string.

If any argument of the REGEXP_SUBSTR function can be null, the result can be null. If any argument is null, the result is the null value.

Notes

Prerequisites: In order to use the REGEXP_SUBSTR function, the International Components for Unicode (ICU) option must be installed

Processing: The regular expression processing is done using the International Components for Unicode (ICU) regular expression interface. For more information see, http://userguide.icu-project.org/strings/regexp.

Syntax Alternatives: REGEXP_EXTRACT is a synonym for REGEXP_SUBSTR.

Examples

  • Example 1: Return the string which matches any character preceding a 'o'.
    SELECT REGEXP_SUBSTR('hello to you', '.o',1,1) 
    FROM sysibm.sysdummy1
    The result is 'lo'.
  • Example 2: Return the second string occurrence which matches any character preceding a 'o'.
    SELECT REGEXP_SUBSTR('hello to you', '.o',1,2) 
    FROM sysibm.sysdummy1
    The result is 'to'.
  • Example 3: Return the third string occurrence which matches any character preceding a 'o'.
    SELECT REGEXP_SUBSTR('hello to you', '.o',1,3) 
    FROM sysibm.sysdummy1
    The result is 'yo'.
End of change