REGEXP_SUBSTR
The REGEXP_SUBSTR function returns one occurrence of a substring of a string that matches the regular expression pattern.
- source-string
- An expression that specifies the string in which the search is to take place. The expression must return a value that is a built-in character string, graphic string, numeric, or datetime data type. If the value is not a UTF-16 DBCLOB, it is implicitly cast to a UTF-16 DBCLOB before searching for the regular expression pattern. A character string with the FOR BIT DATA attribute or a binary string is not supported. The length of a string must not be greater than 1 gigabyte.
- pattern-expression
- An expression that specifies the regular expression string that
is the pattern for the search. The expression must return a value
that is a built-in character string, graphic string, numeric, or datetime
data type. If the value is not a UTF-16 DBCLOB, it is implicitly cast
to a UTF-16 DBCLOB before searching for the regular expression pattern.
A character string with the FOR BIT DATA attribute or a binary string
is not supported. The length of the string must not be greater than
32K.
A valid pattern-expression consists of a set of characters and control characters that describe the pattern of the search. For a description of the valid control characters, see Regular expression control characters.
- start
- An expression that specifies the position within source-string at which the search is to start. The expression must return a value of any built-in numeric, character-string, or graphic-string data type. The argument is cast to INTEGER before evaluating the function. For more information about converting to INTEGER, see INTEGER or INT. The value of the integer must be greater than or equal to 1. If the value of the integer is greater than the actual length of the source-string, the result is the null value.
- occurrence
- An expression that specifies which occurrence of the pattern-expression within source-string to search for. The expression must return a value of any built-in numeric, character-string, or graphic-string data type. The argument is cast to INTEGER before evaluating the function. For more information about converting to INTEGER, see INTEGER or INT. The value of the integer must be greater than or equal to 1. If occurrence is not specified, the default value is 1 which indicates that only the first occurrence of pattern-expression is considered.
- flags
- An expression that specifies flags that control aspects of the
pattern matching. The expression must return a value that is a built-in
character string or graphic string data type. A character string with
the FOR BIT DATA attribute or a binary string is not supported. The
string can include one or more valid flag values and the combination
of flag values must be valid. An empty string is the same as the value
'c'.
For a description of the valid flag characters, see Regular expression flag values.
- group
- An expression that specifies which capture group of the pattern-expression within source-string to return. The expression must return a value of any built-in numeric, character-string, or graphic-string data type. The argument is cast to INTEGER before evaluating the function. For more information about converting to INTEGER, see INTEGER or INT. The value of the integer must be greater than or equal to 0 and must not be greater than the number of capture groups in the pattern-expression. If group is not specified, the default is 0 which indicates the entire string that matches the entire pattern is returned.
The result of the function is a string. The data type of the result depends on the data type of source-string:
Data type of source-string | Data Type of the Result for REGEXP_SUBSTR |
---|---|
CHAR or VARCHAR or numeric or datetime | VARCHAR |
CLOB | CLOB |
GRAPHIC or VARGRAPHIC | VARGRAPHIC |
DBCLOB | DBCLOB |
The length attribute of the result data type is same as the length attribute of the source-string. The actual length of the result is the length of the occurrence in the string that matches the pattern-expression. If the pattern-expression is not found, the result is the null value.
The CCSID of the result is the same as the source-string.
If any argument of the REGEXP_SUBSTR function can be null, the result can be null. If any argument is null, the result is the null value.
Notes
Prerequisites: In order to use the REGEXP_SUBSTR function, the International Components for Unicode (ICU) option must be installed
Processing: The regular expression processing is done using the International Components for Unicode (ICU) regular expression interface. For more information see, http://userguide.icu-project.org/strings/regexp.
Syntax Alternatives: REGEXP_EXTRACT is a synonym for REGEXP_SUBSTR.
Examples
- Example 1: Return the string which matches any character
preceding a 'o'.
The result is 'lo'.SELECT REGEXP_SUBSTR('hello to you', '.o',1,1) FROM sysibm.sysdummy1
- Example 2: Return the second string occurrence which matches
any character preceding a 'o'.
The result is 'to'.SELECT REGEXP_SUBSTR('hello to you', '.o',1,2) FROM sysibm.sysdummy1
- Example 3: Return the third string occurrence which matches
any character preceding a 'o'.
The result is 'yo'.SELECT REGEXP_SUBSTR('hello to you', '.o',1,3) FROM sysibm.sysdummy1