Start of change

REGEXP_REPLACE

The REGEXP_REPLACE scalar function returns a modified version of the source string where occurrences of the regular expression pattern found in the source string are replaced with the specified replacement string.

FL 504

Start of change
Passthrough-only expression: This function is passthrough-only and cannot run on Db2 for z/OS® without acceleration. For information about invoking this function, see Accelerating queries with passthrough-only expressions.
End of change
Read syntax diagramSkip visual syntax diagramREGEXP_REPLACE(source-string ,pattern-expression,replacement-string,start,occurrence,flags ,CODEUNITS32,CODEUNITS16OCTETS )

The schema is SYSIBM.

source-string
An expression that specifies the string in which the search is to take place. This expression must return a built-in character string, graphic string, numeric value, Boolean value, or datetime value. A numeric, Boolean, or datetime value is implicitly cast to VARCHAR before the function is evaluated. A character string cannot specify the FOR BIT DATA attribute.
pattern-expression
An expression that specifies the regular expression string that is the pattern for the search. This expression must return a built-in character string, graphic string, numeric value, Boolean value, or datetime value. A numeric, Boolean, or datetime value is implicitly cast to VARCHAR before the function is evaluated. A character string cannot specify the FOR BIT DATA attribute.
replacement-string
An expression that specifies the replacement string for matching substrings. The expression must return a value that is a built-in character string, graphic string, numeric value, Boolean value, or datetime value. A numeric, Boolean, or datetime value is implicitly cast to VARCHAR before the function is evaluated. A character string cannot specify the FOR BIT DATA attribute. The default replacement string is the empty string.

The content of the replacement string can include references to capture group text from the search to use in the replacement text. These references are of the form '$n' or '\n', where n is the number of the capture group and 0 represents the entire string that matches the pattern. The value for n must be in the range 0-9 and not greater than the number of capture groups in the pattern. For example, either '$2' or '\2' can be used to refer to the content found in the source string for the second capture group that is specified in the pattern expression. If the pattern expression must include a literal reference to a '$' or '\' character, that character must be preceded with an '/' character as an escape character ('\$' or '\\').

start
An expression that specifies the position within source-string at which the search is to start. The expression must return a built-in character string, graphic string, Boolean, or numeric value. If the value is not of type INTEGER, it is implicitly cast to INTEGER before the function is evaluated. The value of the integer must be greater than or equal to 1. If OCTETS is specified and the source string is graphic data, the value of the integer must be odd. The default start value is 1. See parameter description for CODEUNITS16, CODEUNITS32, or OCTETS for the string unit that applies to the start position.
occurrence
An expression that specifies which occurrence of the pattern expression within the source string is to be searched for and replaced. The expression must return a built-in character string, graphic string, Boolean, or numeric value. If the value is not of type INTEGER, it is implicitly cast to INTEGER before the function is evaluated. The occurrence value must be greater than or equal to 0. The default occurrence value is 0, which indicates that all occurrences of the pattern expression in the source string are replaced.
flags
An expression that specifies flags that controls aspects of the pattern matching. The expression must return a built-in character string that does not specify the FOR BIT DATA attribute. The string can include one or more valid flag values and the combination of flag values must be valid. An empty string is the same as the value 'c'. The default flag value is 'c'.
Table 1. Supported flag values
Flag value Description
c Specifies that matching is case-sensitive. This flag is the default value if 'c' or 'i' is not specified. This value must not be specified with a value of 'i'.
i Specifies that matching is case insensitive. This value must not be specified with a value of 'c'.
m Specifies that the input data can contain more than one line. By default, the '^' in a pattern matches only the start of the input string; the '$' in a pattern matches only the end of the input string. If this flag is set, "^" and "$" also matches at the start and end of each line within the input string.
n Specifies that the '.' character in a pattern matches a line terminator in the input string. By default, the '.' character in a pattern does not match a line terminator. A carriage-return and line-feed pair in the input string behaves as a single-line terminator and matches a single "." in a pattern.
s Specifies that the '.' character in a pattern matches a line terminator in the input string. This value is a synonym for the 'n' value.
x Specifies that white space characters in a pattern are ignored, unless escaped.
CODEUNITS16, CODEUNITS32, or OCTETS
Specifies the string unit of the start value:
  • CODEUNITS16 specifies that the start value is expressed in 16-bit UTF-16 code units.
  • CODEUNITS32 specifies that the start value is expressed in 32-bit UTF-32 code units. This is the default.
  • OCTETS specifies that the start value is expressed in bytes.

If the string unit is specified as CODEUNITS16 or OCTETS, and if the string unit of the source string is CODEUNITS32, an error is returned.

For more information, see String unit specifications.

The result of the function is a string. If there are no occurrences of the pattern to be replaced and no argument is null, the original string is returned. The data type of the string is the same data type as the source string, except for CHAR, which becomes VARCHAR, and VARGRAPHIC, which becomes GRAPHIC.

The length attribute of the result data type is determined based on the length attributes of the source string and the replacement string by using the following calculation:
   MIN(MaxTypeLen, LAS+(LAS+1)*LAR)
where MaxTypeLen represents the maximum length attribute for the data type of the result, LAS represents the length attribute for the data type of the source string, and LAR represents the length attribute for the data type of the replacement string. If the replacement string is not specified, the value for LAR is 0. If the actual length of the result string exceeds the maximum for the return data type, an error is returned.

If the replacement string is specified, the result buffer length is 32672 bytes. If the replacement string is not specified, the result buffer length is the same as the length of the source string except for the following data types:

Table 2. Data types with different result buffer lengths
Data Type Length (in bytes)
numerical 42
date 10
time 8
timestamp 32

If any argument of the REGEXP_REPLACE function can be null, the result can be null. If any argument is null, the result is the null value.

Notes

  • The regular expression processing is done by using the International Components for Unicode (ICU) regular expression interface.
  • Considerations for non-Unicode databases:
    • A regular expression pattern supports only halfwidth control characters; use a character string data type for the pattern expression argument. A character string data type can be used for the pattern expression argument even when a graphic string data type is used for the source string argument.
    • The source string argument must be a graphic string data type if the pattern expression argument is a graphic string data type.
    • The source string and replacement string arguments must both be character string data types or both be graphic string data types.

Example

Replace the second occurrence of the pattern 'R.d' with 'Orange' using a case sensitive search.
SELECT REGEXP_REPLACE(
   'Red Yellow RED Blue Red Green Blue', 'R.d','Orange',1,2,'c') 
      FROM sysibm.sysdummy1
The result is 'Red Yellow RED Blue Orange Green Blue'.
End of change