SUBSTRING scalar function

The SUBSTRING function returns a substring of a string.

The schema is SYSIBM.

expressionAn expression that specifies the string from which the result is derived. The expression must return a built-in character string, numeric value, Boolean value, or datetime value. If the value is not a character string, it is implicitly cast to VARCHAR before the function is evaluated.

A substring of the input expression comprises zero or more contiguous string units of the input expression.

startAn expression that specifies the position, relative to the beginning of the input expression, from which the substring is to be calculated. For example:

Position 1 is the first string unit of the input expression. The statement SUBSTRING('abc',1,2) returns 'ab'.
Position 2 is one position to the right of position 1. The statement SUBSTRING('abc',2,2) returns 'bc'.
Position 0 is one position to the left of position 1. The statement SUBSTRING('abc',0,2) returns 'a'.
Position -1 is two positions to the left of position 1. The statement SUBSTRING('abc',-1,2) returns a zero-length string.

The expression must return a built-in numeric, CHAR, VARCHAR, GRAPHIC, or VARGRAPHIC value. If the value is not of type INTEGER, it is implicitly cast to INTEGER before evaluating the function.

The start value can be positive, negative, or zero. If OCTETS is specified and the input expression contains graphic data, the start value must be odd (SQLSTATE 428GC).

lengthAn expression that specifies the length of the result. The expression must return a built-in numeric, CHAR, VARCHAR, GRAPHIC, or VARGRAPHIC value. If the value is not of type INTEGER, it is implicitly cast to INTEGER before evaluating the function.

If the input expression is:

A fixed-length string, the default length is CHARACTER_LENGTH(expression USING string-unit) - start + 1. This is the number of string units (CODEUNITS16, CODEUNITS32, or OCTETS) from the start position to the final position of the input expression.
A varying-length string, the default length is zero or CHARACTER_LENGTH(expression USING string-unit) - start + 1, whichever is greater.

If the specified length is zero, the result is the empty string.

If the value is not of type INTEGER, it is implicitly cast to INTEGER before evaluating the function. The value must be greater than or equal to zero. If a value greater than n is specified, where n is the (length attribute of expression) - start + 1, then n is used as the length of the resulting substring. The value is expressed in the string units that are explicitly specified. If OCTETS is specified, and if the input expression contains graphic data, the length must be an even number (SQLSTATE 428GC).

CODEUNITS16, CODEUNITS32, or OCTETS

Specifies the string units of start and length. CODEUNITS16 specifies that start and length are to be expressed in 16-bit UTF-16 code units. CODEUNITS32 specifies that start and length are to be expressed in 32-bit UTF-32 code units. OCTETS specifies that start and length are to be expressed in bytes.

If string units are specified as CODEUNITS16 or CODEUNITS32, and expression is a binary string or a FOR BIT DATA string, an error is returned (SQLSTATE 428GC).

If a string units argument is not specified and expression is a character string that is not FOR BIT DATA or is a graphic string, the default is CODEUNITS32. Otherwise, the default is OCTETS.

For more information about CODEUNITS16, CODEUNITS32, and OCTETS, see String units in built-in functions in Character strings.

When the SUBSTRING function is invoked using OCTETS, and the source-string is encoded in a code page that requires more than one byte per code point (mixed or MBCS), the SUBSTRING operation might split a multi-byte code point and the resulting substring might begin or end with a partial code point. If this occurs, the function replaces the bytes of leading or trailing partial code points with blanks in a way that does not change the byte length of the result. (See a related example in the Examples section.)

Result

The data type of the result depends on the data type of the first argument, as shown in the following table.

Table 1. Data type of the result of SUBSTRING
Data type of the first argument	Data type of the result
CHAR or VARCHAR	VARCHAR
CLOB	CLOB
GRAPHIC or VARGRAPHIC	VARGRAPHIC
DBCLOB	DBCLOB
BINARY or VARBINARY	VARBINARY
BLOB	BLOB

The length attribute of the result is equal to the length attribute of the input expression. If any argument of the function can be null, the result can be null; if any argument is null, the result is the null value. The result is not padded with any character. If the input expression has an actual length of 0, the result also has an actual length of 0.

Notes

The length attribute of the result is equal to the length attribute of the input expression. This behavior is different from the behavior of the SUBSTR function, where the length attribute is derived from the start and the length arguments of the function.
The FOR keyword is not supported in embedded COBOL applications because the FOR keyword has historically been used by the COBOL precompiler. Avoid using this optional keyword in COBOL applications.

Examples

Example 1: FIRSTNAME is a VARCHAR(12) column in table T1. One of its values is the 6-character string 'Jürgen'. When FIRSTNAME has this value:

Function ...                          Returns ...
------------------------------------  ------------------------------------
SUBSTRING(FIRSTNAME,1,2,CODEUNITS32)  'Jü' -- x'4AC3BC'
SUBSTRING(FIRSTNAME,1,2,CODEUNITS16)  'Jü' -- x'4AC3BC'
SUBSTRING(FIRSTNAME,1,2,OCTETS)       'J ' -- x'4A20' (a truncated string)
SUBSTRING(FIRSTNAME,8,CODEUNITS16)    a zero-length string
SUBSTRING(FIRSTNAME,8,4,OCTETS)       a zero-length string
SUBSTRING(FIRSTNAME,0,2,CODEUNITS32)  'J' -- x'4AC3BC'

Example 2: The following example illustrates how SUBSTRING replaces the bytes of leading or trailing partial multi-byte code points with blanks when the string length unit is OCTETS. Assume that UTF8_VAR contains the UTF-8 representation of the Unicode string '&N~AB', where '&' is the musical symbol G clef and '~' is the combining tilde character.
```
   SUBSTRING(UTF8_VAR, 2, 5, OCTETS)
```
Three blank bytes precede the 'N', and one blank byte follows the 'N'.