SUBSTRING scalar function
The SUBSTRING function returns a substring of a string.
The schema is SYSIBM.
-
expression
- An
expression that specifies the string from which the result is derived.
The expression must return a built-in character string, numeric value, Boolean value, or datetime
value. If the value is not a character string, it is implicitly cast to VARCHAR before the function
is evaluated.
A substring of the input expression comprises zero or more contiguous string units of the input expression.
start
- An expression that specifies the position, relative to the beginning of the input expression,
from which the substring is to be calculated. For example:
- Position 1 is the first string unit of the input expression. The statement
SUBSTRING('abc',1,2)
returns'ab'
. - Position 2 is one position to the right of position 1. The statement
SUBSTRING('abc',2,2)
returns'bc'
. - Position 0 is one position to the left of position 1. The statement
SUBSTRING('abc',0,2)
returns'a'
. - Position -1 is two positions to the left of position 1. The statement
SUBSTRING('abc',-1,2)
returns a zero-length string.
The start value can be positive, negative, or zero. If OCTETS is specified and the input expression contains graphic data, the start value must be odd (SQLSTATE 428GC).
length
- Position 1 is the first string unit of the input expression. The statement
- An expression that specifies the length of the result. The expression must
return a built-in numeric, CHAR, VARCHAR, GRAPHIC, or VARGRAPHIC value. If the value is not of type
INTEGER, it is implicitly cast to INTEGER before evaluating the function.If the input expression is:
- A fixed-length string, the default length is
CHARACTER_LENGTH(expression USING string-unit) - start + 1
. This is the number of string units (CODEUNITS16, CODEUNITS32, or OCTETS) from the start position to the final position of the input expression. - A varying-length string, the default length is zero or
CHARACTER_LENGTH(expression USING string-unit) - start + 1
, whichever is greater.
If the value is not of type INTEGER, it is implicitly cast to INTEGER before evaluating the function. The value must be greater than or equal to zero. If a value greater than n is specified, where n is the (length attribute of expression) - start + 1, then n is used as the length of the resulting substring. The value is expressed in the string units that are explicitly specified. If OCTETS is specified, and if the input expression contains graphic data, the length must be an even number (SQLSTATE 428GC).
- A fixed-length string, the default length is
- CODEUNITS16, CODEUNITS32, or OCTETS
- Specifies the string units of start and
length. CODEUNITS16 specifies that start and
length are to be expressed in 16-bit UTF-16 code units. CODEUNITS32
specifies that start and length are to be
expressed in 32-bit UTF-32 code units. OCTETS specifies that start and
length are to be expressed in bytes.
If string units are specified as CODEUNITS16 or CODEUNITS32, and expression is a binary string or a FOR BIT DATA string, an error is returned (SQLSTATE 428GC).
If a string units argument is not specified and expression is a character string that is not FOR BIT DATA or is a graphic string, the default is CODEUNITS32. Otherwise, the default is OCTETS.
For more information about CODEUNITS16, CODEUNITS32, and OCTETS, see
String units in built-in functions
inCharacter strings
.When the SUBSTRING function is invoked using OCTETS, and the source-string is encoded in a code page that requires more than one byte per code point (mixed or MBCS), the SUBSTRING operation might split a multi-byte code point and the resulting substring might begin or end with a partial code point. If this occurs, the function replaces the bytes of leading or trailing partial code points with blanks in a way that does not change the byte length of the result. (See a related example in the Examples section.)
Result
The data type of the result depends on the data type of the first argument, as shown in the following table.
Data type of the first argument | Data type of the result |
---|---|
CHAR or VARCHAR | VARCHAR |
CLOB | CLOB |
GRAPHIC or VARGRAPHIC | VARGRAPHIC |
DBCLOB | DBCLOB |
BINARY or VARBINARY | VARBINARY |
BLOB | BLOB |
The length attribute of the result is equal to the length attribute of the input expression. If any argument of the function can be null, the result can be null; if any argument is null, the result is the null value. The result is not padded with any character. If the input expression has an actual length of 0, the result also has an actual length of 0.
Notes
- The length attribute of the result is equal to the length attribute of the input expression. This behavior is different from the behavior of the SUBSTR function, where the length attribute is derived from the start and the length arguments of the function.
- The FOR keyword is not supported in embedded COBOL applications because the FOR keyword has historically been used by the COBOL precompiler. Avoid using this optional keyword in COBOL applications.
Examples
- Example 1: FIRSTNAME is a VARCHAR(12) column in table T1. One of its values is the
6-character string
'Jürgen'
. When FIRSTNAME has this value:Function ... Returns ... ------------------------------------ ------------------------------------ SUBSTRING(FIRSTNAME,1,2,CODEUNITS32) 'Jü' -- x'4AC3BC' SUBSTRING(FIRSTNAME,1,2,CODEUNITS16) 'Jü' -- x'4AC3BC' SUBSTRING(FIRSTNAME,1,2,OCTETS) 'J ' -- x'4A20' (a truncated string) SUBSTRING(FIRSTNAME,8,CODEUNITS16) a zero-length string SUBSTRING(FIRSTNAME,8,4,OCTETS) a zero-length string SUBSTRING(FIRSTNAME,0,2,CODEUNITS32) 'J' -- x'4AC3BC'
- Example 2: The following example illustrates how SUBSTRING
replaces the bytes of leading or trailing partial multi-byte code
points with blanks when the string length unit is OCTETS. Assume that
UTF8_VAR contains the UTF-8 representation of the Unicode string '&N~AB',
where '&' is the musical symbol G clef and '~' is the combining
tilde character.
Three blank bytes precede the 'N', and one blank byte follows the 'N'.SUBSTRING(UTF8_VAR, 2, 5, OCTETS)