Specifying how Db2 calculates the length of a string

If you use certain length functions, you can specify whether you want Db2 to calculate the length by bytes or characters. This distinction is important for multibyte characters. If you convert Db2 data to Unicode and the data expands, consider updating some of these function calls to specify the appropriate unit of measurement.

For example, consider the string Jürgen in UTF-8. This string consists of 6 characters. However, it takes 7 bytes of storage, because the character ü takes 2 bytes in UTF-8. You can specify whether you want Db2 to count the length as 6 or 7.

The key is to specify the size code unit that you want Db2 to use when calculating the length. A code unit is the minimal bit combination that can represent a character.

Procedure

To specify how Db2 calculates the length of a string:

If you are using any of the following length functions, specify the appropriate unit of measurement:

Begin general-use programming interface information. Applicable functions:

CHARACTER_LENGTH
CLOB
DBCLOB
GRAPHIC
LEFT
LOCATE
LOCATE_IN_STRING
OVERLAY
POSITION
RIGHT
SUBSTRING
VARCHAR
VARGRAPHIC

End general-use programming interface information.

Options to specify unit of measurement:

CODEUNITS16: Specifies that Db2 is to count the length by 16-bit (or 2-byte) code units. For every character that is 2 bytes or less in the string, Db2 counts a length of 1.
CODEUNITS32: Specifies that Db2 is to count the length by 32-bit (or 4-byte) code units. For every character that is 4 bytes or less in the string, Db2 counts a length of 1. CODEUNITS32 always returns the same value as CODEUNITS16 unless you have supplementary characters.
OCTETS: Specifies that Db2 is to count the length by bytes. For every byte in the string, Db2 counts a length of 1.
The OCTETS option is not available for all of the listed functions.

Examples

Begin general-use programming interface information.

Example: CHARACTER_LENGTH

Assume that NAME is a VARCHAR(128) column that is encoded in Unicode UTF-8 and contains 'Jürgen'. The character ü requires two bytes in UTF-8.

The following two queries both return the value 6, because Db2 counts the string Jürgen as 6 characters. In the first query, CODEUNITS32 means that any character that is 4 bytes or less is counted as 1. In the second query, CODEUNITS16 means that any character that is 2 bytes or less is counted as 1. In both cases, the result is the same.

SELECT CHARACTER_LENGTH(NAME,CODEUNITS32)
		FROM T1 WHERE NAME = 'Jürgen';

SELECT CHARACTER_LENGTH(NAME,CODEUNITS16)
		FROM T1 WHERE NAME = 'Jürgen';

However the following two queries return the value 7, because the string contains 7 bytes. In the first query, OCTETS means that length is to be calculated in bytes. In the second query, the LENGTH function always counts by bytes.

SELECT CHARACTER_LENGTH(NAME,OCTETS)
		FROM T1 WHERE NAME = 'Jürgen';

SELECT LENGTH(NAME)
		FROM T1 WHERE NAME = 'Jürgen';

Example: LOCATE_IN_STRING

The LOCATE_IN_STRING function returns the position at which an occurrence of an argument starts within a specified string. This function is similar to POSITION, but adds a parameter to specify which instance of the search string to find. The following statement sets the value of the host variable POSITION to 26, because the character ß is the 26th character in the string. In this case, CODEUNITS32 means that any character that is 4 bytes or less is counted as 1.

SET :POSITION = LOCATE_IN_STRING('Jürgen lives on Hegelstraße','ß',-1,CODEUNITS32);
-- search from end

The following statement sets the value of the host variable POSITION to 6. Db2 starts at position 1 and looks for the third occurrence of the character N. In this case, OCTETS means that Db2 counts the length by bytes.

SET :POSITION = LOCATE_IN_STRING('WINNING','N',1,3,OCTETS);

Examples of other length functions

The following table shows examples of the CODEUNITS16, CODEUNITS32, and OCTET options.

Table 1. Examples of length functions
Function	Result	Hexadecimal result value
`LEFT(‘Jürgen',2,CODEUNITS32)`	`'Jü'`	X'4AC3BC'
`LEFT(‘Jürgen',2,CODEUNITS16)`	`'Jü'`	X'4AC3BC'
`LEFT(‘Jürgen',2,OCTETS)`	`'J '`	X'4A20' (a truncated string)
`LEFT(‘Jürgen',2)`	`'J?'`	X'4AC3' (The letter ‘J' and a partial character)¹
`RIGHT(‘Jürgen',5,CODEUNITS32)`	`'ürgen'`	X'C3BC7267656E'
`RIGHT(‘Jürgen',5,CODEUNITS16)`	`'ürgen'`	X'C3BC7267656E'
`RIGHT(‘Jürgen',5,OCTETS)`	`'rgen'`	X'207267656E' (a truncated string)
`RIGHT(‘Jürgen',5)`	`'?rgen'`	X'BC7267656E' (a partial character followed by ‘rgen')¹
`SUBSTRING(‘Jürgen',1,2,CODEUNITS32)`	`'Jü'`	X'4AC3BC'
`SUBSTRING(‘Jürgen',1,2,CODEUNITS16)`	`'Jü'`	X'4AC3BC'
`SUBSTRING(‘Jürgen',1,2,OCTETS)`	`'J '`	X'4A20' (a truncated string)
`SUBSTR(‘Jürgen',1,2)`	`‘J?'`	X'4AC3' (a partial character)
`SUBSTRING(‘Jürgen',8,CODEUNITS16)`	`''`	a zero-length string
`SUBSTRING(‘Jürgen',8,4,OCTETS)`	`''`	a zero-length string
If conversion occurs on a string with a partial character, SQLCODE -330 results.