Specifying how DB2 calculates the length of a string

If you use certain length functions, you can specify whether you want DB2® to calculate the length by bytes or characters. This distinction is important for multibyte characters. If you convert DB2 data to Unicode and the data expands, consider updating some of these function calls to specify the appropriate unit of measurement.

For example, consider the string Jürgen in UTF-8. This string consists of 6 characters. However, it takes 7 bytes of storage, because the character ü takes 2 bytes in UTF-8. You can specify whether you want DB2 to count the length as 6 or 7.

The key is to specify the size code unit that you want DB2 to use when calculating the length. A code unit is the minimal bit combination that can represent a character.

Procedure

To specify how DB2 calculates the length of a string:

If you are using any of the following length functions, specify the appropriate unit of measurement:

Begin general-use programming interface information.
Applicable functions:

  • CHARACTER_LENGTH
  • CLOB
  • DBCLOB
  • GRAPHIC
  • LEFT
  • LOCATE
  • LOCATE_IN_STRING
  • OVERLAY
  • POSITION
  • RIGHT
  • SUBSTRING
  • VARCHAR
  • VARGRAPHIC
End general-use programming interface information.

Options to specify unit of measurement:

CODEUNITS16
Specifies that DB2 is to count the length by 16-bit (or 2-byte) code units. For every character that is 2 bytes or less in the string, DB2 counts a length of 1.
CODEUNITS32
Specifies that DB2 is to count the length by 32-bit (or 4-byte) code units. For every character that is 4 bytes or less in the string, DB2 counts a length of 1. CODEUNITS32 always returns the same value as CODEUNITS16 unless you have supplementary characters.
OCTETS
Specifies that DB2 is to count the length by bytes. For every byte in the string, DB2 counts a length of 1.

The OCTETS option is not available for all of the listed functions.

Example

Begin general-use programming interface information.
Example of CHARACTER_LENGTH: Assume that NAME is a VARCHAR(128) column that is encoded in Unicode UTF-8 and contains 'Jürgen'. The character ü requires two bytes in UTF-8.

The following two queries both return the value 6, because DB2 counts the string Jürgen as 6 characters. In the first query, CODEUNITS32 means that any character that is 4 bytes or less is counted as 1. In the second query, CODEUNITS16 means that any character that is 2 bytes or less is counted as 1. In both cases, the result is the same.

SELECT CHARACTER_LENGTH(NAME,CODEUNITS32)
		FROM T1 WHERE NAME = 'Jürgen';
SELECT CHARACTER_LENGTH(NAME,CODEUNITS16)
		FROM T1 WHERE NAME = 'Jürgen';
However the following two queries return the value 7, because the string contains 7 bytes. In the first query, OCTETS means that length is to be calculated in bytes. In the second query, the LENGTH function always counts by bytes.
SELECT CHARACTER_LENGTH(NAME,OCTETS)
		FROM T1 WHERE NAME = 'Jürgen';
SELECT LENGTH(NAME)
		FROM T1 WHERE NAME = 'Jürgen';
Example of LOCATE_IN_STRING: The LOCATE_IN_STRING function returns the position at which an occurrence of an argument starts within a specified string. This function is similar to POSITION, but adds a parameter to specify which instance of the search string to find. The following statement sets the value of the host variable POSITION to 26, because the character ß is the 26th character in the string. In this case, CODEUNITS32 means that any character that is 4 bytes or less is counted as 1.
SET :POSITION = LOCATE_IN_STRING('Jürgen lives on Hegelstraße','ß',-1,CODEUNITS32);
-- search from end
The following statement sets the value of the host variable POSITION to 6. DB2 starts at position 1 and looks for the third occurrence of the character N. In this case, OCTETS means that DB2 counts the length by bytes.
SET :POSITION = LOCATE_IN_STRING('WINNING','N',1,3,OCTETS);
Examples of other length functions: The following table shows examples of the CODEUNITS16, CODEUNITS32, and OCTET options.
Table 1. Examples of length functions
Function Result Hexadecimal result value
LEFT(‘Jürgen',2,CODEUNITS32)
'Jü'
X'4AC3BC'
LEFT(‘Jürgen',2,CODEUNITS16)
'Jü'
X'4AC3BC'
LEFT(‘Jürgen',2,OCTETS) 
'J '  
X'4A20' (a truncated string)
LEFT(‘Jürgen',2) 
'J?'
X'4AC3' (The letter ‘J' and a partial character)1
RIGHT(‘Jürgen',5,CODEUNITS32)
'ürgen'
X'C3BC7267656E'
RIGHT(‘Jürgen',5,CODEUNITS16) 
'ürgen'
X'C3BC7267656E'
RIGHT(‘Jürgen',5,OCTETS) 
'rgen'
X'207267656E' (a truncated string)
RIGHT(‘Jürgen',5) 
'?rgen'
X'BC7267656E' (a partial character followed by ‘rgen')1
SUBSTRING(‘Jürgen',1,2,CODEUNITS32) 
'Jü'
X'4AC3BC'
SUBSTRING(‘Jürgen',1,2,CODEUNITS16)
'Jü'
X'4AC3BC'
SUBSTRING(‘Jürgen',1,2,OCTETS)
'J ' 
X'4A20' (a truncated string)
SUBSTR(‘Jürgen',1,2) 
‘J?'
X'4AC3' (a partial character)
SUBSTRING(‘Jürgen',8,CODEUNITS16) 
''
a zero-length string
SUBSTRING(‘Jürgen',8,4,OCTETS) 
''
a zero-length string
  1. If conversion occurs on a string with a partial character, SQLCODE -330 results.
 
End general-use programming interface information.