Specifying how Db2 calculates the length of a string
If you use certain length functions, you can specify whether you want Db2 to calculate the length by bytes or characters. This distinction is important for multibyte characters. If you convert Db2 data to Unicode and the data expands, consider updating some of these function calls to specify the appropriate unit of measurement.
For example, consider the string Jürgen in UTF-8. This string consists of 6 characters. However, it takes 7 bytes of storage, because the character ü takes 2 bytes in UTF-8. You can specify whether you want Db2 to count the length as 6 or 7.
The key is to specify the size code unit that you want Db2 to use when calculating the length. A code unit is the minimal bit combination that can represent a character.
Procedure
To specify how Db2 calculates the length of a string:
Applicable functions:
- CHARACTER_LENGTH
- CLOB
- DBCLOB
- GRAPHIC
- LEFT
- LOCATE
- LOCATE_IN_STRING
- OVERLAY
- POSITION
- RIGHT
- SUBSTRING
- VARCHAR
- VARGRAPHIC
Options to specify unit of measurement:
- CODEUNITS16
- Specifies that Db2 is to count the length by 16-bit (or 2-byte) code units. For every character that is 2 bytes or less in the string, Db2 counts a length of 1.
- CODEUNITS32
- Specifies that Db2 is to count the length by 32-bit (or 4-byte) code units. For every character that is 4 bytes or less in the string, Db2 counts a length of 1. CODEUNITS32 always returns the same value as CODEUNITS16 unless you have supplementary characters.
- OCTETS
- Specifies that Db2 is to
count the length by bytes. For every byte in the string, Db2 counts a length of 1.
The OCTETS option is not available for all of the listed functions.
Examples
- Example: CHARACTER_LENGTH
- Assume that NAME is a VARCHAR(128) column that is encoded in Unicode UTF-8 and contains 'Jürgen'. The character ü requires two bytes in UTF-8.
The following two queries both return the value 6, because Db2 counts the string Jürgen as 6 characters. In the first query, CODEUNITS32 means that any character that is 4 bytes or less is counted as 1. In the second query, CODEUNITS16 means that any character that is 2 bytes or less is counted as 1. In both cases, the result is the same.
SELECT CHARACTER_LENGTH(NAME,CODEUNITS32) FROM T1 WHERE NAME = 'Jürgen';
However the following two queries return the value 7, because the string contains 7 bytes. In the first query, OCTETS means that length is to be calculated in bytes. In the second query, the LENGTH function always counts by bytes.SELECT CHARACTER_LENGTH(NAME,CODEUNITS16) FROM T1 WHERE NAME = 'Jürgen';
SELECT CHARACTER_LENGTH(NAME,OCTETS) FROM T1 WHERE NAME = 'Jürgen';
SELECT LENGTH(NAME) FROM T1 WHERE NAME = 'Jürgen';
- Example: LOCATE_IN_STRING
- The LOCATE_IN_STRING function returns the position at which an occurrence of an argument starts within a specified string. This function is similar to POSITION, but adds a parameter to specify which instance of the search string to find. The following statement sets the value of the host variable POSITION to 26, because the character ß is the 26th character in the string. In this case, CODEUNITS32 means that any character that is 4 bytes or less is counted as 1.
The following statement sets the value of the host variable POSITION to 6. Db2 starts at position 1 and looks for the third occurrence of the character N. In this case, OCTETS means that Db2 counts the length by bytes.SET :POSITION = LOCATE_IN_STRING('Jürgen lives on Hegelstraße','ß',-1,CODEUNITS32); -- search from end
SET :POSITION = LOCATE_IN_STRING('WINNING','N',1,3,OCTETS);
- Examples of other length functions
- The following table shows examples of the CODEUNITS16, CODEUNITS32, and OCTET options.
Table 1. Examples of length functions Function Result Hexadecimal result value LEFT(‘Jürgen',2,CODEUNITS32)
'Jü'
X'4AC3BC' LEFT(‘Jürgen',2,CODEUNITS16)
'Jü'
X'4AC3BC' LEFT(‘Jürgen',2,OCTETS)
'J '
X'4A20' (a truncated string) LEFT(‘Jürgen',2)
'J?'
X'4AC3' (The letter ‘J' and a partial character)1 RIGHT(‘Jürgen',5,CODEUNITS32)
'ürgen'
X'C3BC7267656E' RIGHT(‘Jürgen',5,CODEUNITS16)
'ürgen'
X'C3BC7267656E' RIGHT(‘Jürgen',5,OCTETS)
'rgen'
X'207267656E' (a truncated string) RIGHT(‘Jürgen',5)
'?rgen'
X'BC7267656E' (a partial character followed by ‘rgen')1 SUBSTRING(‘Jürgen',1,2,CODEUNITS32)
'Jü'
X'4AC3BC' SUBSTRING(‘Jürgen',1,2,CODEUNITS16)
'Jü'
X'4AC3BC' SUBSTRING(‘Jürgen',1,2,OCTETS)
'J '
X'4A20' (a truncated string) SUBSTR(‘Jürgen',1,2)
‘J?'
X'4AC3' (a partial character) SUBSTRING(‘Jürgen',8,CODEUNITS16)
''
a zero-length string SUBSTRING(‘Jürgen',8,4,OCTETS)
''
a zero-length string - If conversion occurs on a string with a partial character, SQLCODE -330 results.