Difference between CODEUNITS16 and CODEUNITS32
CODEUNITS16 and CODEUNITS32 return the same answer unless the data contains supplementary characters.
A supplementary character is represented as two UTF-16 code units or one UTF-32 code unit. In UTF-8, a non-supplementary character is represented by 1 to 3 bytes and a supplementary character is represented by 4 bytes. In UTF-16, a non-supplementary character is represented by one CODEUNIT16 code unit or 2 bytes, and a supplementary character is represented by two CODEUNIT16 code units or 4 bytes. In UTF-32, a character is represented by one CODEUNIT32 code unit or 4 bytes. Thus, CODEUNITS16 and CODEUNITS32 return different answers when the data contains supplementary characters.
Example 1: The following table shows the hexadecimal values
for the mathematical bold capital A and the Latin capital letter A.
The mathematical bold capital A is a supplementary character that
is represented by 4 bytes in UTF-8, UTF-16, and UTF-32.
Character | UTF-8 representation | UTF-16 representation | UTF-32 representation |
---|---|---|---|
Unicode value \u1D400 - 'A' MATHEMATICAL BOLD CAPITAL A |
X'F09D9080' | X'D835DC00' | X'0001D400' |
Unicode value \u0041 - 'A' LATIN CAPITAL LETTER A |
X'41' | X'0041' | X'00000041' |
Assume that C1 is a VARCHAR(128) column, encoded in Unicode UTF-8,
and that table T1 contains one row with the value of the mathematical
bold capital A (X'F09D9080'). The following similar queries
return different answers:
-- Query: -- Returns the value:
SELECT CHARACTER_LENGTH(C1,CODEUNITS32) FROM T1; -- 1
SELECT CHARACTER_LENGTH(C1,CODEUNITS16) FROM T1; -- 2
SELECT CHARACTER_LENGTH(C1,OCTETS) FROM T1; -- 4
Example 2: Assume that C1 is a VARCHAR(128) column, encoded
in Unicode UTF-8, and that table T1 contains one row with the value
of the mathematical bold capital A (X'F09D9080'). The following
similar queries return different answers.
-- Query: -- Returns the value:
SELECT HEX(SUBSTRING(C1,1,1,CODEUNITS32)) FROM T1; -- X'F09D9080'
SELECT HEX(SUBSTRING(C1,1,1,CODEUNITS16)) FROM T1; -- X'20'
SELECT HEX(SUBSTRING(C1,1,2,CODEUNITS16)) FROM T1; -- X'F09D9080'
SELECT HEX(SUBSTRING(C1,1,1,OCTETS)) FROM T1; -- X'20'
SELECT HEX(SUBSTR(C1,1,1)) FROM T1; -- X'F0'
The
value X'20' is the pad (blank) character.