Difference between CODEUNITS16 and CODEUNITS32

CODEUNITS16 and CODEUNITS32 return the same answer unless the data contains supplementary characters.

A supplementary character is represented as two UTF-16 code units or one UTF-32 code unit. In UTF-8, a non-supplementary character is represented by 1 to 3 bytes and a supplementary character is represented by 4 bytes. In UTF-16, a non-supplementary character is represented by one CODEUNIT16 code unit or 2 bytes, and a supplementary character is represented by two CODEUNIT16 code units or 4 bytes. In UTF-32, a character is represented by one CODEUNIT32 code unit or 4 bytes. Thus, CODEUNITS16 and CODEUNITS32 return different answers when the data contains supplementary characters.

Example 1: The following table shows the hexadecimal values for the mathematical bold capital A and the Latin capital letter A. The mathematical bold capital A is a supplementary character that is represented by 4 bytes in UTF-8, UTF-16, and UTF-32.
Character UTF-8 representation UTF-16 representation UTF-32 representation
Unicode value \u1D400 - 'A'

MATHEMATICAL BOLD CAPITAL A

X'F09D9080' X'D835DC00' X'0001D400'
Unicode value \u0041 - 'A'

LATIN CAPITAL LETTER A

X'41' X'0041' X'00000041'
Assume that C1 is a VARCHAR(128) column, encoded in Unicode UTF-8, and that table T1 contains one row with the value of the mathematical bold capital A (X'F09D9080'). The following similar queries return different answers:
-- Query:                                               -- Returns the value:
SELECT CHARACTER_LENGTH(C1,CODEUNITS32) FROM T1;        -- 1
SELECT CHARACTER_LENGTH(C1,CODEUNITS16) FROM T1;        -- 2
SELECT CHARACTER_LENGTH(C1,OCTETS) FROM T1;             -- 4
Example 2: Assume that C1 is a VARCHAR(128) column, encoded in Unicode UTF-8, and that table T1 contains one row with the value of the mathematical bold capital A (X'F09D9080'). The following similar queries return different answers.
-- Query:                                                -- Returns the value:
SELECT HEX(SUBSTRING(C1,1,1,CODEUNITS32)) FROM T1;      -- X'F09D9080'   
SELECT HEX(SUBSTRING(C1,1,1,CODEUNITS16)) FROM T1;      -- X'20'
SELECT HEX(SUBSTRING(C1,1,2,CODEUNITS16)) FROM T1;      -- X'F09D9080'
SELECT HEX(SUBSTRING(C1,1,1,OCTETS)) FROM T1;           -- X'20'
SELECT HEX(SUBSTR(C1,1,1)) FROM T1;                     -- X'F0'  
The value X'20' is the pad (blank) character.