USUPPLEMENTARY
The USUPPLEMENTARY function returns an integer value that is equal to the index of the first Unicode supplementary character in a character data item argument that is encoded in UTF-8 or UTF-16.
A Unicode supplementary character is a character above U+FFFF, that is, a character outside of the Basic Multilingual Plane (BMP). These characters are encoded in UTF-16 with a surrogate pair (two 16-bit code units), or are encoded in UTF-8 with a 4-byte representation.
The function type is integer.
- argument-1
- Must be of class alphabetic, alphanumeric, or national. argument-1 must contain valid UTF-8 or UTF-16 data based on its class:
- If argument-1 is of class alphabetic or alphanumeric, it must contain valid UTF-8 data.
- If argument-1 is of class national, it must contain valid UTF-16 data.
- If the contents of argument-1 are not valid Unicode (UTF-8 or UTF-16, depending on class), the returned result is unpredictable.
- If argument-1 contains no supplementary characters, the returned value is zero.
- If argument-1 is of class alphabetic or alphanumeric, the returned value is the byte position of the first UTF-8 supplementary character in argument-1.
- If argument-1 is of class national, the returned value is the index, in UTF-16 encoding units, of the first UTF-16 supplementary character in argument-1.
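Because zero means that no supplementary character was found, a caller would typically test the returned value before using it as a position. The following is a minimal sketch of that check, not a definitive pattern. The names USUPCHK, UTF8-TEXT, and SUPP-POS are hypothetical and do not come from this reference. The item is initialized with a hexadecimal literal so that its contents are valid UTF-8 regardless of the source code page.

```cobol
       Identification division.
       Program-id. USUPCHK.
       Data division.
       Working-storage section.
      * Hypothetical item: four ASCII bytes ("ABCD"), which are valid
      * UTF-8 and contain no supplementary characters.
       01 UTF8-TEXT pic X(4) value x'41424344'.
      * Hypothetical result field for the returned byte position.
       01 SUPP-POS  pic 9(4).
       Procedure division.
           Compute SUPP-POS = function Usupplementary(UTF8-TEXT)
      * Zero means that no supplementary character was found.
           If SUPP-POS = 0
              Display 'No supplementary characters'
           Else
              Display 'First supplementary character at ' SUPP-POS
           End-if
           Goback.
```

With the contents shown, the rules above imply that the first branch is taken.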
Example 1
The musical G-clef symbol is represented in UTF-16 by the surrogate pair nx'D834DD1E', and in UTF-8 by x'F09D849E'. In the following COBOL program fragment, both DISPLAY statements therefore output the value 3: the symbol starts at the third UTF-16 encoding unit of N and at the third byte of X.
01 N pic N(4) value nx'00200020D834DD1E'.
01 X pic X(6) value x'2020F09D849E'.
01 I pic 9.
...
Compute I = function Usupplementary(N)
Display I
Compute I = function Usupplementary(X)
Display I
Example 2
If argument-1 is a UTF-8 encoded item that contains combining character sequences, each combining character is counted individually. For example, when encoded in UTF-8, the Unicode character ä can be x'C3A4' (precomposed form) or x'61CC88' (the letter a followed by a combining diaeresis). The value returned by the USUPPLEMENTARY function differs depending on which of the two encodings appears in argument-1. See the following table for details.
argument-1 | Unicode encoding | UTF-8 encoding | Returned value of the USUPPLEMENTARY function |
---|---|---|---|
B = ä𡷤K | U+00E4 + U+21DE4 + U+004B (precomposed form: latin small letter a with diaeresis + supplementary character U+21DE4 + latin capital letter K) | x'C3A4F0A1B7A44B' (ä𡷤K) | USUPPLEMENTARY (B) returns 3 |
B = ä𡷤K | U+0061 + U+0308 + U+21DE4 + U+004B (canonical decomposition: latin small letter a + combining diaeresis + supplementary character U+21DE4 + latin capital letter K) | x'61CC88F0A1B7A44B' (ä𡷤K) | USUPPLEMENTARY (B) returns 4 |
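The two rows of the table can be reproduced with a short program. This is a sketch only. The names USUPPEX2, B1, and B2 are hypothetical (the table uses a single item B), and the hexadecimal values are taken from the UTF-8 encoding column.

```cobol
       Identification division.
       Program-id. USUPPEX2.
       Data division.
       Working-storage section.
      * Precomposed form: U+00E4 + U+21DE4 + U+004B (7 bytes).
       01 B1 pic X(7) value x'C3A4F0A1B7A44B'.
      * Canonical decomposition: U+0061 + U+0308 + U+21DE4 + U+004B
      * (8 bytes).
       01 B2 pic X(8) value x'61CC88F0A1B7A44B'.
       01 I  pic 9.
       Procedure division.
      * The supplementary character follows a 2-byte sequence in B1.
           Compute I = function Usupplementary(B1)
           Display I
      * The supplementary character follows 1 + 2 bytes in B2.
           Compute I = function Usupplementary(B2)
           Display I
           Goback.
```

According to the table, the first DISPLAY statement outputs 3 and the second outputs 4, because the decomposed form of ä occupies one more byte than the precomposed form.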