USUPPLEMENTARY

The USUPPLEMENTARY function returns an integer value that is equal to the index of the first Unicode supplementary character in a character data item argument that is encoded in UTF-8 or UTF-16.

A Unicode supplementary character is a character above U+FFFF, that is, a character outside of the Basic Multilingual Plane (BMP). These characters are encoded in UTF-16 with a surrogate pair (two 16-bit code units), or are encoded in UTF-8 with a 4-byte representation.
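The two encodings described above can be checked with a short Python sketch. It is only an illustration, not part of COBOL; the helper name `surrogate_pair` is hypothetical, and U+1D11E (MUSICAL SYMBOL G CLEF) is used as a sample supplementary character.

```python
def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    # The top 10 bits go into the high surrogate, the bottom 10 into the low one.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


g_clef = "\U0001D11E"  # sample character outside the BMP
high, low = surrogate_pair(ord(g_clef))
print(f"{high:04X} {low:04X}")                   # D834 DD1E
print(g_clef.encode("utf-16-be").hex().upper())  # D834DD1E (two 16-bit code units)
print(g_clef.encode("utf-8").hex().upper())      # F09D849E (four bytes)
```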

The function type is integer.

Format

FUNCTION USUPPLEMENTARY(argument-1)
argument-1
Must be of class alphabetic, alphanumeric, or national. argument-1 must contain valid UTF-8 or UTF-16 data based on its class:
  • If argument-1 is of class alphabetic or alphanumeric, it must contain valid UTF-8 data.
  • If argument-1 is of class national, it must contain valid UTF-16 data.
The returned value is an integer that depends on the contents of argument-1. It is a 9-digit integer if LP(32) is in effect, or an 18-digit integer if LP(64) is in effect:
  • If the contents of argument-1 are not valid Unicode (UTF-8 or UTF-16, depending on class), the returned result is unpredictable.
  • If argument-1 contains no supplementary characters, the returned value is zero.
  • If argument-1 is of class alphabetic or alphanumeric, the returned value is the byte position of the first UTF-8 supplementary character in argument-1.
  • If argument-1 is of class national, the returned value is the index, in UTF-16 encoding units, of the first UTF-16 supplementary character in argument-1.
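The rules for the returned value can be modeled with the following Python sketch. It is an approximation for illustration, not the compiler's implementation. The function names are hypothetical, the input is assumed to be valid Unicode, and national data is assumed to be big-endian UTF-16 bytes.

```python
def first_supplementary_utf8(data: bytes) -> int:
    """Return the 1-based byte position of the first 4-byte UTF-8 sequence, or 0."""
    for pos, byte in enumerate(data, start=1):
        # In valid UTF-8, only the lead byte of a 4-byte sequence is >= 0xF0.
        if byte >= 0xF0:
            return pos
    return 0


def first_supplementary_utf16(data: bytes) -> int:
    """Return the 1-based code-unit index of the first high surrogate, or 0."""
    for i in range(0, len(data) - 1, 2):
        unit = int.from_bytes(data[i:i + 2], "big")
        # In valid UTF-16, a high surrogate always starts a surrogate pair.
        if 0xD800 <= unit <= 0xDBFF:
            return i // 2 + 1
    return 0


print(first_supplementary_utf8(bytes.fromhex("2020F09D849E")))       # 3
print(first_supplementary_utf16(bytes.fromhex("00200020D834DD1E")))  # 3
print(first_supplementary_utf8(b"plain text"))                       # 0
```

The UTF-8 result counts bytes and the UTF-16 result counts 16-bit code units, so the two values for the same text agree only when every preceding character occupies a single unit in both encodings.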

Example 1

The musical G-clef symbol (U+1D11E) is represented in UTF-16 by the surrogate pair nx'D834DD1E' and in UTF-8 by x'F09D849E'. In the following COBOL program fragment, both DISPLAY statements therefore display the value 3: the supplementary character starts at the third UTF-16 code unit of N and at the third byte of X.

01 N pic N(4) value nx'00200020D834DD1E'.
01 X pic X(6) value x'2020F09D849E'.
01 I pic 9.
...
Compute I = function Usupplementary(N)
Display I
Compute I = function Usupplementary(X)
Display I

Example 2

If argument-1 is a UTF-8 encoded item that contains decomposed character sequences, each combining character is counted individually. For example, the character ä can be encoded in UTF-8 as x'C3A4' (precomposed form) or as x'61CC88' (the letter a followed by a combining diaeresis). Because the two forms differ in length, the value returned by the USUPPLEMENTARY function depends on which form precedes the supplementary character. See the following table for details.

Table 1. Returned values of the USUPPLEMENTARY function

| argument-1 | Unicode encoding | UTF-8 encoding | Returned value of the USUPPLEMENTARY function |
|---|---|---|---|
| B = ä𡷤K | U+00E4 + U+21DE4 + U+004B (precomposed form: latin small letter a with diaeresis + supplementary character U+21DE4 + latin capital letter K) | x'C3A4F0A1B7A44B' (ä𡷤K) | USUPPLEMENTARY(B) returns 3 |
| B = ä𡷤K | U+0061 + U+0308 + U+21DE4 + U+004B (canonical decomposition: latin small letter a + combining diaeresis + supplementary character U+21DE4 + latin capital letter K) | x'61CC88F0A1B7A44B' (ä𡷤K) | USUPPLEMENTARY(B) returns 4 |
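The effect of the two forms in Table 1 can be reproduced with a Python sketch that uses the standard `unicodedata` module to produce the precomposed (NFC) and decomposed (NFD) strings. The helper `first_supplementary_utf8` is a hypothetical model of the byte-position rule for valid UTF-8 data, not the COBOL function itself.

```python
import unicodedata


def first_supplementary_utf8(data: bytes) -> int:
    """Return the 1-based byte position of the first 4-byte UTF-8 sequence, or 0."""
    for pos, byte in enumerate(data, start=1):
        if byte >= 0xF0:  # lead byte of a 4-byte sequence in valid UTF-8
            return pos
    return 0


text = "\u00E4\U00021DE4K"  # ä + U+21DE4 + K

nfc = unicodedata.normalize("NFC", text).encode("utf-8")  # precomposed ä
nfd = unicodedata.normalize("NFD", text).encode("utf-8")  # a + combining diaeresis

print(nfc.hex().upper(), first_supplementary_utf8(nfc))  # C3A4F0A1B7A44B 3
print(nfd.hex().upper(), first_supplementary_utf8(nfd))  # 61CC88F0A1B7A44B 4
```

If positions must be comparable across data sources, one option is to normalize the text to a single form before it is passed to the function.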