COBOL statements and UTF-8 data items

About this task

The following COBOL statements directly support UTF-8 items:

ALLOCATE and FREE
UTF-8 data items declared in the LINKAGE SECTION can be dynamically allocated and freed.
EVALUATE and IF
UTF-8 data items can be compared in conditions. The comparison is a binary, byte-by-byte comparison.

UTF-8 data items can be compared only with items of class alphabetic, class alphanumeric, class national, or class UTF-8.

Where necessary, EBCDIC and UTF-16 operands are automatically converted to UTF-8 before the comparison.
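The comparison rule can be modeled outside COBOL. The Python sketch below is a hypothetical model, not compiler behavior. `to_utf8` and `compare_utf8` are illustrative names, and it makes three assumptions. EBCDIC data uses code page 037 (`cp037`); the real code page depends on the CODEPAGE compiler option. National data is big-endian UTF-16. The shorter operand is padded on the right with UTF-8 spaces, as in other COBOL nonnumeric comparisons.

```python
SPACE = b"\x20"  # UTF-8 space, used here as the (assumed) padding byte


def to_utf8(operand: bytes, encoding: str) -> bytes:
    """Convert an operand's raw bytes to UTF-8 before comparing (modeled step)."""
    if encoding == "utf-8":
        return operand
    return operand.decode(encoding).encode("utf-8")


def compare_utf8(left: bytes, right: bytes) -> int:
    """Binary byte-by-byte comparison of two UTF-8 operands.

    Returns -1, 0, or 1. Assumption: the shorter operand is padded on the
    right with UTF-8 spaces so both operands have the same length.
    """
    width = max(len(left), len(right))
    a = left.ljust(width, SPACE)
    b = right.ljust(width, SPACE)
    return (a > b) - (a < b)


utf8_item = "ABC".encode("utf-8")
ebcdic_item = "ABC".encode("cp037")         # assumed EBCDIC code page 037
national_item = "ABD".encode("utf-16-be")   # assumed national (UTF-16BE) data

print(compare_utf8(utf8_item, to_utf8(ebcdic_item, "cp037")))        # 0
print(compare_utf8(utf8_item, to_utf8(national_item, "utf-16-be")))  # -1
```

Because the comparison is purely binary, a multibyte character such as "é" (X'C3A9') compares higher than any single-byte ASCII character.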

INITIALIZE
The category default for UTF-8 data items is x'20', a UTF-8 space.
JSON GENERATE and JSON PARSE
Data items of category UTF-8 are supported by the JSON GENERATE and JSON PARSE statements.
MERGE and SORT
Data items of category UTF-8 can be used as key parts for sort and merge operations.

All comparisons of UTF-8 data in COBOL are binary, byte-by-byte comparisons. For a set of UTF-8 strings, this produces the same ordering as for the corresponding set of national strings that represent the same Unicode code points, provided that all code points in the strings are in the Basic Multilingual Plane (BMP).
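One way to check the BMP condition is to sort the same strings twice. The first sort uses their UTF-8 bytes and the second uses their big-endian UTF-16 bytes, which this sketch assumes as the representation of national data. `utf8_key` and `national_key` are illustrative helpers. The second list shows where the orderings diverge. A supplementary-plane character (U+10000) is compared with a BMP character above the surrogate range (U+FF61). UTF-16 encodes U+10000 as the surrogate pair X'D800DC00', which sorts before X'FF61'.

```python
def utf8_key(s: str) -> bytes:
    """Sort key: the string's UTF-8 bytes (binary comparison)."""
    return s.encode("utf-8")


def national_key(s: str) -> bytes:
    """Sort key: the string's big-endian UTF-16 bytes (assumed national form)."""
    return s.encode("utf-16-be")


# BMP-only strings: both byte orders agree with code point order.
bmp = ["zebra", "Zebra", "\u00e9t\u00e9", "\u4e2d\u6587", "\uff61"]
print(sorted(bmp, key=utf8_key) == sorted(bmp, key=national_key))  # True

# A supplementary character breaks the equivalence.
mixed = ["\uff61", "\U00010000"]
print(sorted(mixed, key=utf8_key))      # U+FF61 first (X'EFBDA1' < X'F0908080')
print(sorted(mixed, key=national_key))  # U+10000 first (X'D800DC00' < X'FF61')
```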

MOVE
Basic move rules:
  • A category UTF-8 sender can be moved only to an item of category national, UTF-8, numeric, numeric-edited, external floating-point, or internal floating-point.
  • A category UTF-8 item can receive only an item of class alphabetic, class alphanumeric, class national, or class UTF-8.
    Note: This includes items such as numeric-edited, alphanumeric-edited, national-edited, and national-numeric-edited.
Padding and truncation, where needed, are always done at the UTF-8 character (that is, Unicode code point) level.
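The following Python sketch models character-level padding and truncation for a UTF-8 receiver. It assumes the following receiver behavior:

- Length is counted in characters.
- Data is left-justified.
- Truncation is on the right.
- Padding uses U+0020.

`move_utf8` is a hypothetical helper, not a COBOL API. The sketch slices the decoded string rather than the encoded bytes, which keeps a multibyte character from being split.

```python
def move_utf8(sender: str, receiver_chars: int) -> bytes:
    """Model a MOVE into a UTF-8 receiver that holds receiver_chars characters.

    Truncation and padding operate on code points, never on bytes, so the
    result is always a complete, valid UTF-8 sequence.
    """
    chars = sender[:receiver_chars]           # truncate on a code point boundary
    chars = chars.ljust(receiver_chars, " ")  # pad on the right with U+0020
    return chars.encode("utf-8")


print(move_utf8("naïve", 3))  # b'na\xc3\xaf'  ("naï": the 2-byte "ï" stays whole)
print(move_utf8("ok", 4))     # b'ok  '
```

By contrast, truncating the encoded bytes, as in `"naïve".encode("utf-8")[:3]`, yields `b'na\xc3'`, an incomplete UTF-8 sequence. Truncating at the code point level avoids this.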
STRING and UNSTRING
Data is transferred from the sending fields to the receiving field using the rules of the MOVE statement for elementary UTF-8-to-UTF-8 moves.

The following COBOL statements do not support UTF-8 operands:
  • ACCEPT
  • INSPECT
  • XML GENERATE
  • XML PARSE
Tip: If you need to use a UTF-8 data item with a statement such as INSPECT that does not yet support UTF-8 data items natively, you can always move the UTF-8 data item to a national data item first and use the national item in the statement instead.
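The tip above can be illustrated with a Python model of its two steps. First, convert the UTF-8 bytes to big-endian UTF-16, which this sketch assumes stands in for a national item. Then scan the national data in 2-byte units, roughly as INSPECT ... TALLYING FOR ALL would. `utf8_to_national` and `tally_all` are hypothetical helpers. The sketch assumes BMP characters only, so each character occupies exactly one 2-byte unit.

```python
def utf8_to_national(item: bytes) -> bytes:
    """Model MOVE utf8-item TO national-item (no padding or truncation)."""
    return item.decode("utf-8").encode("utf-16-be")


def tally_all(national: bytes, target: str) -> int:
    """Count non-overlapping occurrences of target in national data.

    The scan advances in 2-byte units so a match never straddles two characters.
    """
    pattern = target.encode("utf-16-be")
    count = 0
    i = 0
    while i + len(pattern) <= len(national):
        if national[i:i + len(pattern)] == pattern:
            count += 1
            i += len(pattern)  # skip past the match (non-overlapping)
        else:
            i += 2             # advance one UTF-16 code unit
    return count


national_item = utf8_to_national("été".encode("utf-8"))
print(tally_all(national_item, "é"))  # 2
```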