COBOL statements and UTF-8 data items
About this task
The following COBOL statements directly support UTF-8 items:
- ALLOCATE and FREE
- UTF-8 data items declared in the LINKAGE SECTION can be dynamically allocated and freed.
- EVALUATE and IF
- The comparison of UTF-8 data items in conditions is supported.
Comparisons of UTF-8 data items are done by using binary byte-by-byte comparisons.
UTF-8 data items can be compared only with items of class alphabetic, class alphanumeric, class national, or class UTF-8.
Where necessary, the other operand is converted automatically from EBCDIC to UTF-8 or from UTF-16 to UTF-8 before the comparison.
- INITIALIZE
- The category default for UTF-8 data items is x'20', a UTF-8 space.
- JSON GENERATE and JSON PARSE
- Data items of category UTF-8 can be used as key parts for JSON GENERATE and JSON PARSE operations.
- MERGE and SORT
- Data items of category UTF-8 can be used as key parts for sort and merge operations.
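All comparisons of UTF-8 data in COBOL are done by using a binary, byte-by-byte comparison. This should produce the same ordering for a set of UTF-8 strings as for the corresponding set of NATIONAL strings representing the same Unicode code points, assuming all code points in the strings are from the Basic Multilingual Plane (BMP). Code points outside the BMP are represented in UTF-16 by surrogate pairs, whose code units (X'D800' through X'DFFF') sort below the BMP code points U+E000 through U+FFFF, so the two orderings can differ for such data.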
- MOVE
- Basic move rules:
- A category UTF-8 sender can be moved only to an item of class and category national, UTF-8, numeric, numeric edited, external floating-point, and internal floating-point.
- A category UTF-8 item can receive only an item of class alphabetic, class alphanumeric, class national, or class UTF-8.
Note: This includes items such as numeric-edited, alphanumeric-edited, national-edited, and national-numeric-edited.
- STRING and UNSTRING
- Data is transferred from the sending fields to the receiving field using the rules of the MOVE statement for elementary UTF-8-to-UTF-8 moves.
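The byte-by-byte comparison rule described for EVALUATE, IF, MERGE, and SORT can be illustrated outside COBOL. The following Python sketch is not compiler code and only models the rule: the helper names (`utf8_key`, `utf16_key`, `compare_ebcdic_with_utf8`) are hypothetical, and EBCDIC code page 037 and big-endian UTF-16 are assumptions made for the example. It shows that binary ordering of UTF-8 and UTF-16 data agrees for BMP-only strings, can differ once a code point outside the BMP is present, and that an EBCDIC operand must be converted before a binary comparison is meaningful.

```python
# Illustration only: models the comparison rules, not the COBOL runtime.

def utf8_key(text: str) -> bytes:
    """Binary sort key for a UTF-8 item: its encoded bytes."""
    return text.encode("utf-8")


def utf16_key(text: str) -> bytes:
    """Binary sort key for a national item (assumed big-endian UTF-16)."""
    return text.encode("utf-16-be")


def compare_ebcdic_with_utf8(ebcdic_bytes: bytes, utf8_bytes: bytes) -> bool:
    """Convert the EBCDIC operand to UTF-8, then compare byte by byte.

    Code page 037 is an assumption chosen for this example.
    """
    converted = ebcdic_bytes.decode("cp037").encode("utf-8")
    return converted == utf8_bytes


# All code points in the BMP: both encodings sort the same way.
bmp_only = ["z", "\u00e9", "A", "\u65e5"]
bmp_same_order = (
    sorted(bmp_only, key=utf8_key) == sorted(bmp_only, key=utf16_key)
)

# U+FF5E is in the BMP; U+1F600 is outside it and needs a surrogate pair.
with_supplementary = ["\uff5e", "\U0001f600"]
mixed_same_order = (
    sorted(with_supplementary, key=utf8_key)
    == sorted(with_supplementary, key=utf16_key)
)

print(bmp_same_order)    # True
print(mixed_same_order)  # False
print(compare_ebcdic_with_utf8(b"\xc1\xc2\xc3", b"ABC"))  # True
```

In the second list, the UTF-8 bytes of U+1F600 start with X'F0' and sort after U+FF5E (X'EF'...), whereas its UTF-16 surrogate pair starts with X'D83D' and sorts before X'FF5E'.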
The following COBOL statements do not support UTF-8 arguments:
- ACCEPT
- INSPECT
- XML GENERATE
- XML PARSE
Tip: If you need to use a UTF-8 data item with a statement such as INSPECT that does not yet support UTF-8 data items natively, you can move the UTF-8 data item to a national data item first and use the national item in the statement instead.
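The idea behind this workaround can be sketched conceptually in Python. This is only a model of the data transformation, not COBOL: `move_utf8_to_national` and `tally_all` are hypothetical helpers, big-endian UTF-16 is assumed for the national item, and the counting loop merely imitates what a tallying operation would do on fixed 2-byte code units.

```python
# Conceptual model of "move to national, then operate on the national item".

def move_utf8_to_national(utf8_bytes: bytes) -> bytes:
    """Model a UTF-8 to national move as a UTF-8 to UTF-16 conversion."""
    return utf8_bytes.decode("utf-8").encode("utf-16-be")


def tally_all(national: bytes, pattern: bytes) -> int:
    """Count non-overlapping occurrences of pattern, stepping by 2-byte code units."""
    count = 0
    i = 0
    while i + len(pattern) <= len(national):
        if national[i:i + len(pattern)] == pattern:
            count += 1
            i += len(pattern)
        else:
            i += 2  # advance one UTF-16 code unit
    return count


utf8_item = "caf\u00e9 \u00e0 la carte".encode("utf-8")
national_item = move_utf8_to_national(utf8_item)
tally = tally_all(national_item, "a".encode("utf-16-be"))
print(tally)  # 3
```

Because every BMP character occupies exactly one 2-byte code unit after the conversion, character-oriented operations no longer have to deal with the variable-length byte sequences of UTF-8.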