Processing UTF-8 data using UTF-16 (national) data types

As an alternative to the recommended method of processing UTF-8 data using USAGE UTF-8 data items, you can also process UTF-8 data by storing it in alphanumeric data items and then converting it to UTF-16 in a national data item. After processing the national data, convert it back to UTF-8 for output. For the conversions, use the intrinsic functions NATIONAL-OF and DISPLAY-OF, respectively. Use code page 1208 for UTF-8 data.

About this task

National data is encoded in UTF-16, which uses a single 16-bit encoding unit for almost all commonly encountered characters (only characters outside the Basic Multilingual Plane need two units). Because of this property, you can use string operations such as reference modification on the national data. If it is more convenient to retain the UTF-8 encoding, use the Unicode intrinsic functions to assist with processing the data. For details, see Using intrinsic functions to process UTF-8 encoded data.
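
The round trip described above might look like the following minimal sketch. The program name, data names, and field sizes are hypothetical and chosen only for illustration. The sketch assumes that UTF8-IN has already been filled with UTF-8 data and that the data contains no characters that need two UTF-16 encoding units.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UTF8RT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical names and sizes, for illustration only.
      * UTF8-IN is assumed to be filled with UTF-8 data elsewhere.
       01  UTF8-IN          PIC X(60) VALUE SPACES.
       01  UTF16-WORK       PIC N(60) USAGE NATIONAL.
       01  UTF8-OUT         PIC X(60).
       PROCEDURE DIVISION.
      * Convert the UTF-8 bytes (code page 1208) to UTF-16.
           MOVE FUNCTION NATIONAL-OF(UTF8-IN, 1208) TO UTF16-WORK
      * Reference modification on a national item counts
      * national character positions, not bytes, so this takes
      * the first 10 characters and converts them back to UTF-8.
           MOVE FUNCTION DISPLAY-OF(UTF16-WORK(1:10), 1208)
             TO UTF8-OUT
           GOBACK.
```

The receiving fields are sized generously because one national character position can convert to as many as three UTF-8 bytes. The MOVE statements pad the unused trailing positions, so track the converted length separately if the padding matters for your output.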

Take the following steps to convert ASCII or EBCDIC data to UTF-8:

Procedure

  1. Use the function NATIONAL-OF to convert the ASCII or EBCDIC string to a national (UTF-16) string.
  2. Use the function DISPLAY-OF to convert the national string to UTF-8.

Results

The following example converts Greek EBCDIC data to UTF-8:

[Figure: sample code for converting Greek EBCDIC data to UTF-8]
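
Because the original figure is not reproduced here, the following is a minimal sketch of the two-step procedure, not the original sample. The program name, data names, and field sizes are hypothetical. The sketch assumes that the source data is encoded in code page 875 (Greek EBCDIC) and that GREEK-EBCDIC has already been filled with data.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. GRK2UTF8.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical names and sizes, for illustration only.
      * GREEK-EBCDIC is assumed to be filled elsewhere.
       01  GREEK-EBCDIC     PIC X(20) VALUE SPACES.
       01  UTF16-STR        PIC N(20) USAGE NATIONAL.
       01  UTF8-STR         PIC X(40).
       PROCEDURE DIVISION.
      * Step 1: Greek EBCDIC (code page 875) to national (UTF-16).
           MOVE FUNCTION NATIONAL-OF(GREEK-EBCDIC, 00875)
             TO UTF16-STR
      * Step 2: national (UTF-16) to UTF-8 (code page 1208).
           MOVE FUNCTION DISPLAY-OF(UTF16-STR, 01208)
             TO UTF8-STR
           GOBACK.
```

UTF8-STR is twice the size of the source field because each Greek letter occupies two bytes in UTF-8.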

Usage note: Take care if you use reference modification to refer to data encoded in UTF-8. A UTF-8 character occupies from one to four bytes, and reference modification on an alphanumeric data item counts bytes, not characters. Avoid operations that might split a multibyte character.