Unicode UTF-16 data from character constants
For Character Unicode (CU) constants, the value is converted to the Unicode CCSID specified by the CU option, which may be 1200 (UTF-16BE), 1202 (UTF-16LE) or 1208 (UTF-8). Before conversion, each paired occurrence of an ampersand or apostrophe is replaced by a single occurrence of that character. If necessary, the value is padded on the right with EBCDIC spaces (X'40'). The assembler then maps each EBCDIC character to its Unicode equivalent and converts the result to the format specified by the CU option. For UTF-16, each character is translated to two bytes. For UTF-8, each character is translated to up to three bytes: ASCII characters require one byte, accented letters typically require two bytes, and the Euro symbol requires three bytes.
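The byte counts described above can be checked with a short Python sketch. It is only an illustration, not the assembler's own conversion table: the helper names `ebcdic_to_unicode` and `encoded_lengths` are hypothetical, Python's `cp1140` codec (EBCDIC with the Euro symbol) is assumed as the source CCSID, and the Python codecs `utf-16-be`, `utf-16-le` and `utf-8` stand in for CCSIDs 1200, 1202 and 1208.

```python
# Assumed stand-ins for the target CCSIDs named by the CU option.
TARGET_CODECS = {1200: "utf-16-be", 1202: "utf-16-le", 1208: "utf-8"}


def ebcdic_to_unicode(ebcdic_bytes, ccsid, source_codec="cp1140"):
    """Map EBCDIC bytes to Unicode, then encode in the requested format."""
    text = ebcdic_bytes.decode(source_codec)      # EBCDIC -> Unicode code points
    return text.encode(TARGET_CODECS[ccsid])      # code points -> UTF-16 or UTF-8


def encoded_lengths(char, source_codec="cp1140"):
    """Return the encoded byte length of one character for each target CCSID."""
    raw = char.encode(source_codec)               # the single EBCDIC source byte
    return {ccsid: len(ebcdic_to_unicode(raw, ccsid, source_codec))
            for ccsid in TARGET_CODECS}


if __name__ == "__main__":
    # An ASCII letter, an accented letter (e acute), and the Euro symbol.
    for ch in ("A", "\u00e9", "\u20ac"):
        print(repr(ch), encoded_lengths(ch))
```

Under these assumptions, every character occupies two bytes in either UTF-16 form, while the UTF-8 length varies with the character.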
For Unicode conversion, the option CODEPAGE(LOCAL) is recommended. It indicates that no CODEPAGE table is to be loaded and that the source CCSID is the one specified by the EBCDIC option. In this case, a standard internal conversion table is used, which correctly converts all characters from the source EBCDIC CCSID. Otherwise, the conversion mapping is defined by the selected CODEPAGE table, identified by its source CCSID. If that CCSID matches the one specified by the EBCDIC option (or its Euro equivalent), the conversion is applied directly to the source code. Otherwise, the source is first converted to the CCSID specified by the initial CE option, if different. If the CODEPAGE CCSID matches neither the EBCDIC option nor the CE option, a warning is issued at the first use of a type CU self-defining term or constant.
UA   DC   CU'UTF-16'    object code X'005500540046002D00310036'
UB   DC   CUL4'L'       object code X'004C0020'
UC   DC   CUL2'XYZ'     object code X'0058'
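The object code of the three constants above can be reproduced with a rough Python model of the UTF-16BE case. This is a sketch under stated assumptions, not the assembler's implementation: the function name `cu_constant` is hypothetical, Python's `cp037` codec is assumed as the source EBCDIC code page, and the length modifier is taken to be the byte length of the generated constant (two bytes per character), as the UB and UC examples suggest.

```python
def cu_constant(value, length=None, source_codec="cp037"):
    """Model a UTF-16BE CU constant; `length` plays the role of the L modifier."""
    # Paired ampersands and apostrophes each stand for a single character.
    text = value.replace("&&", "&").replace("''", "'")
    ebcdic = text.encode(source_codec)

    if length is not None:
        if length % 2:
            raise ValueError("assumed: a UTF-16 constant needs an even length")
        nchars = length // 2
        # Truncate on the right, or pad on the right with EBCDIC spaces (X'40').
        ebcdic = ebcdic[:nchars].ljust(nchars, b"\x40")

    # Map each EBCDIC character to Unicode and emit big-endian UTF-16.
    return ebcdic.decode(source_codec).encode("utf-16-be")


if __name__ == "__main__":
    print(cu_constant("UTF-16").hex().upper())   # UA
    print(cu_constant("L", 4).hex().upper())     # UB: padded with a space
    print(cu_constant("XYZ", 2).hex().upper())   # UC: truncated to one character
```

Note that padding is applied to the EBCDIC value before conversion, so the X'40' pad byte appears in the object code as the Unicode space X'0020'.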