Support for non-Unicode single and multibyte character sets

The compiler now supports non-Unicode multibyte character sets (such as SJIS, EUC-JP, EUC-KR) via the -fexec-charset and -finput-charset options.
  • -finput-charset specifies the input character set used to interpret source files.
  • -fexec-charset specifies the execution character set, which is the encoding the compiler uses for character and string literals in the compiled program.
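
One way to see the effect of -fexec-charset is to inspect the bytes the compiler stored for a string literal. The following is a minimal sketch, not part of the product: the helper name hex_bytes is hypothetical, and the ibm-clang command in the comment is an assumed invocation. Only the two option names come from the description above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Assumed invocation (adjust to your installation):
 *   ibm-clang -finput-charset=IBM-943C -fexec-charset=IBM-943C demo.c
 */

/*
 * Hypothetical helper: write the bytes of s into out as uppercase hex
 * pairs separated by single spaces (for example "41 42").
 * Returns the number of bytes in s, or -1 if out is too small.
 */
int hex_bytes(const char *s, char *out, size_t outsz)
{
    size_t n = strlen(s);
    size_t need = (n == 0) ? 1 : n * 3;  /* "XX " per byte; final space becomes NUL */
    size_t i;

    if (outsz < need)
        return -1;

    out[0] = '\0';
    for (i = 0; i < n; i++) {
        /* Go through unsigned char so bytes >= 0x80 are not sign-extended. */
        snprintf(out + i * 3, 3, "%02X", (unsigned char)s[i]);
        out[i * 3 + 2] = (i + 1 < n) ? ' ' : '\0';
    }
    return (int)n;
}
```

Passing a literal that contains a non-ASCII character, written directly or as a UCN such as "\u301C", and printing the resulting string shows which bytes the selected execution character set produced. Building the same source with different -fexec-charset values and comparing the output is a quick sanity check.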

The following table lists the recommended character encoding names for the SJIS, EUC-JP, and EUC-KR AIX locales:

AIX Locale   Character Encoding Name   Notes
Ja_JP        IBM-943C                  Shift-JIS encoding.
ja_JP        IBM-eucJP                 Japanese EUC.
ko_KR        IBM-eucKR                 Korean EUC.

Migration considerations for AIX JIS characters

Open XL for AIX uses ICU (International Components for Unicode) to provide mappings and conversions for character sets. When you use -fexec-charset or -finput-charset with IBM-943C or IBM-eucJP, ICU's mappings differ from those used by legacy AIX compilers. The differences primarily affect universal character names (UCNs) for JIS characters that have more than one possible Unicode equivalent. The following table lists some of the affected characters:

Shift-JIS Code Point   ICU Mapping                          Legacy Compiler Mapping
X'8160'                ～ FULLWIDTH TILDE (U+FF5E)          〜 WAVE DASH (U+301C)
X'817C'                － FULLWIDTH HYPHEN-MINUS (U+FF0D)   − MINUS SIGN (U+2212)
X'FA55'                ￤ FULLWIDTH BROKEN BAR (U+FFE4)     ¦ BROKEN BAR (U+00A6)
X'815C'                ― HORIZONTAL BAR (U+2015)           — EM DASH (U+2014)
X'8161'                ∥ PARALLEL TO (U+2225)              ‖ DOUBLE VERTICAL LINE (U+2016)

For these characters, conversion to and from SJIS or EUC-JP is performed using the ICU mapping, whereas the legacy AIX compiler maps them to alternative Unicode values. Despite this difference, round-tripping is maintained. In particular, AIX wctomb accepts both ICU and legacy mappings as valid wchar_t input values.
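
The round-trip behavior described above can be checked with a small test harness. The following is a minimal sketch under stated assumptions: the names jis_mappings, find_mapping, encode_in_locale, and same_bytes_in_locale are hypothetical, the code assumes that wchar_t values in the target locale correspond to the Unicode values in the table, and the target locale (for example Ja_JP) must be installed on the system.

```c
#include <limits.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* The five Shift-JIS code points from the table, with both Unicode mappings. */
struct jis_mapping {
    unsigned int sjis;    /* Shift-JIS code point */
    unsigned int icu;     /* Unicode value used by ICU */
    unsigned int legacy;  /* Unicode value used by the legacy compiler */
};

const struct jis_mapping jis_mappings[] = {
    { 0x8160, 0xFF5E, 0x301C },
    { 0x817C, 0xFF0D, 0x2212 },
    { 0xFA55, 0xFFE4, 0x00A6 },
    { 0x815C, 0x2015, 0x2014 },
    { 0x8161, 0x2225, 0x2016 },
};

/* Return the table entry for a Shift-JIS code point, or NULL if not listed. */
const struct jis_mapping *find_mapping(unsigned int sjis)
{
    size_t i;
    for (i = 0; i < sizeof jis_mappings / sizeof jis_mappings[0]; i++)
        if (jis_mappings[i].sjis == sjis)
            return &jis_mappings[i];
    return NULL;
}

/*
 * Hypothetical helper: encode wc with wctomb under the named locale.
 * Returns the byte count, or -1 if the locale cannot be set, wc is not
 * representable, or out is too small. The previous locale is restored.
 */
int encode_in_locale(const char *locale, wchar_t wc,
                     unsigned char *out, size_t outsz)
{
    char saved[256];
    char buf[MB_LEN_MAX];
    const char *cur = setlocale(LC_ALL, NULL);
    int n;

    if (cur == NULL || strlen(cur) >= sizeof saved)
        return -1;
    strcpy(saved, cur);               /* copy: later setlocale calls may overwrite it */

    if (setlocale(LC_ALL, locale) == NULL)
        return -1;
    (void)wctomb(NULL, 0);            /* reset any shift state */
    n = wctomb(buf, wc);
    setlocale(LC_ALL, saved);

    if (n < 0 || (size_t)n > outsz)
        return -1;
    memcpy(out, buf, (size_t)n);
    return n;
}

/*
 * Returns 1 if the ICU and legacy values encode to identical bytes in the
 * locale, 0 if they differ, and -1 if either value could not be encoded.
 */
int same_bytes_in_locale(const char *locale, const struct jis_mapping *m)
{
    unsigned char a[MB_LEN_MAX], b[MB_LEN_MAX];
    int na = encode_in_locale(locale, (wchar_t)m->icu, a, sizeof a);
    int nb = encode_in_locale(locale, (wchar_t)m->legacy, b, sizeof b);

    if (na < 0 || nb < 0)
        return -1;
    return (na == nb && memcmp(a, b, (size_t)na) == 0) ? 1 : 0;
}
```

On an AIX system with the Ja_JP locale installed, calling same_bytes_in_locale("Ja_JP", find_mapping(0x8160)) for each listed code point is one way to confirm that both Unicode values convert to the same multibyte sequence. A result of -1 indicates that the locale is unavailable or that wctomb rejected one of the values.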
