Support for non-Unicode single and multibyte character sets
- -finput-charset specifies the input character set used to interpret source files.
- -fexec-charset specifies the execution character set used to encode character and string literals in the compiled program.
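Conceptually, the two options describe a two-step conversion. Source bytes are first decoded from the input character set, and the characters in literals are then re-encoded in the execution character set. The Python sketch below models that pipeline for a single literal. It assumes Python's built-in `shift_jis` and `euc_jp` codecs as stand-ins for the compiler's IBM-943C and IBM-eucJP converters, and the function name `transcode_literal` is hypothetical.

```python
def transcode_literal(source_bytes: bytes, input_charset: str, exec_charset: str) -> bytes:
    """Model of -finput-charset / -fexec-charset handling for one literal.

    Assumption: Python codec names stand in for the compiler's converters.
    """
    # Step 1 (-finput-charset): interpret the raw source bytes as characters.
    text = source_bytes.decode(input_charset)
    # Step 2 (-fexec-charset): encode those characters for the compiled program.
    return text.encode(exec_charset)


# HIRAGANA LETTER A (U+3042) as saved in a Shift-JIS source file...
sjis_source = "\u3042".encode("shift_jis")
# ...re-encoded as it would appear in an EUC-JP string literal.
print(transcode_literal(sjis_source, "shift_jis", "euc_jp").hex())  # prints a4a2
```

ASCII characters pass through unchanged in this model, which is why the two options matter only for literals that contain multibyte characters.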
The following table lists the recommended character encoding names to specify for the SJIS, EUC-JP, and EUC-KR AIX locales:
| AIX Locale | Character Encoding Name | Notes |
|---|---|---|
| Ja_JP | IBM-943C | Shift-JIS encoding. |
| ja_JP | IBM-eucJP | Japanese EUC. |
| ko_KR | IBM-eucKR | Korean EUC. |
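A build script can derive the compiler options directly from this table. The sketch below is a hypothetical helper (the names `RECOMMENDED_CHARSETS` and `charset_flags` are illustrative, not part of any IBM tool). It assumes the source files and the execution character set use the same encoding, which may not hold for every project. Locale names are matched case-sensitively, because Ja_JP and ja_JP select different encodings.

```python
# Locale-to-encoding table copied from the documentation.
RECOMMENDED_CHARSETS = {
    "Ja_JP": "IBM-943C",   # Shift-JIS
    "ja_JP": "IBM-eucJP",  # Japanese EUC
    "ko_KR": "IBM-eucKR",  # Korean EUC
}


def charset_flags(locale: str) -> list[str]:
    """Return -finput-charset and -fexec-charset options for an AIX locale.

    Assumption: input and execution character sets are the same.
    """
    try:
        charset = RECOMMENDED_CHARSETS[locale]  # case-sensitive lookup
    except KeyError:
        raise ValueError(f"no recommended encoding listed for locale {locale!r}") from None
    return [f"-finput-charset={charset}", f"-fexec-charset={charset}"]


print(charset_flags("Ja_JP"))
```

For example, `charset_flags("Ja_JP")` returns `['-finput-charset=IBM-943C', '-fexec-charset=IBM-943C']`, while an unlisted locale raises `ValueError` instead of silently falling back to a default.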
Migration considerations for AIX JIS characters
Open XL for AIX uses ICU (International Components for Unicode) to provide mappings and conversions between character sets. When -fexec-charset or -finput-charset is used with IBM-943C or IBM-eucJP, ICU's mappings differ from those used by the legacy AIX compilers for some characters. These differences mainly affect universal character names (UCNs), such as \u301C, that refer to JIS characters with more than one possible Unicode equivalent. Some of the affected characters are listed below:
| Shift-JIS Code Point | ICU Mapping | Legacy Compiler Mapping |
|---|---|---|
| X'8160' | ～ FULLWIDTH TILDE (U+FF5E) | 〜 WAVE DASH (U+301C) |
| X'817C' | － FULLWIDTH HYPHEN-MINUS (U+FF0D) | − MINUS SIGN (U+2212) |
| X'FA55' | ￤ FULLWIDTH BROKEN BAR (U+FFE4) | ¦ BROKEN BAR (U+00A6) |
| X'815C' | ― HORIZONTAL BAR (U+2015) | — EM DASH (U+2014) |
| X'8161' | ∥ PARALLEL TO (U+2225) | ‖ DOUBLE VERTICAL LINE (U+2016) |
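Python's standard codecs show the same kind of split for several of these code points, which makes the difference easy to observe. In this analogy, the `cp932` codec (Microsoft's table) plays the role of the ICU column and the `shift_jis` codec (a JIS X 0208 based table) plays the role of the legacy column. Neither codec is the ICU IBM-943C converter or the legacy AIX table. Only three rows are reproduced: X'FA55' is an IBM extension that the plain `shift_jis` codec does not cover, and X'815C' is omitted because these codecs are not a reliable stand-in for that row. The helper name `compare_mappings` is hypothetical.

```python
# Analogy only: Python codecs, not the ICU or legacy AIX conversion tables.
AFFECTED = [b"\x81\x60", b"\x81\x7c", b"\x81\x61"]


def compare_mappings(sjis: bytes) -> tuple[str, str]:
    """Return (legacy-style, ICU-style) code points for one Shift-JIS character."""
    legacy_like = sjis.decode("shift_jis")  # JIS X 0208 based table
    icu_like = sjis.decode("cp932")         # Microsoft CP932 table
    return f"U+{ord(legacy_like):04X}", f"U+{ord(icu_like):04X}"


for code in AFFECTED:
    legacy, icu = compare_mappings(code)
    print(f"X'{code.hex().upper()}': legacy-style {legacy}, ICU-style {icu}")
```

The same two bytes decode to different code points depending on the table, which is why a UCN written against one mapping can look unfamiliar when source code is compiled against the other.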
For these characters, conversion to and from SJIS or EUC-JP uses the ICU mapping, whereas the legacy AIX compilers map them to the alternative Unicode values shown in the last column. Round-trip conversion is nevertheless preserved: AIX wctomb accepts both the ICU and the legacy values as valid wchar_t input.
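The round-trip behavior can be modeled by folding each legacy code point onto its ICU equivalent before encoding, so that either value produces the same multibyte character. The sketch below is a hypothetical model under that assumption, not the AIX wctomb implementation. The alias values come from the mapping table, `cp932` stands in for IBM-943C, and the names `LEGACY_TO_ICU` and `lenient_encode` are illustrative.

```python
# Legacy -> ICU aliases, taken from the mapping table.
LEGACY_TO_ICU = {
    0x301C: 0xFF5E,  # WAVE DASH -> FULLWIDTH TILDE
    0x2212: 0xFF0D,  # MINUS SIGN -> FULLWIDTH HYPHEN-MINUS
    0x00A6: 0xFFE4,  # BROKEN BAR -> FULLWIDTH BROKEN BAR
    0x2014: 0x2015,  # EM DASH -> HORIZONTAL BAR
    0x2016: 0x2225,  # DOUBLE VERTICAL LINE -> PARALLEL TO
}


def lenient_encode(ch: str) -> bytes:
    """Encode one character, accepting either the ICU or the legacy code point."""
    cp = LEGACY_TO_ICU.get(ord(ch), ord(ch))  # fold legacy values onto ICU ones
    # Assumption: cp932 stands in for the IBM-943C converter.
    return chr(cp).encode("cp932")


# Legacy WAVE DASH and ICU FULLWIDTH TILDE both encode to X'8160',
# and X'8160' decodes back to the ICU value.
print(lenient_encode("\u301c").hex(), lenient_encode("\uff5e").hex())
```

In this model, decoding always yields the ICU value while encoding accepts both, so a program that still passes legacy values to a wctomb-style conversion produces the same bytes as one that uses the ICU values.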