Alternative Unicode conversion table for CCSID 954

There are several IBM® coded character set identifiers (CCSIDs) for Japanese code pages. CCSID 954 is registered as the Japanese EUC code page and is a common encoding on Japanese Linux® and UNIX platforms. When you use Microsoft ODBC applications to connect to a Db2® database that uses CCSID 954, you might encounter problems when data in CCSID 954 is converted to Unicode. These problems result from differences between the IBM code page conversion table and the Microsoft code page conversion table.

The following characters, when converted from CCSID 954 to Unicode, result in different code points depending on which conversion table (IBM or Microsoft) is used. For these characters, the IBM conversion table conforms to the character names specified in the Japanese Industrial Standards (JIS) JIS X 0208, JIS X 0212, and JIS X 0221.

Table 1. CCSID 954 to Unicode code point conversion
EUC-JP code point (character name) | IBM primary code point (Unicode name) | Microsoft primary code point (Unicode name)
X'A1BD' (EM dash)                  | U+2014 (EM dash)                      | U+2015 (Horizontal bar)
X'A1C1' (Wave dash)                | U+301C (Wave dash)                    | U+FF5E (Fullwidth tilde)
X'A1C2' (Double vertical line)     | U+2016 (Double vertical line)         | U+2225 (Parallel to)
X'A1DD' (Minus sign)               | U+2212 (Minus sign)                   | U+FF0D (Fullwidth hyphen-minus)
X'8FA2C3' (Broken bar)             | U+00A6 (Broken bar)                   | U+FFE4 (Fullwidth broken bar)
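
As an illustration only, the rows of Table 1 can be written down as a small lookup structure. The following Python sketch uses hypothetical names (`TABLE_1`, `describe`) that are not part of any Db2 interface. It hard-codes the code points from the table instead of calling a real converter, and uses the standard `unicodedata` module to print the official Unicode name of each result.

```python
import unicodedata

# Rows of Table 1: EUC-JP bytes -> (IBM code point, Microsoft code point).
# The values are copied from the table; no Db2 conversion code is involved.
TABLE_1 = {
    bytes.fromhex("A1BD"): (0x2014, 0x2015),
    bytes.fromhex("A1C1"): (0x301C, 0xFF5E),
    bytes.fromhex("A1C2"): (0x2016, 0x2225),
    bytes.fromhex("A1DD"): (0x2212, 0xFF0D),
    bytes.fromhex("8FA2C3"): (0x00A6, 0xFFE4),
}


def describe(euc_bytes):
    """Return the Unicode names that each table yields for one Table 1 entry."""
    ibm, microsoft = TABLE_1[euc_bytes]
    return unicodedata.name(chr(ibm)), unicodedata.name(chr(microsoft))


if __name__ == "__main__":
    for euc_bytes, (ibm, microsoft) in TABLE_1.items():
        ibm_name, ms_name = describe(euc_bytes)
        print(
            f"X'{euc_bytes.hex().upper()}': "
            f"IBM U+{ibm:04X} ({ibm_name}), "
            f"Microsoft U+{microsoft:04X} ({ms_name})"
        )
```

The sketch covers only the five characters listed in Table 1. All other CCSID 954 code points are outside its scope.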

For example, the EM dash character, which has the CCSID 954 code point X'A1BD', is converted to the Unicode code point U+2014 when the IBM conversion table is used, but to U+2015 when the Microsoft conversion table is used. This difference can cause problems for Microsoft ODBC applications because they treat U+2014 as an invalid code point. To avoid these problems, replace the default IBM conversion table from CCSID 954 to Unicode with the alternate Microsoft conversion table that is provided by the Db2 database manager.
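
To check whether existing data is affected, one possible diagnostic is to scan text that was already converted with the IBM table for the five IBM-side code points. The following Python sketch assumes the text is available as an ordinary Unicode string. The names `IBM_TO_MICROSOFT`, `affected_characters`, and `as_microsoft` are hypothetical. The sketch is not a substitute for replacing the conversion table, and it changes nothing in the database.

```python
# IBM result -> Microsoft result for the characters that differ between
# the two conversion tables (code points taken from this topic).
IBM_TO_MICROSOFT = {
    0x2014: 0x2015,  # EM DASH -> HORIZONTAL BAR
    0x301C: 0xFF5E,  # WAVE DASH -> FULLWIDTH TILDE
    0x2016: 0x2225,  # DOUBLE VERTICAL LINE -> PARALLEL TO
    0x2212: 0xFF0D,  # MINUS SIGN -> FULLWIDTH HYPHEN-MINUS
    0x00A6: 0xFFE4,  # BROKEN BAR -> FULLWIDTH BROKEN BAR
}


def affected_characters(text):
    """List (index, code point) for every character the two tables convert differently."""
    return [
        (index, f"U+{ord(char):04X}")
        for index, char in enumerate(text)
        if ord(char) in IBM_TO_MICROSOFT
    ]


def as_microsoft(text):
    """Show what the same text would contain under the Microsoft table."""
    # str.translate accepts a dict that maps code points to code points.
    return text.translate(IBM_TO_MICROSOFT)


if __name__ == "__main__":
    sample = "A\u2014B\u301c"
    print(affected_characters(sample))
    print([f"U+{ord(c):04X}" for c in as_microsoft(sample)])
```

Remapping after conversion is shown here only to make the difference visible. Note that it cannot tell whether a character such as U+00A6 originally came from X'8FA2C3' or from another source, which is one reason to replace the conversion table itself.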