Alternative Unicode conversion table for CCSID 5039

There are several IBM® coded character set identifiers (CCSIDs) for Japanese code pages. CCSID 943 is registered as the Microsoft Japanese Windows Shift-JIS code page. CCSID 5039 contains only Japanese Industrial Standards (JIS) characters and does not have any vendor-defined characters. When using Microsoft ODBC applications, you might encounter problems when converting data in CCSID 5039 to Unicode. These problems result from differences between the IBM code page conversion table and the Microsoft code page conversion table.

The following characters, when converted from CCSID 5039 to Unicode, result in different code points depending on which conversion table (IBM or Microsoft) is used. For these characters, the IBM conversion table conforms to the character names specified in the Japanese Industrial Standards (JIS) JISX0208 and JISX0221.

Table 1. CCSID 5039 to Unicode code point conversion
Shift-JIS code point (character name)	IBM primary code point (Unicode name)	Microsoft primary code point (Unicode name)
X'815C' (EM Dash)	U+2014 (EM Dash)	U+2015 (Horizontal Bar)
X'8160' (Wave Dash)	U+301C (Wave Dash)	U+FF5E (Fullwidth Tilde)
X'8161' (Double vertical line)	U+2016 (Double vertical line)	U+2225 (Parallel To)
X'817C' (Minus sign)	U+2212 (Minus sign)	U+FF0D (Fullwidth hyphen-minus)
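
The Microsoft column of Table 1 can be checked outside of Db2 with a short script. The following is a minimal sketch, assuming that Python's built-in cp932 codec follows the Microsoft conversion table for these four characters; the names MICROSOFT_EXPECTED and decode_with_cp932 are hypothetical and chosen only for this illustration.

```python
# Shift-JIS byte sequences from Table 1, paired with the code points
# listed in the Microsoft column of that table.
MICROSOFT_EXPECTED = {
    b"\x81\x5c": 0x2015,  # Horizontal Bar
    b"\x81\x60": 0xFF5E,  # Fullwidth Tilde
    b"\x81\x61": 0x2225,  # Parallel To
    b"\x81\x7c": 0xFF0D,  # Fullwidth hyphen-minus
}


def decode_with_cp932(raw: bytes) -> int:
    """Decode one double-byte character with cp932 and return its code point."""
    # Assumption: Python's cp932 codec stands in for the Microsoft table.
    return ord(raw.decode("cp932"))


for raw, expected in MICROSOFT_EXPECTED.items():
    actual = decode_with_cp932(raw)
    # Print the decoded code point next to the value taken from Table 1.
    print(f"X'{raw.hex().upper()}' -> U+{actual:04X} (Table 1: U+{expected:04X})")
```

The script only reports what one codec does; it does not inspect or change the conversion tables that the Db2 database manager uses.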

For example, the EM dash character, with the CCSID 5039 code point X'815C', is converted to the Unicode code point U+2014 when using the IBM conversion table, but to U+2015 when using the Microsoft conversion table. This can cause problems for Microsoft ODBC applications, because they treat U+2014 as an invalid code point. To avoid these problems, replace the default IBM conversion table from CCSID 5039 to Unicode with the alternate Microsoft conversion table provided by the Db2® database manager.
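
To make the difference concrete, the four rows of Table 1 can be expressed as a mapping from the IBM primary code points to the Microsoft primary code points. The following is an illustrative sketch only: it is not the Db2 conversion table replacement, and the names IBM_TO_MICROSOFT and to_microsoft_code_points are hypothetical. It shows what an application would see for the same data under each table.

```python
# IBM primary code point -> Microsoft primary code point, taken from Table 1.
IBM_TO_MICROSOFT = {
    0x2014: 0x2015,  # EM Dash -> Horizontal Bar
    0x301C: 0xFF5E,  # Wave Dash -> Fullwidth Tilde
    0x2016: 0x2225,  # Double vertical line -> Parallel To
    0x2212: 0xFF0D,  # Minus sign -> Fullwidth hyphen-minus
}


def to_microsoft_code_points(text: str) -> str:
    """Return text with the four IBM code points replaced by the Microsoft ones."""
    # str.translate accepts a dict that maps code points to code points;
    # characters that are not keys of the dict pass through unchanged.
    return text.translate(IBM_TO_MICROSOFT)


# Text as it would look after conversion with the IBM table.
ibm_text = "10\u221220\u2014\u301C"
ms_text = to_microsoft_code_points(ibm_text)
print([f"U+{ord(ch):04X}" for ch in ms_text])
```

Characters outside the four rows of Table 1 are left untouched, which mirrors the statement that only these characters convert differently between the two tables.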