Japanese Shift-JIS Character Mapping

When migrating reports or cubes whose names contain Japanese characters, issues may occur because there is no industry standard for mapping Shift-JIS byte sequences to and from Unicode.

IBM® Cognos® Series 7 PowerPlay® Enterprise Server uses operating-system specific variants of the Shift-JIS multibyte character encoding scheme to store Japanese characters. IBM Cognos Analytics with Watson stores all characters internally in Unicode.

Problems may arise when migrating from IBM Cognos Series 7 to IBM Cognos Analytics with Watson because translations from Shift-JIS to Unicode and from Unicode back to Shift-JIS are performed by different software. If these translations do not all use the same mapping from Shift-JIS to and from Unicode, report and cube names may not match, resulting in items that fail to migrate or in migrated reports that cannot run.
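
The mismatch can be reproduced outside the product. The following Python sketch is illustrative only: it uses Python's built-in `shift_jis` and `cp932` codecs as stand-ins for two pieces of software that use different mapping tables, and the function name `decode_variants` is hypothetical. It does not show how the migration service itself performs conversions.

```python
def decode_variants(raw: bytes, codecs=("shift_jis", "cp932")) -> dict:
    """Decode the same Shift-JIS bytes with each codec and return the results."""
    return {codec: raw.decode(codec) for codec in codecs}


# 0x8160 is the wave dash / fullwidth tilde entry in Table 1.
results = decode_variants(b"\x81\x60")
for codec, text in results.items():
    print(codec, [f"U+{ord(ch):04X}" for ch in text])

# The decoded names are different Unicode strings, so a lookup by name fails.
print(results["shift_jis"] == results["cp932"])
```

The same bytes on disk produce two different Unicode names, which is the situation that causes items to fail to migrate.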

Encoding mappings may be performed by any of the following:

the IBM Cognos Series 7 migration service
By default, the IBM Cognos Series 7 migration service uses built-in libraries to encode and decode characters, mapping them between Shift-JIS and Unicode. You may need to reconfigure the mappings.
the IBM Cognos Series 7 PowerPlay Enterprise Server Administration Tool (ppsrvadm)
If you publish IBM Cognos Series 7 PowerPlay content to IBM Cognos Analytics with Watson from this tool, references to the PowerPlay 7 cube and report names are converted to Unicode using the character conversion libraries provided by the Java™ Virtual Machine (JVM) used to launch the tool. When migrating the content to IBM Cognos Analytics with Watson, the IBM Cognos Series 7 migration service must be able to reconvert the cube and report names to Shift-JIS and back to Unicode using the same set of mappings.
file transfer programs used to move files from one server to another
If you transfer cubes and reports from one server to another and the underlying file system's encoding has changed in the process, then you may be impacted by the character mapping chosen by the file transfer program that you used. For example, when migrating content from an IBM Cognos Series 7 server on the Solaris operating system that uses the Japanese locale JP.PCK, file names are stored on disk using the Solaris operating system's variant of Shift-JIS. When you transfer these files to a new server that is using a Unicode-based locale, you may be impacted by the character mapping that the file transfer program used for the transfer.
operating system API functions used to read and write files
If the file system used by your IBM Cognos Series 7 server uses a character set that is different from the one used by the locale in which your IBM Cognos Series 7 PowerPlay Enterprise Server is running, then you may be impacted by the character mapping that is chosen by the file system. For example, if IBM Cognos Series 7 PowerPlay Enterprise Server is running on Windows with an NTFS file system in the Japanese locale, then PowerPlay is running in Windows code page 932, which is the Microsoft variant of Shift-JIS, but file names are stored on disk in Unicode. Mapping between the two encodings is performed at run time.
the IBM Cognos Analytics with Watson server
The IBM Cognos Analytics with Watson server relies on the JVM used to run IBM Cognos Analytics with Watson to perform character mappings. Even if you are using the same JVM vendor for IBM Cognos Analytics with Watson and ppsrvadm, the two programs may map some Shift-JIS characters to different Unicode codepoints.

If any of these encoding points do not use the same character mappings, you must either change cube and report names to remove the problem characters or reconfigure the encoding points so that they all use the same mapping.
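
As a rough way to test whether a particular name is affected, you can check whether it survives a round trip through a given mapping. The sketch below assumes Python's built-in codecs as stand-ins for the mapping libraries of the products involved, which may behave differently; the function name `survives_round_trip` and the sample report name are hypothetical.

```python
def survives_round_trip(name: str, codec: str) -> bool:
    """Return True if name encodes with codec and decodes back unchanged."""
    try:
        return name.encode(codec).decode(codec) == name
    except UnicodeEncodeError:
        # The codec has no byte sequence for at least one character in the name.
        return False


# Hypothetical report name containing U+FF5E (fullwidth tilde), as produced
# by software that uses the Microsoft variant of Shift-JIS.
name = "売上\uff5e2008"

for codec in ("cp932", "shift_jis"):
    print(codec, survives_round_trip(name, codec))
```

A name that passes for one codec but not for another is a candidate for renaming, or a sign that the mappings need to be aligned.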

Characters that Cause Problems

The following table describes the Shift-JIS characters that can cause problems. Each row lists the alternative Unicode codepoints to which the character may be mapped. Mappings marked with an asterisk (*) are rare, and you are unlikely to encounter them.

Table 1. Shift-JIS characters that can cause problems in migration
JIS bytes	Shift-JIS bytes	Unicode codepoints	Description
0x5C	0x5C	U+005C / U+00A5	Reverse solidus / Yen sign
0x7E	0x7E	U+007E / U+203E	Tilde / Overline
0x2131	0x8150	U+203E* / U+FFE3	Overline / Fullwidth macron
0x213D	0x815C	U+2014 / U+2015	Em dash / Horizontal bar
0x2140	0x815F	U+005C* / U+FF3C	Reverse solidus / Fullwidth reverse solidus
0x2141	0x8160	U+301C / U+FF5E	Wave dash / Fullwidth tilde
0x2142	0x8161	U+2016 / U+2225	Double vertical line / Parallel to
0x215D	0x817C	U+2212 / U+FF0D	Minus sign / Fullwidth hyphen-minus
0x216F	0x818F	U+00A5* / U+FFE5	Yen sign / Fullwidth yen sign
0x2171	0x8191	U+00A2 / U+FFE0	Cent sign / Fullwidth cent sign
0x2172	0x8192	U+00A3 / U+FFE1	Pound sign / Fullwidth pound sign
0x224C	0x81CA	U+00AC / U+FFE2	Not sign / Fullwidth not sign
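
To find names that contain any of these characters before you migrate, a simple scan is enough. The following Python sketch is one possible approach, not a product tool: the set contains the codepoints from the Unicode codepoints column of Table 1, and the names `PROBLEM_CODE_POINTS` and `find_problem_characters`, as well as the sample names, are hypothetical.

```python
# Codepoints taken from the "Unicode codepoints" column of Table 1.
PROBLEM_CODE_POINTS = {
    0x005C, 0x00A5, 0x007E, 0x203E, 0xFFE3, 0x2014, 0x2015, 0xFF3C,
    0x301C, 0xFF5E, 0x2016, 0x2225, 0x2212, 0xFF0D, 0xFFE5,
    0x00A2, 0xFFE0, 0x00A3, 0xFFE1, 0x00AC, 0xFFE2,
}


def find_problem_characters(name: str) -> list:
    """Return (index, 'U+XXXX') for each character of name listed in Table 1."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(name)
        if ord(ch) in PROBLEM_CODE_POINTS
    ]


for name in ["Sales 2008", "A\u2212B", "Q1\uff5eQ2"]:
    print(repr(name), find_problem_characters(name))
```

Because the set includes the ASCII reverse solidus and tilde, names that contain those characters are also reported; remove those two entries from the set if that is too broad for your content.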