Japanese Shift-JIS Character Mapping
When you migrate reports or cubes whose names contain Japanese characters, problems can occur because there is no single industry standard for mapping Shift-JIS byte sequences to and from Unicode.
IBM® Cognos® Series 7 PowerPlay® Enterprise Server uses operating-system specific variants of the Shift-JIS multibyte character encoding scheme to store Japanese characters. IBM Cognos Analytics with Watson stores all characters internally in Unicode.
Problems may arise when migrating from IBM Cognos Series 7 to IBM Cognos Analytics with Watson because translations from Shift-JIS to Unicode and from Unicode back to Shift-JIS are performed by different software. If these translations do not all use the same mapping from Shift-JIS to and from Unicode, report and cube names may not match, resulting in items that fail to migrate or in migrated reports that cannot run.
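To illustrate how two translation layers can disagree, the following minimal Python sketch decodes the same Shift-JIS bytes with two different mapping tables. It assumes Python's built-in `shift_jis` and `cp932` codecs as stand-ins for two Shift-JIS variants; they are not the conversion libraries used by IBM Cognos software, and the cube name is hypothetical.

```python
# Build a hypothetical cube name as Shift-JIS bytes:
# two kanji, the double-byte sequence 0x8160, then ASCII digits.
raw_name = "売上".encode("shift_jis") + b"\x81\x60" + b"2008"

# Decode the same bytes with two stand-in mapping tables.
name_jis = raw_name.decode("shift_jis")  # JIS X 0208 based mapping
name_ms = raw_name.decode("cp932")       # Microsoft code page 932 mapping


def codepoints(text):
    """Return the code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(codepoints(name_jis))
print(codepoints(name_ms))
print("names match:", name_jis == name_ms)
```

In this sketch the third character decodes to U+301C under one codec and to U+FF5E under the other, so the two Unicode names compare as different even though the bytes on disk are identical.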
Encoding mappings may be performed by any of the following:
- the IBM Cognos Series 7 migration service
By default, the IBM Cognos Series 7 migration service uses built-in libraries to encode and decode characters, mapping them between Shift-JIS and Unicode. You may need to reconfigure the mappings.
- the IBM Cognos Series 7 PowerPlay Enterprise Server Administration Tool (ppsrvadm)
If you publish IBM Cognos Series 7 PowerPlay content to IBM Cognos Analytics with Watson from this tool, references to the PowerPlay 7 cube and report names are converted to Unicode using the character conversion libraries provided by the Java™ Virtual Machine (JVM) used to launch the tool. When migrating the content to IBM Cognos Analytics with Watson, the IBM Cognos Series 7 migration service must be able to reconvert the cube and report names to Shift-JIS and back to Unicode using the same set of mappings.
- file transfer programs used to move files from one server to another
If you transfer cubes and reports from one server to another and the underlying file system's encoding has changed in the process, then you may be impacted by the character mapping chosen by the file transfer program that you used. For example, when migrating content from an IBM Cognos Series 7 server on the Solaris operating system that uses the Japanese locale JP.PCK, file names are stored on disk using the Solaris operating system's variant of Shift-JIS. When you transfer these files to a new server that is using a Unicode-based locale, you may be impacted by the character mapping that the file transfer program used for the transfer.
- operating system API functions used to read and write files
If the file system used by your IBM Cognos Series 7 server uses a character set that is different from the one used by the locale in which IBM Cognos Series 7 PowerPlay Enterprise Server is running, then you may be impacted by the character mapping that is chosen by the file system. For example, if IBM Cognos Series 7 PowerPlay Enterprise Server is running on Windows with an NTFS file system in the Japanese locale, then PowerPlay is running in Windows code page 932, which is the Microsoft variant of Shift-JIS, but file names are stored on disk in Unicode. Mapping between the two encodings is performed at run time.
- the IBM Cognos Analytics with Watson server
The IBM Cognos Analytics with Watson server relies on the JVM used to run IBM Cognos Analytics with Watson to perform character mappings. Even if you are using the same JVM vendor for IBM Cognos Analytics with Watson and ppsrvadm, the two programs may map some Shift-JIS characters to different Unicode codepoints.
If any of these conversion points do not use the same character mappings, you must either change the cube and report names to remove the problem characters or reconfigure the character mappings so that every conversion point uses the same mapping.
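One way to find names that need attention is to test whether a name survives a round trip from Unicode to Shift-JIS and back. The following is a minimal sketch, assuming Python's `shift_jis` and `cp932` codecs as stand-ins for two mismatched conversion points; the function name and the sample names are hypothetical and are not part of any IBM Cognos tool.

```python
def survives_round_trip(name, encode_with, decode_with):
    """Return True if name is unchanged after Unicode -> Shift-JIS -> Unicode."""
    try:
        return name.encode(encode_with).decode(decode_with) == name
    except UnicodeError:
        # The character has no mapping in one of the two tables.
        return False


# Hypothetical report names; the second contains U+FF5E.
names = ["売上2008", "売上\uff5e2008"]

for name in names:
    same_mapping = survives_round_trip(name, "cp932", "cp932")
    mixed_mapping = survives_round_trip(name, "cp932", "shift_jis")
    print(ascii(name), "same mapping:", same_mapping, "mixed mapping:", mixed_mapping)
```

A name that passes with a single mapping but fails when the encode and decode steps use different mappings is a candidate for renaming.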
Characters that Cause Problems
The following table lists the Shift-JIS characters that can cause problems, together with the Unicode codepoints to which each may be mapped. Mappings marked with an asterisk (*) are rare, and you are unlikely to encounter them.
JIS bytes | Shift-JIS bytes | Unicode codepoints | Description |
---|---|---|---|
0x5C | 0x5C | U+005C, U+00A5 | Reverse solidus, yen sign |
0x7E | 0x7E | U+007E, U+203E | Tilde, overline |
0x2131 | 0x8150 | U+203E*, U+FFE3 | Overline, full width macron |
0x213D | 0x815C | U+2014, U+2015 | Em dash, horizontal bar |
0x2140 | 0x815F | U+005C*, U+FF3C | Reverse solidus, full width reverse solidus |
0x2141 | 0x8160 | U+301C, U+FF5E | Wave dash, full width tilde |
0x2142 | 0x8161 | U+2016, U+2225 | Double vertical line, parallel to |
0x215D | 0x817C | U+2212, U+FF0D | Minus sign, full width hyphen-minus |
0x216F | 0x818F | U+00A5*, U+FFE5 | Yen sign, full width yen sign |
0x2171 | 0x8191 | U+00A2, U+FFE0 | Cent sign, full width cent sign |
0x2172 | 0x8192 | U+00A3, U+FFE1 | Pound sign, full width pound sign |
0x224C | 0x81CA | U+00AC, U+FFE2 | Not sign, full width not sign |
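The table above can also be used to scan existing cube and report names before migration. The following minimal Python sketch flags every character whose codepoint appears in the table. The set name, the function name, and the sample names are hypothetical. The check is deliberately conservative: it also flags the ASCII reverse solidus and tilde because they are listed in the table.

```python
# Unicode codepoints taken from the table above.
PROBLEM_CODEPOINTS = {
    0x005C, 0x00A5, 0x007E, 0x203E, 0xFFE3, 0x2014, 0x2015,
    0xFF3C, 0x301C, 0xFF5E, 0x2016, 0x2225, 0x2212, 0xFF0D,
    0xFFE5, 0x00A2, 0xFFE0, 0x00A3, 0xFFE1, 0x00AC, 0xFFE2,
}


def find_problem_characters(name):
    """Return (index, 'U+XXXX') for each character of name listed in the table."""
    return [
        (index, f"U+{ord(ch):04X}")
        for index, ch in enumerate(name)
        if ord(ch) in PROBLEM_CODEPOINTS
    ]


# Hypothetical names: the first is clean, the second contains U+FF5E.
for name in ["売上2008", "売上\uff5e2008"]:
    print(ascii(name), find_problem_characters(name))
```

An empty list means that the name contains none of the characters in the table.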