Japanese Shift-JIS Character Mapping

When migrating reports or cubes whose names contain Japanese characters, issues may occur because there is no industry standard for mapping byte sequences from Shift-JIS characters to and from Unicode.

IBM® Cognos® Series 7 PowerPlay® Enterprise Server uses operating-system-specific variants of the Shift-JIS multibyte character encoding scheme to store Japanese characters. IBM Cognos Analytics with Watson stores all characters internally in Unicode.

Problems may arise when migrating from IBM Cognos Series 7 to IBM Cognos Analytics with Watson because translations from Shift-JIS to Unicode and from Unicode back to Shift-JIS are performed by different software. If these translations do not all use the same mapping from Shift-JIS to and from Unicode, report and cube names may not match, resulting in items that fail to migrate or in migrated reports that cannot run.
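As a minimal sketch of this kind of mismatch, the following Python fragment uses two of Python's built-in codecs, shift_jis and cp932, as stand-ins for two conversion libraries that disagree. These codecs are an assumption made for illustration only; they are not the libraries that IBM Cognos products use, and the cube name and the round_trip helper are hypothetical.

```python
def round_trip(name_bytes, decode_with, encode_with):
    """Decode Shift-JIS bytes to Unicode with one mapping, then encode
    them back with another. Returns None if the second mapping cannot
    represent a character produced by the first."""
    text = name_bytes.decode(decode_with)
    try:
        return text.encode(encode_with)
    except UnicodeEncodeError:
        return None


# Hypothetical cube name stored on disk as Shift-JIS bytes. The bytes
# 0x8160 are the character that one mapping treats as a wave dash and
# another treats as a full width tilde.
stored = "売上".encode("shift_jis") + b"\x81\x60" + b"2008"

# The same bytes become two different Unicode strings.
as_seen_by_a = stored.decode("shift_jis")  # contains U+301C
as_seen_by_b = stored.decode("cp932")      # contains U+FF5E

# Names compared in Unicode no longer match.
names_match = as_seen_by_a == as_seen_by_b

# Converting back with the other mapping fails outright.
mixed = round_trip(stored, "cp932", "shift_jis")
consistent = round_trip(stored, "cp932", "cp932")
```

In this sketch, names_match is False and mixed is None, while consistent equals the original bytes. A name survives the round trip only when every step uses the same mapping.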

Encoding mappings may be performed by any of the following:

  • the IBM Cognos Series 7 migration service

    By default, the IBM Cognos Series 7 migration service uses built-in libraries to encode and decode characters, mapping them between Shift-JIS and Unicode. You may need to reconfigure the mappings.

  • the IBM Cognos Series 7 PowerPlay Enterprise Server Administration Tool (ppsrvadm)

    If you publish IBM Cognos Series 7 PowerPlay content to IBM Cognos Analytics with Watson from this tool, references to the PowerPlay 7 cube and report names are converted to Unicode using the character conversion libraries provided by the Java™ Virtual Machine (JVM) used to launch the tool. When migrating the content to IBM Cognos Analytics with Watson, the IBM Cognos Series 7 migration service must be able to reconvert the cube and report names to Shift-JIS and back to Unicode using the same set of mappings.

  • file transfer programs used to move files from one server to another

    If you transfer cubes and reports from one server to another and the underlying file system's encoding has changed in the process, then you may be impacted by the character mapping chosen by the file transfer program that you used. For example, when migrating content from an IBM Cognos Series 7 server on the Solaris operating system that uses the Japanese locale JP.PCK, file names are stored on disk using the Solaris operating system's variant of Shift-JIS. When you transfer these files to a new server that is using a Unicode-based locale, you may be impacted by the character mapping that the file transfer program used for the transfer.

  • operating system API functions used to read and write files

    If the file system used by your IBM Cognos Series 7 server uses a character set that is different from the one used in the locale in which your IBM Cognos Series 7 PowerPlay Enterprise Server is running, then you may be impacted by the character mapping that is chosen by the file system. For example, if IBM Cognos Series 7 PowerPlay Enterprise Server is running on Windows with an NTFS file system in the Japanese locale, then PowerPlay is running in Windows code page 932, which is the Microsoft variant of Shift-JIS, but file names are stored on disk in Unicode. Mapping between the two encodings is performed at run time.

  • the IBM Cognos Analytics with Watson server

    The IBM Cognos Analytics with Watson server relies on the JVM used to run IBM Cognos Analytics with Watson to perform character mappings. Even if you are using the same JVM vendor for IBM Cognos Analytics with Watson and ppsrvadm, the two may map some Shift-JIS characters to different Unicode codepoints.

If any of these encoding points do not employ the same character mappings, you must either change cube and report names to remove the problem characters or reconfigure the encoding points so that they all use the same mapping.
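To find names that may need to be changed, you could scan them for the affected Unicode codepoints before migrating. The following Python sketch shows one way to do this. The function name find_problem_characters and the sample names are hypothetical, and the set of codepoints is copied from Table 1; this is an illustration, not a tool supplied with the product.

```python
# Unicode codepoints listed in Table 1 as possible results of an
# inconsistent Shift-JIS mapping.
PROBLEM_CODEPOINTS = {
    0x005C, 0x00A5, 0x007E, 0x203E, 0xFFE3, 0x2014, 0x2015,
    0xFF3C, 0x301C, 0xFF5E, 0x2016, 0x2225, 0x2212, 0xFF0D,
    0xFFE5, 0x00A2, 0xFFE0, 0x00A3, 0xFFE1, 0x00AC, 0xFFE2,
}


def find_problem_characters(name):
    """Return (position, codepoint) pairs for every character in a
    cube or report name that appears in PROBLEM_CODEPOINTS."""
    return [
        (index, f"U+{ord(ch):04X}")
        for index, ch in enumerate(name)
        if ord(ch) in PROBLEM_CODEPOINTS
    ]


# Hypothetical report names, already decoded to Unicode.
flagged = find_problem_characters("Sales\uff5e2008")
clean = find_problem_characters("Sales 2008")
```

Here flagged contains one entry for the full width tilde at position 5, and clean is empty. Because the set includes the ordinary reverse solidus and tilde, expect some names to be flagged that would migrate without problems.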

Characters that Cause Problems

The following table lists the Shift-JIS characters that can cause problems, with the two Unicode codepoints to which each character may be mapped. Mappings marked with an asterisk (*) are rare, and you are unlikely to encounter them.

Table 1. Shift-JIS characters that can cause problems in migration

JIS bytes   Shift-JIS bytes   Unicode codepoints   Description

0x5C        0x5C              U+005C               Reverse solidus
                              U+00A5               Yen sign

0x7E        0x7E              U+007E               Tilde
                              U+203E               Overline

0x2131      0x8150            U+203E*              Overline
                              U+FFE3               Full width macron

0x213D      0x815C            U+2014               Em dash
                              U+2015               Horizontal bar

0x2140      0x815F            U+005C*              Reverse solidus
                              U+FF3C               Full width reverse solidus

0x2141      0x8160            U+301C               Wave dash
                              U+FF5E               Full width tilde

0x2142      0x8161            U+2016               Double vertical line
                              U+2225               Parallel to

0x215D      0x817C            U+2212               Minus sign
                              U+FF0D               Full width hyphen-minus

0x216F      0x818F            U+00A5*              Yen sign
                              U+FFE5               Full width yen sign

0x2171      0x8191            U+00A2               Cent sign
                              U+FFE0               Full width cent sign

0x2172      0x8192            U+00A3               Pound sign
                              U+FFE1               Full width pound sign

0x224C      0x81CA            U+00AC               Not sign
                              U+FFE2               Full width not sign
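
To check which of these characters two particular conversion libraries disagree on, you could decode the Shift-JIS bytes from Table 1 with each library and compare the results. The following Python sketch does this for any two Python codec names. The function name compare_mappings is hypothetical, and Python codecs are used only as stand-ins; the libraries in your environment may behave differently.

```python
# Shift-JIS byte sequences from the "Shift-JIS bytes" column of Table 1.
TABLE_BYTES = [
    b"\x5c", b"\x7e", b"\x81\x50", b"\x81\x5c", b"\x81\x5f", b"\x81\x60",
    b"\x81\x61", b"\x81\x7c", b"\x81\x8f", b"\x81\x91", b"\x81\x92",
    b"\x81\xca",
]


def codepoints(text):
    """Format a decoded string as space-separated U+XXXX codepoints."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def compare_mappings(codec_a, codec_b, byte_sequences=TABLE_BYTES):
    """Return (hex bytes, codepoints under codec_a, codepoints under
    codec_b) for each byte sequence the two codecs decode differently.
    Undecodable bytes are replaced with U+FFFD rather than raising."""
    differences = []
    for raw in byte_sequences:
        decoded_a = raw.decode(codec_a, errors="replace")
        decoded_b = raw.decode(codec_b, errors="replace")
        if decoded_a != decoded_b:
            differences.append(
                (raw.hex(), codepoints(decoded_a), codepoints(decoded_b))
            )
    return differences


differences = compare_mappings("shift_jis", "cp932")
for row in differences:
    print(row)
```

With the codecs shift_jis and cp932, the reported differences include the bytes 0x8160, which one codec maps to U+301C and the other to U+FF5E. An empty result means that the two mappings agree on every character in the table.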