IBM-943 and IBM-932
Each of the Japanese IBM® PC code sets is an encoding consisting of single-byte and multibyte coded characters. The encoding is based on the IBM PC code set and places the JIS characters in shifted positions, which is why it is referred to as Shift-JIS or SJIS.
IBM-943 is a newer code set for the Japanese locale than IBM-932. IBM-943 is compatible with the code set used in the Japanese Microsoft™ Windows™ environment and is known as 1983 ordered Shift-JIS. The differences between IBM-932 and IBM-943 are as follows:
- IBM-932 uses the previous JIS sequence (1978 ordered), while IBM-943 uses the newer JIS sequence (1983 ordered).
- NEC selected characters are added to IBM-943.
- NEC's IBM selected characters are added to IBM-943.
The IBM-932 code set consists of the following character sets:
| Character set | Description |
|---|---|
| JISCII | JISX0201 Graphic Left character set |
| JISX0201.1976 | Katakana/Hiragana Graphic Right character set |
| JISX0208.1983 | Kanji level 1 and 2 character sets |
| IBM-udcJP | IBM user-definable characters |
The IBM-943 code set consists of the following character sets:
| Character set | Description |
|---|---|
| JISCII | JISX0201 Graphic Left character set |
| JISX0201.1976 | Katakana/Hiragana Graphic Right character set |
| JISX0208.1990 | Kanji level 1 and 2 character sets |
| IBM-udcJP | IBM user-definable characters and NEC's IBM selected characters and NEC selected characters |
The first byte of each character determines the number of bytes in that character. The values 0x20-0x7e and 0xa1-0xdf are used to encode JISX0201 characters, with exceptions. The positions 0x81-0x9f and 0xe0-0xfc are reserved for use as the first byte of a multibyte character. The JISX0208 characters are mapped to the multibyte values starting at 0x8140. The second byte of a multibyte character can have any value in the ranges 0x40-0x7e and 0x80-0xfc, so it can coincide with a single-byte value and cannot be interpreted without the byte that precedes it. The Shift-JIS table shows where these characters are located in the code set.
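The first-byte rule can be sketched in Python as follows. This is a minimal illustration, not a converter: the function names `char_length` and `split_sjis` are hypothetical, and the sketch checks only the lead-byte ranges given above, not whether the second byte is valid.

```python
from typing import List


def char_length(first_byte: int) -> int:
    """Return the byte length of a character, judged from its first byte only."""
    # Lead-byte ranges reserved for multibyte characters: 0x81-0x9f and 0xe0-0xfc.
    if 0x81 <= first_byte <= 0x9F or 0xE0 <= first_byte <= 0xFC:
        return 2
    # Every other value is treated as a single-byte character (or undefined).
    return 1


def split_sjis(data: bytes) -> List[bytes]:
    """Split a Shift-JIS byte string into per-character byte sequences."""
    chars = []
    i = 0
    while i < len(data):
        n = char_length(data[i])
        if i + n > len(data):
            # A lead byte at the very end has no second byte to pair with.
            raise ValueError("truncated multibyte character at offset %d" % i)
        chars.append(data[i:i + n])
        i += n
    return chars
```

For example, `split_sjis(b"\x83\x41B")` keeps `0x41` attached to its lead byte `0x83` rather than reading it as the single-byte letter `A`, which is why a Shift-JIS string has to be scanned from the start.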
| Character Encoding | Code Point | Description | Count |
|---|---|---|---|
| 000xxxxx | 00–1F | Controls | 32 |
| 00100000 | 20 | Space | 1 |
| 0xxxxxxx | 21–7E | 7-bit ASCII | 94 |
| 01111111 | 7F | Delete | 1 |
| 10000000 | 80 | Undefined | 1 |
| 100xxxxx 01xxxxxx | [81–9F] [40–7E] | Double byte | 1953 |
| 100xxxxx 1xxxxxxx | [81–9F] [80–FC] | Double byte | 3875 |
| 10100000 | A0 | Undefined | 1 |
| 1xxxxxxx | A1–DF | 8-bit single byte | 63 |
| 111xxxxx 01xxxxxx | [E0–FC] [40–7E] | Double byte | 1827 |
| 111xxxxx 1xxxxxxx | [E0–FC] [80–FC] | Double byte | 3625 |
| 11111101 | FD | Undefined | 1 |
| 11111110 | FE | Undefined | 1 |
| 11111111 | FF | Undefined | 1 |
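The Count column is the number of code points in each row: the size of the byte range for single-byte rows, and the product of the first-byte and second-byte range sizes for double-byte rows. A short sketch of that arithmetic follows; the helper names `span` and `double_byte_count` are hypothetical.

```python
def span(low: int, high: int) -> int:
    """Number of byte values in the inclusive range low..high."""
    return high - low + 1


def double_byte_count(lead: tuple, trail: tuple) -> int:
    """Code points in a double-byte row: lead-byte count times trail-byte count."""
    return span(*lead) * span(*trail)


if __name__ == "__main__":
    # Single-byte rows from the table.
    print("21-7E:", span(0x21, 0x7E))
    print("A1-DF:", span(0xA1, 0xDF))
    # Double-byte rows from the table.
    for lead in [(0x81, 0x9F), (0xE0, 0xFC)]:
        for trail in [(0x40, 0x7E), (0x80, 0xFC)]:
            label = "[%02X-%02X] [%02X-%02X]" % (lead + trail)
            print(label, double_byte_count(lead, trail))
```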
The following table shows the DBCS portion of IBM-943.
| Code Point | Description |
|---|---|
| [81–84] [40–7E] and [81–84] [80–F0] | JIS X 0208 (Non-Kanji) |
| [87] [40–7E] and [87] [80–F0] | NEC selected characters |
| [89–98] [40–7E] and [88] [9F–F0], [89–97] [80–F0], [98] [80–9F] | JIS X 0208 (Level-1 Kanji) |
| [99–9F] [40–7E] and [98] [9F–F0], [99–9F] [80–F0] | JIS X 0208 (Level-2 Kanji) |
| [E0–EA] [40–7E] and [E0–EA] [80–F0] | JIS X 0208 (Level-2 Kanji) |
| [ED–EE] [40–7E] and [ED–EE] [80–F0] | NEC IBM selected characters |
| [F0–F9] [40–7E] and [F0–F9] [80–F0] | User-defined characters |
| [FA] [40–5B] | IBM selected characters (non-Kanji) |
| [FA] [5C–7E], [FB–FC] [40–7E] and [FA–FC] [80–F0] | IBM selected characters (Kanji) |
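A few cells of this table can be spot-checked from Python. The sketch below rests on an assumption: Python's standard library has a `cp932` codec for the Microsoft Windows Japanese code page, and it is used here only as a stand-in for IBM-943. The two are not guaranteed to agree in every cell, so treat the result as illustrative. The `SAMPLES` mapping and `decode_samples` function are hypothetical names.

```python
# Assumption: the "cp932" codec (Microsoft code page 932) approximates IBM-943.
# One sample code point per region, taken from the start of each range.
SAMPLES = {
    "JIS X 0208 (Non-Kanji)": b"\x81\x40",
    "NEC selected characters": b"\x87\x40",
    "JIS X 0208 (Level-1 Kanji)": b"\x88\x9f",
    "NEC IBM selected characters": b"\xed\x40",
    "IBM selected characters (non-Kanji)": b"\xfa\x40",
}


def decode_samples(codec: str = "cp932") -> dict:
    """Decode each two-byte sample and return region -> decoded character."""
    return {region: raw.decode(codec) for region, raw in SAMPLES.items()}


if __name__ == "__main__":
    for region, text in decode_samples().items():
        # Print the Unicode code point so the result is readable on any terminal.
        print("%s: U+%04X" % (region, ord(text)))
```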
The following table shows the DBCS portion of IBM-932.
| Code Point | Description |
|---|---|
| [81–98] [40–7E] and [81–97] [80–FC], [98] [80–9F] | JIS X 0208 (Level-1 Kanji) |
| [99–9F] [40–7E] and [98] [9F–FC], [99–9F] [80–FC] | JIS X 0208 (Level-2 Kanji) |
| [E0–EF] [40–7E] and [E0–EF] [80–FC] | JIS X 0208 (Level-2 Kanji) |
| [F0–F9] [40–7E] and [F0–F9] [80–FC] | User-defined characters |
| [FA–FC] [40–7E] and [FA–FC] [80–FC] | IBM selected characters |