Double-Byte Character Set Support

Optim™ processes Double-Byte Character Set (DBCS) data and provides DBCS users with all Optim capabilities that are available to Extended Binary Coded Decimal Interchange Code (EBCDIC) users.

Optim DBCS support assumes EBCDIC DBCS data. EBCDIC DBCS requires exactly two bytes for every DBCS character. Optim DBCS support also assumes a single DBCS language, in either pure DBCS form or mixed Single-Byte Character Set (SBCS) and DBCS form.

In mixed character strings (strings that contain both SBCS and DBCS characters), two special control characters indicate the start and end of a DBCS substring. Shift Out (X'0E') indicates the start and Shift In (X'0F') indicates the end of the DBCS substring.
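A minimal Python sketch of how the Shift Out and Shift In bytes delimit DBCS substrings in mixed data. The function name `split_mixed` is hypothetical and not part of Optim. The sketch assumes, as in EBCDIC DBCS code pages, that neither byte of a double-byte character is X'0E' or X'0F', and it rejects DBCS runs with an odd byte count because every EBCDIC DBCS character is exactly two bytes. The byte values in the usage example are placeholders, not specific characters.

```python
SO = 0x0E  # Shift Out: start of a DBCS substring
SI = 0x0F  # Shift In: end of a DBCS substring


def split_mixed(data: bytes) -> list[tuple[str, bytes]]:
    """Split EBCDIC mixed data into ("SBCS", ...) and ("DBCS", ...) segments.

    The SO and SI bytes themselves are not included in any segment.
    Raises ValueError for unbalanced shifts or odd-length DBCS runs.
    """
    segments: list[tuple[str, bytes]] = []
    in_dbcs = False
    start = 0  # offset where the current segment begins
    for i, b in enumerate(data):
        if b == SO:
            if in_dbcs:
                raise ValueError(f"nested Shift Out at offset {i}")
            if i > start:
                segments.append(("SBCS", data[start:i]))
            in_dbcs = True
            start = i + 1
        elif b == SI:
            if not in_dbcs:
                raise ValueError(f"Shift In without Shift Out at offset {i}")
            run = data[start:i]
            if len(run) % 2:
                # Every EBCDIC DBCS character is exactly two bytes.
                raise ValueError(f"odd-length DBCS run ending at offset {i}")
            segments.append(("DBCS", run))
            in_dbcs = False
            start = i + 1
    if in_dbcs:
        raise ValueError("missing Shift In at end of string")
    if start < len(data):
        segments.append(("SBCS", data[start:]))
    return segments


# One SBCS byte, two placeholder DBCS characters, one SBCS byte.
print(split_mixed(b"\xc1\x0e\x45\x41\x45\x42\x0f\xc2"))
```

Scanning byte by byte is enough here only because of the stated assumption about DBCS byte values; a stricter parser would step two bytes at a time inside a DBCS run.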

DB2® translates all DB2 character data between the internal DB2 table Coded Character Set Identifier (CCSID) and the external application (Optim) CCSID. All data that Optim processes remains in the external Optim CCSID.

Note: To eliminate problems with round-trip character translation, the Optim installer should ensure that the external Optim CCSIDs match the internal DB2 CCSIDs for DBCS data.
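The round-trip problem described in the note can be illustrated in Python, with one caveat: the Python standard library has no codecs for EBCDIC DBCS CCSIDs, so this sketch substitutes two single-byte EBCDIC code pages, 037 (CCSID 37) and 500 (CCSID 500). Data encoded under one CCSID and decoded under another comes back changed. Here the square brackets, which these two code pages assign to different byte values, do not survive. The helper name `round_trip` is hypothetical.

```python
# Single-byte EBCDIC code pages used only as a stand-in for DBCS CCSIDs,
# which the Python standard library does not provide codecs for.
WRITER_CODEC = "cp037"  # CCSID 37, US/Canada EBCDIC
READER_CODEC = "cp500"  # CCSID 500, International EBCDIC


def round_trip(text: str, writer: str, reader: str) -> str:
    """Encode text with one code page and decode the bytes with another."""
    return text.encode(writer).decode(reader)


original = "a[1]"
same = round_trip(original, WRITER_CODEC, WRITER_CODEC)
mismatched = round_trip(original, WRITER_CODEC, READER_CODEC)
print(same == original)        # True: matching code pages preserve the data
print(mismatched == original)  # False: the brackets decode as other characters
```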

Optim Archive and Extract Files preserve the encoding scheme and CCSID of the data they contain. Optim warns users when incompatible CCSIDs could cause problems in storing data. The “Allow Mismatched CCSIDs” setting in the Site and User options determines the action Optim takes when the CCSID of a source column does not match that of its target column, and when the CCSID of the terminal does not match that of the DB2 subsystem.
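A CCSID mismatch is most harmful when the source data contains characters that the target CCSID cannot represent. The following Python sketch shows one way to detect that case; it is not a description of how Optim performs its check. The `CCSID_CODECS` table and the `unrepresentable` function are hypothetical, and single-byte code pages again stand in for DBCS CCSIDs because the Python standard library has no EBCDIC DBCS codecs. Code page 1140 is code page 037 with the euro sign added, so euro values read from a CCSID 1140 column cannot be stored in a CCSID 37 column.

```python
# Hypothetical lookup from a few CCSIDs to Python codec names.
CCSID_CODECS = {37: "cp037", 500: "cp500", 1140: "cp1140"}


def unrepresentable(values: list[str], target_ccsid: int) -> set[str]:
    """Return the characters in values that the target CCSID cannot encode."""
    codec = CCSID_CODECS[target_ccsid]
    bad: set[str] = set()
    for value in values:
        for ch in value:
            try:
                ch.encode(codec)
            except UnicodeEncodeError:
                bad.add(ch)
    return bad


# Values from a CCSID 1140 source column, checked against two targets.
source_rows = ["Price: 10€", "Total: 25€"]
print(unrepresentable(source_rows, 1140))  # set()
print(unrepresentable(source_rows, 37))    # {'€'}
```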

DBCS Data Types

Optim supports the following DBCS data types. The conversion and mapping rules for the DBCS data types are the same as the corresponding DB2 rules.

Functions that Support DBCS Data

Optim supports DBCS data in the following functional areas:

Definitions

The following definitions explain terms used in this section:

Coded Character Set Identifier (CCSID)
A 16-bit number that uniquely identifies a coded representation of graphic characters. It designates an encoding scheme identifier and one or more pairs that consist of a character set identifier and an associated code page identifier.
Double-Byte Character Set (DBCS)
A set of characters used by national languages, such as Japanese and Chinese, that have more symbols than a single byte can represent. Each character is 2 bytes in length.
Graphic String
A string that consists only of double-byte EBCDIC characters and is stored without Shift Out and Shift In characters.
Multi-Byte Character Set (MBCS)
A character set that can represent a single character with more than one byte. UTF-8 is an example of an MBCS. In DB2, a UTF-8 character occupies from 1 to 4 bytes.
Single-Byte Character Set (SBCS)
A set of characters in which each character is represented by a single byte.
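The definitions above differ mainly in how many bytes a character occupies and in whether DBCS data is delimited by shift characters. This Python sketch, whose helper names are hypothetical, wraps a graphic string in Shift Out and Shift In to form the DBCS portion of a mixed string, and reports how many bytes each character uses in UTF-8. The graphic bytes are placeholders rather than characters from a particular code page.

```python
SO, SI = b"\x0e", b"\x0f"  # Shift Out / Shift In control bytes


def graphic_to_mixed(graphic: bytes) -> bytes:
    """Wrap a graphic (pure DBCS) string in SO/SI for use in mixed data."""
    if len(graphic) % 2:
        raise ValueError("a graphic string must contain whole 2-byte characters")
    return SO + graphic + SI


def utf8_widths(text: str) -> list[int]:
    """Number of bytes each character uses when encoded as UTF-8."""
    return [len(ch.encode("utf-8")) for ch in text]


print(graphic_to_mixed(b"\x45\x41\x45\x42").hex())  # 0e454145420f
print(utf8_widths("Aé日\U00020000"))                # [1, 2, 3, 4]
```

The second line shows the MBCS range from the definition: a Latin letter, an accented letter, a common CJK ideograph, and a supplementary-plane CJK ideograph take 1, 2, 3, and 4 bytes respectively.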