Character string encoding schemes
- Bit data
- Data that is not associated with a coded character set and, therefore,
is never converted. The CCSID for bit data is X'FFFF' (65535).
The bytes do not represent characters.
Bit data is a form of character data. Assignments to bit data are padded with blanks, whereas assignments to binary data are padded with X'00'. Binary data is recommended over character data defined as FOR BIT DATA.
If both operands in a predicate are EBCDIC, both operands are padded with X'40'. Otherwise, both operands are padded with X'20'. For example, if both operands are ASCII, or if one operand is ASCII and the other operand is EBCDIC, both are padded with X'20'.
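The padding rule for predicates can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not DB2 code: the function names and the scheme labels "EBCDIC", "ASCII", and "UNICODE" are hypothetical, and the sketch assumes the shorter operand is padded on the right to the length of the longer one.

```python
EBCDIC_BLANK = b"\x40"  # pad byte when both operands are EBCDIC
OTHER_BLANK = b"\x20"   # pad byte in every other case


def pad_byte(left_scheme: str, right_scheme: str) -> bytes:
    """Choose the pad byte for a predicate from the two operand schemes."""
    if left_scheme == "EBCDIC" and right_scheme == "EBCDIC":
        return EBCDIC_BLANK
    return OTHER_BLANK


def pad_operands(left: bytes, left_scheme: str,
                 right: bytes, right_scheme: str) -> tuple[bytes, bytes]:
    """Right-pad both operands to a common length (assumed behavior)."""
    pad = pad_byte(left_scheme, right_scheme)
    width = max(len(left), len(right))
    return left.ljust(width, pad), right.ljust(width, pad)
```

For example, `pad_operands(b"\xC1\xC2", "EBCDIC", b"\xC1\xC2\xC3\xC4", "EBCDIC")` pads the first operand with two X'40' bytes, while any pairing that involves a non-EBCDIC operand uses X'20'.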
- SBCS data
- Data in which every character is represented by a single byte. Each SBCS string has an associated CCSID. If necessary, an SBCS string is converted before it is used in an operation with a character string that has a different CCSID.
- Mixed data
- Data that can contain a mixture of characters from a single-byte
character set (SBCS) and a multiple-byte character set (MBCS). Each
mixed string has an associated CCSID. If necessary, a mixed string
is converted before an operation with a character string that has
a different CCSID. If a mixed data string contains an MBCS character,
it cannot be converted to SBCS data.
EBCDIC mixed data can contain shift characters, which are not MBCS data.
When the encoding scheme is Unicode or the DB2® installation is defined to support mixed data, DB2 recognizes MBCS sequences within a mixed data string when performing character-sensitive operations. These operations include parsing, character conversion, and the pattern matching specified by the LIKE predicate.
The method of representing DBCS and MBCS characters within a mixed string differs among the encoding schemes.
- ASCII reserves one set of code points for SBCS characters and another set for the first half of DBCS characters. When the system encounters the first half of a DBCS character, it reads the next byte to obtain the complete character.
- EBCDIC makes use of two special code points:
- A shift-out character (X'0E') to introduce a string of DBCS characters.
- A shift-in character (X'0F') to end a string of DBCS characters.
- UTF-8 is a varying-length encoding in which each character is a sequence of bytes. The high-order bits of each byte indicate the part of the sequence to which the byte belongs, and the first byte indicates the number of bytes that follow in the sequence.
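The EBCDIC and UTF-8 conventions described above can be illustrated with a short Python sketch. It is a simplified model and not how DB2 itself parses strings. The function names are hypothetical, the EBCDIC splitter assumes that every shift-out has a matching shift-in and that DBCS characters occupy exactly two bytes, and the UTF-8 helper inspects only the high-order bits of the first byte without validating the continuation bytes.

```python
SHIFT_OUT = 0x0E  # introduces a run of DBCS characters
SHIFT_IN = 0x0F   # ends a run of DBCS characters


def split_ebcdic_mixed(data: bytes) -> list[tuple[str, bytes]]:
    """Split an EBCDIC mixed string into SBCS and DBCS runs.

    The shift characters are dropped from the result because they
    are control bytes, not character data.
    """
    runs = []
    mode = "SBCS"
    start = 0
    for i, byte in enumerate(data):
        if mode == "SBCS" and byte == SHIFT_OUT:
            if i > start:
                runs.append(("SBCS", data[start:i]))
            mode, start = "DBCS", i + 1
        elif mode == "DBCS" and byte == SHIFT_IN:
            if (i - start) % 2 != 0:
                raise ValueError("DBCS run has an odd number of bytes")
            runs.append(("DBCS", data[start:i]))
            mode, start = "SBCS", i + 1
    if mode == "DBCS":
        raise ValueError("shift-out without a matching shift-in")
    if start < len(data):
        runs.append(("SBCS", data[start:]))
    return runs


def utf8_sequence_length(first_byte: int) -> int:
    """Return the total sequence length implied by a UTF-8 first byte."""
    if first_byte < 0x80:           # 0xxxxxxx: single-byte character
        return 1
    if 0xC0 <= first_byte < 0xE0:   # 110xxxxx: one byte follows
        return 2
    if 0xE0 <= first_byte < 0xF0:   # 1110xxxx: two bytes follow
        return 3
    if 0xF0 <= first_byte < 0xF8:   # 11110xxx: three bytes follow
        return 4
    # 10xxxxxx is a continuation byte; 0xF8 and above is not valid UTF-8.
    raise ValueError("not a valid UTF-8 first byte")
```

The byte values used to exercise `split_ebcdic_mixed` can be arbitrary; only X'0E' and X'0F' carry meaning for the splitter. For `utf8_sequence_length`, passing the first byte of `"é".encode("utf-8")` returns 2.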