Character strings
A character string is a sequence of bytes. The length of the string is the number of bytes in the sequence. If the length is zero, the value is called the empty string. The empty string should not be confused with the null value.
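
The distinction between byte length, the empty string, and the null value can be sketched in a few lines of Python. This is only an illustration: UTF-8 is an assumed encoding, `byte_length` is a hypothetical helper, and Python's `None` stands in for the SQL null value.

```python
def byte_length(value, encoding="utf-8"):
    """Return the length of a string in bytes, or None for a null value."""
    if value is None:
        # A null value is the absence of a value, so it has no length.
        return None
    # The length of a character string is the number of bytes, not characters.
    return len(value.encode(encoding))


print(byte_length("abc"))      # three single-byte characters
print(byte_length("\u00e9"))   # one character that occupies two bytes in UTF-8
print(byte_length(""))         # the empty string has length zero
print(byte_length(None))       # null is not the empty string
```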
Default CCSIDs for character strings
The value of the field MIXED DATA (on installation panel DSNTIPF) determines the default CCSIDs for a character string.
Encoding scheme | Value of MIXED DATA field | Default attribute |
---|---|---|
ASCII or EBCDIC | NO | Character: SBCS The value of the ASCII CCSID or EBCDIC CCSID field on installation panel DSNTIPF determines the system CCSID for SBCS data. |
ASCII or EBCDIC | YES | Character: MIXED The value of the ASCII CCSID or EBCDIC CCSID field on installation panel DSNTIPF determines the system CCSIDs for SBCS, MIXED, and graphic data. |
Unicode | Not applicable | Character: MIXED The CCSIDs are:
|
The MIXED DATA field does not apply to Unicode columns in EBCDIC tables. Those columns follow the same rules that are shown for the Unicode encoding scheme in the previous table. For more information, see Unicode columns in EBCDIC tables.
Fixed-length character strings
When fixed-length character string distinct types, columns, and variables are defined, the length attribute is specified, and all values have the same length. For a fixed-length character string, the length attribute must be in the range 1–255 inclusive.
Varying-length character strings
The types of varying-length character strings are VARCHAR and character large object (CLOB). A CLOB is a type of LOB. A CLOB column is useful for storing large amounts of character data, such as documents written with a single character set.
When varying-length character string distinct types, columns, and variables are defined, the maximum length is specified, and this length becomes the length attribute, except for C NUL-terminated strings. Actual values might have a smaller length. For varying-length character strings, the length specifies the number of bytes.
For a VARCHAR string, the length attribute must be in the range 1–32704 inclusive. For a VARCHAR column, the maximum for the length attribute is determined by the record size that is associated with the table, as described in Maximum record size in the description of the CREATE TABLE statement. For a CLOB string, the length attribute must be in the range 1–2147483647 inclusive. For more information about CLOBs, see Large objects (LOBs).
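
The length-attribute ranges stated above can be collected into a small lookup, shown here as a Python sketch. The names `LENGTH_RANGES` and `is_valid_length_attribute` are hypothetical, and the sketch does not model the additional record-size limit that applies to VARCHAR columns.

```python
# Length-attribute ranges (in bytes) as quoted in the text above.
LENGTH_RANGES = {
    "CHAR": (1, 255),
    "VARCHAR": (1, 32704),
    "CLOB": (1, 2147483647),
}


def is_valid_length_attribute(type_name, length):
    """Check a length attribute against the documented inclusive range."""
    low, high = LENGTH_RANGES[type_name.upper()]
    return low <= length <= high


print(is_valid_length_attribute("CHAR", 255))
print(is_valid_length_attribute("CHAR", 256))
print(is_valid_length_attribute("VARCHAR", 32704))
```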
Character string variables
- Fixed-length character string variables can be used in all languages except REXX and Java™. In C, CHAR string variables are limited to a length of 1.
- Varying-length character string variables can be used in all host languages with the following exceptions:
- Fortran: varying-length non-LOB character strings cannot be used.
- Assembler, C, and COBOL: varying-length non-LOB strings are simulated as described in the section for each language in Embedded SQL programming. In C, NUL-terminated strings can also be used.
- REXX: CLOBs and DBCLOBs cannot be used.
Character string encoding schemes
- Bit data
- Data that is not associated with a coded character set and, therefore, is never converted. The CCSID for bit data is X'FFFF' (65535). The bytes do not represent characters.
Bit data is a form of character data. The pad character is a blank for assignments to bit data; the pad character is X'00' for assignments to binary data. It is recommended that binary data be used instead of character data that is defined as FOR BIT DATA.
If both operands in a predicate are EBCDIC, both operands are padded with X'40'. Otherwise, both operands are padded with X'20'. For example, if both operands are ASCII, or if one operand is ASCII and the other operand is EBCDIC, both are padded with X'20'.
- SBCS data
- Data in which every character is represented by a single byte. Each SBCS string has an associated CCSID. If necessary, an SBCS string is converted before it is used in an operation with a character string that has a different CCSID.
- Mixed data
- Data that can contain a mixture of characters from a single-byte character set (SBCS) and a multiple-byte character set (MBCS). Each mixed string has an associated CCSID. If necessary, a mixed string is converted before an operation with a character string that has a different CCSID. If a mixed data string contains an MBCS character, it cannot be converted to SBCS data.
EBCDIC mixed data can contain shift characters, which are not MBCS data.
When the encoding scheme is Unicode or the Db2 installation is defined to support mixed data, Db2 recognizes MBCS sequences within mixed data strings when performing character-sensitive operations. These operations include parsing, character conversion, and the pattern matching specified by the LIKE predicate.
The method of representing DBCS and MBCS characters within a mixed string differs among the encoding schemes.
- ASCII reserves a set of code points for SBCS characters and another set as the first half of DBCS characters. When it encounters the first half of a DBCS character, the system reads the next byte in order to obtain the complete character.
- EBCDIC makes use of two special code points:
- A shift-out character (X'0E') to introduce a string of DBCS characters.
- A shift-in character (X'0F') to end a string of DBCS characters.
- UTF-8 is a varying-length encoding of byte sequences. The high bits indicate the part of the sequence to which a byte belongs. The first byte indicates the number of bytes to follow in a byte sequence.
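
The three approaches can be sketched in Python as functions that split a mixed byte string into per-character byte groups. This is a simplified illustration, not how Db2 implements it. All function names are hypothetical, the ASCII lead-byte ranges are an assumption (Shift-JIS style; actual ranges depend on the CCSID), and the UTF-8 helper does not reject overlong or otherwise invalid sequences.

```python
SHIFT_OUT, SHIFT_IN = 0x0E, 0x0F


def is_sjis_lead_byte(b):
    # Assumption: Shift-JIS-style ranges for the first half of a DBCS character.
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def split_ascii_mixed(data, is_lead_byte=is_sjis_lead_byte):
    """ASCII: a lead byte means the next byte completes the character."""
    groups, i = [], 0
    while i < len(data):
        width = 2 if is_lead_byte(data[i]) else 1
        groups.append(data[i:i + width])
        i += width
    return groups


def split_ebcdic_mixed(data):
    """EBCDIC: shift-out starts a DBCS run, shift-in ends it."""
    groups, i, in_dbcs = [], 0, False
    while i < len(data):
        b = data[i]
        if not in_dbcs and b == SHIFT_OUT:
            in_dbcs, width = True, 1      # shift byte is its own group
        elif in_dbcs and b == SHIFT_IN:
            in_dbcs, width = False, 1
        else:
            width = 2 if in_dbcs else 1   # two bytes per character inside a DBCS run
        groups.append(data[i:i + width])
        i += width
    return groups


def utf8_sequence_length(first_byte):
    """UTF-8: the high bits of the first byte give the sequence length."""
    if first_byte < 0x80:
        return 1
    if first_byte >> 5 == 0b110:
        return 2
    if first_byte >> 4 == 0b1110:
        return 3
    if first_byte >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")


def split_utf8(data):
    groups, i = [], 0
    while i < len(data):
        width = utf8_sequence_length(data[i])
        groups.append(data[i:i + width])
        i += width
    return groups


def spaced_hex(groups):
    return " ".join(g.hex().upper() for g in groups)


print(spaced_hex(split_ascii_mixed(bytes.fromhex("8CB367656E8B436B69"))))
print(spaced_hex(split_ebcdic_mixed(bytes.fromhex("0E46950F8785950E45B90F9289"))))
print(spaced_hex(split_utf8(bytes.fromhex("E5858367656EE6B0976B69"))))
```

The sample byte strings are the ones used in the examples table in the next section, so the printed groupings can be compared with its hexadecimal column.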
Examples of character encoding schemes
The same mixed data character string can be represented as character and hexadecimal data in different encoding schemes.
Data type and encoding scheme | Character representation | Hexadecimal representation (with spaces separating each character) |
---|---|---|
9 bytes in ASCII |
|
8CB3 67 65 6E 8B43 6B 69 |
13 bytes in EBCDIC |
|
0E 4695 0F 87 85 95 0E 45B9 0F 92 89 |
11 bytes in Unicode UTF-8 |
|
E58583 67 65 6E E6B097 6B 69 |
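
The hexadecimal values in the table can be cross-checked with Python's standard codecs. The sketch below makes two assumptions: Python's `shift_jis` codec stands in for the ASCII mixed CCSID, and `cp037` stands in for the single-byte EBCDIC portion. The EBCDIC DBCS code points (X'4695' and X'45B9') are copied from the table rather than computed, because a mixed EBCDIC codec is not assumed to be available.

```python
# Start from the UTF-8 bytes shown in the table and decode them to text.
utf8_bytes = bytes.fromhex("E5858367656EE6B0976B69")
text = utf8_bytes.decode("utf-8")

# Re-encode with a Shift-JIS codec (assumed stand-in for the ASCII mixed CCSID).
ascii_bytes = text.encode("shift_jis")

# Assemble the EBCDIC form: shift-out, DBCS code point, shift-in, SBCS bytes.
ebcdic_bytes = (
    b"\x0e" + bytes.fromhex("4695") + b"\x0f" + "gen".encode("cp037")
    + b"\x0e" + bytes.fromhex("45B9") + b"\x0f" + "ki".encode("cp037")
)

for label, data in (("ASCII", ascii_bytes), ("EBCDIC", ebcdic_bytes), ("UTF-8", utf8_bytes)):
    print(label, len(data), data.hex().upper())
```

The same seven characters occupy 9, 13, and 11 bytes respectively, which is why a length attribute sized for one encoding scheme might not fit the same data in another.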
Because of the differences in the representation of mixed data strings in ASCII, EBCDIC, and Unicode, mixed data is not transparently portable. To minimize the effects of these differences, use varying-length strings in applications that require mixed data and operate on ASCII, EBCDIC, and Unicode data.
String units specifications
The ability to specify string units for certain built-in functions and on the CAST specification allows you to process string data in a character-based manner rather than a byte-based manner. The string unit determines the unit of length in which the operation occurs. For more information, see String unit specifications.
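
As a rough illustration of how the unit changes a length, the following Python sketch counts the same string in three units. The unit names OCTETS, CODEUNITS16, and CODEUNITS32 come from the Db2 string unit documentation, not from this section, and `length_in_units` is a hypothetical helper that assumes Unicode data.

```python
def length_in_units(text, unit):
    """Count a string in the given unit (sketch; assumes Unicode data)."""
    if unit == "OCTETS":
        return len(text.encode("utf-8"))            # bytes in UTF-8
    if unit == "CODEUNITS16":
        return len(text.encode("utf-16-be")) // 2   # 16-bit code units
    if unit == "CODEUNITS32":
        return len(text)                            # code points
    raise ValueError("unknown string unit: " + unit)


sample = "\u5143gen\u6c17ki"   # the mixed string from the examples table
for unit in ("OCTETS", "CODEUNITS16", "CODEUNITS32"):
    print(unit, length_in_units(sample, unit))
```

For this sample the byte-based count is larger than the two character-based counts, because each of the two kanji occupies three bytes in UTF-8 but a single code unit in UTF-16 and UTF-32.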