Expanding conversion

An expanding conversion occurs when the length of the converted string is greater than that of the source string.

For example, an expanding conversion occurs when an ASCII mixed data string that contains DBCS characters is converted to an EBCDIC mixed data string. The expansion occurs because of the addition of shift-out and shift-in control characters. Expanding conversions can also occur when string data is converted to or from Unicode.

Expanding conversions typically affect European and Asian Pacific languages. For example, the German name Jürgen expands when it is converted from ASCII or EBCDIC to Unicode. Also, Japanese, Korean, and Chinese strings expand when they are converted from ASCII to EBCDIC.

Expanding conversions can have the following effects on Db2:

  • Expanding conversions might cause problems with fixed-length variables. For example, when a fixed-length host variable needs to be converted from ASCII mixed data to EBCDIC mixed data, an error occurs. The problem occurs because the conversion is an expanding conversion, but the host variable is fixed-length. The solution is to use a varying-length string variable with a maximum length that is sufficient to contain the expansion.
  • Expanding conversions can affect fixed-length strings. If you use a fixed-length string and an expanding conversion occurs, Db2 truncates the string. Db2 examines the characters that are being truncated to ensure that significant data is not truncated. For example, trailing blanks are insignificant. In this situation, consider using the VARCHAR data type for these strings.
  • Expanding conversions can affect the results of functions that depend on string length, such as LENGTH, CHARACTER_LENGTH, SUBSTRING, and SUBSTR, when they are applied to the converted string. For CHARACTER_LENGTH and SUBSTRING, specify the CODEUNITS16 or CODEUNITS32 option to limit the effects of expanding conversions.
  • Expanding conversions can affect the length of object names, such as table names and column names. You can avoid these problems by not using special characters in object names.
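
The difference between counting bytes and counting code units can be illustrated outside Db2. The following Python sketch is only an analogy and does not call Db2: the helper names utf8_bytes, codeunits16, and codeunits32 are hypothetical, and Python's latin-1 codec is assumed to stand in for an SBCS ASCII CCSID. It shows that the byte length of the name Jürgen grows after conversion to UTF-8, while the counts in 16-bit and 32-bit code units stay the same.

```python
def utf8_bytes(text: str) -> int:
    """Length in bytes after conversion to UTF-8."""
    return len(text.encode("utf-8"))


def codeunits16(text: str) -> int:
    """Length in 16-bit code units (UTF-16, no byte order mark)."""
    return len(text.encode("utf-16-be")) // 2


def codeunits32(text: str) -> int:
    """Length in 32-bit code units (one per Unicode code point)."""
    return len(text.encode("utf-32-be")) // 4


name = "J\u00fcrgen"  # Jürgen, with a precomposed u-umlaut

print(len(name.encode("latin-1")))  # 6 bytes before conversion
print(utf8_bytes(name))             # 7 bytes after conversion to UTF-8
print(codeunits16(name))            # 6
print(codeunits32(name))            # 6
```

Counting in code units gives the same answer before and after the conversion, which is why a code-unit option makes length results less sensitive to the encoding scheme.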

To determine the worst-case result length of a CCSID conversion, use the following table.

Table 1. Worst-case result length of CCSID conversion, where X represents LENGTH(string in bytes)

From CCSID        To CCSID
                  EBCDIC                           ASCII                            Unicode
                  SBCS       Mixed      DBCS       SBCS       Mixed      DBCS       SBCS       UTF-8      UTF-16
EBCDIC   SBCS     X          X          X*2 [1]    X          X          X*2 [1]    X [1]      X*3        X*2
         Mixed    X          X          X*2 [1]    X          X          X*2 [1]    X [1]      X*3        X*2
         DBCS     X*0.5 [1]  X+2        X          X*0.5 [1]  X          X          X*0.5      X*1.5      X
ASCII    SBCS     X          X          X*2 [1]    X          X          X*2 [1]    X [1]      X*3        X*2
         Mixed    X          X*1.8      X*2 [1]    X          X          X*2 [1]    X [1]      X*3        X*2
         DBCS     X*0.5 [1]  X+2        X          X*0.5 [1]  X          X          X*0.5      X*1.5      X
Unicode  SBCS     X          X          X*2        X          X          X*2        X          X          X*2
         UTF-8    X          X*1.25     X          X          X          X          X          X          X*2
         UTF-16   X*0.5      X+2        X          X*0.5      X          X          X*0.5      X*1.5      X

Note:
  [1] Because of the high probability of data loss, IBM® does not provide conversion tables for this combination of two CCSIDs and data subtypes.
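
The table lookup can be sketched as a small helper. The following Python sketch is not part of Db2; the names TARGETS, ROWS, and worst_case_length are hypothetical, and rounding fractional results up to the next whole byte is an assumption. Cells that carry note 1 are evaluated like any other cell, even though no conversion tables are provided for those combinations.

```python
from fractions import Fraction
from math import ceil

# Target CCSID types, in the column order of Table 1.
TARGETS = ["EBCDIC SBCS", "EBCDIC Mixed", "EBCDIC DBCS",
           "ASCII SBCS", "ASCII Mixed", "ASCII DBCS",
           "Unicode SBCS", "UTF-8", "UTF-16"]

# Rows that are identical in Table 1 share one definition.
_SINGLE = "X X X*2 X X X*2 X X*3 X*2"
_DOUBLE = "X*0.5 X+2 X X*0.5 X X X*0.5 X*1.5 X"

# One row per source CCSID type; cells follow the order of TARGETS.
ROWS = {
    "EBCDIC SBCS": _SINGLE,
    "EBCDIC Mixed": _SINGLE,
    "EBCDIC DBCS": _DOUBLE,
    "ASCII SBCS": _SINGLE,
    "ASCII Mixed": "X X*1.8 X*2 X X X*2 X X*3 X*2",
    "ASCII DBCS": _DOUBLE,
    "Unicode SBCS": "X X X*2 X X X*2 X X X*2",
    "UTF-8": "X X*1.25 X X X X X X X*2",
    "UTF-16": _DOUBLE,
}


def worst_case_length(source: str, target: str, x: int) -> int:
    """Return the worst-case byte length after conversion.

    x is the length of the source string in bytes. Fractional
    results are rounded up (an assumption of this sketch).
    """
    cell = ROWS[source].split()[TARGETS.index(target)]
    if cell == "X":
        return x
    if cell.startswith("X*"):
        # Fraction avoids binary floating-point error for 1.8, 1.25, ...
        return ceil(x * Fraction(cell[2:]))
    return x + int(cell[2:])  # cells of the form X+n


print(worst_case_length("ASCII Mixed", "EBCDIC Mixed", 9))  # 17
print(worst_case_length("ASCII SBCS", "UTF-8", 6))          # 18
```

For a 9-byte ASCII mixed string, the sketch returns a bound of 17 bytes for EBCDIC mixed data; the actual converted length can be smaller than this bound.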

Examples

Example
In ASCII CCSID 819, the character Å is represented by the code point X'C5'. In UTF-8 CCSID 1208, this character is represented by X'C385'. Thus, the conversion of the character Å from CCSID 819 to CCSID 1208 is an expanding conversion.
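
The code points in this example can be reproduced with a few lines of Python. This is a sketch outside Db2: it relies on CCSID 819 corresponding to ISO 8859-1 and CCSID 1208 corresponding to UTF-8, and uses Python's codecs for those encodings.

```python
char = "\u00c5"  # LATIN CAPITAL LETTER A WITH RING ABOVE

source = char.encode("iso-8859-1")  # stands in for CCSID 819
target = char.encode("utf-8")       # stands in for CCSID 1208

print(source.hex().upper(), len(source))  # C5 1
print(target.hex().upper(), len(target))  # C385 2
```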
Example
The following table shows a string with Kanji characters and Latin characters in different encoding schemes.
Table 2. Example of a character string in different encoding schemes
Data type and encoding scheme Character representation Hexadecimal representation (with spaces separating each character)
9 bytes in ASCII
Begin figure description. A string consists of a Kanji character, the Latin lowercase characters gen, another Kanji character, and the Latin lowercase characters ki. End figure description.
8CB3 67 65 6E 8B43 6B 69
13 bytes in EBCDIC
Begin figure description. A string is a shift-out, a Kanji character, a shift-in, the characters g e n, a shift-out, a Kanji character, a shift-in, and the characters k i. End figure description.
0E 4695 0F 87 85 95 0E 45B9 0F 92 89
11 bytes in Unicode UTF-8
Begin figure description. A string consists of a Kanji character, the Latin lowercase characters gen, another Kanji character, and the Latin lowercase characters ki. End figure description.
E58583 67 65 6E E6B097 6B 69
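Converting this string from ASCII to EBCDIC adds a shift-out character (X'0E') before each run of DBCS characters and a shift-in character (X'0F') after it. The string contains two such runs, so four control bytes are added and the length increases from 9 bytes to 13 bytes. This conversion is therefore an expanding conversion.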