Conversion rules for comparisons

When two strings are compared, one of the strings is first converted, if necessary, to the coded character set of the other string. Conversion is necessary only if certain rules apply.

Conversion is necessary only if all of the following are true:

  • The CCSIDs of the two strings are different.
  • Neither CCSID is X'FFFF' (neither string is defined as a binary string).
  • The string selected for conversion is neither null nor empty.
  • The following conversion tables (Table 2 or Table 3) indicate when conversion is necessary.
The string selected for conversion depends on the type of the operands. For the purpose of CCSID determination, string expressions in a statement are divided into 6 types, as described in the following table.
Table 1. Operand types
Type of operand CCSID of the operand type
Columns CCSID from the containing table
String constants CCSID associated with the application encoding scheme. For dynamic statements, this is the CURRENT APPLICATION ENCODING SCHEME special register. For static statements, this is the ENCODING bind option or the APPLICATION ENCODING SCHEME option of the CREATE PROCEDURE or ALTER PROCEDURE statement for native SQL procedures..
Special registers CCSID associated with the application encoding scheme. For dynamic statements, this is the CURRENT APPLICATION ENCODING SCHEME special register. For static statements, this is the ENCODING bind option or the APPLICATION ENCODING SCHEME option of the CREATE PROCEDURE or ALTER PROCEDURE statement for native SQL procedures.
Host variables CCSID specified in the DECLARE VARIABLE statement, associated with the application encoding scheme, or specified in SQLDAID or SQLDA
Derived value based on a column CCSID derived from the source of the derived value. A derived value based on a column is an expression whose source is directly or indirectly based on columns. The CCSID of such an expression is the CCSID derived from its source.

For example:

  • The CCSID of SUBSTR(column_1, 5, length(column_2)) is the CCSID of column_1. Note that the CCSID of column_2 has no influence on the CCSID of SUBSTR.
  • The CCSID of column_1 || 'ABC' is the CCSID of column_1, derived from the rules described in Table 2.
  • The CCSID of column_1 || GX'42C1' is the DBCS CCSID from the CCSID set of column_1, derived from the rules described in Table 2 and Table 3.
  • The CCSID of COALESCE(EBCDIC_column_1, ASCII_column_1) is the UNICODE CCSID, derived from the rules described in Table 2.
  • The CCSID of CAST(string_column_1 AS GRAPHIC(10)) is the DBCS CCSID from the CCSID set of string_column_1.
  • The CCSID of CAST(EBCDIC_string_column_1 AS VARCHAR(10) CCSID UNICODE) is the UNICODE CCSID derived from the rules described in Table 2.
  • The CCSID of CASE WHEN(1=1) THEN '1' ELSE ASCII_column_1 END is the CCSID of ASCII_column_1, derived from the rules described in Table 2.
  • The CCSID of CASE WHEN(1=1) THEN EBCDIC_column_1 ELSE ASCII_column_1 END is the UNICODE CCSID derived from the rules described in Table 2.
  • The CCSID of a scalar fullselect (SELECT column_1 FROM table_1) is the CCSID of column_1.
Derived value not based on a column CCSID derived from the source of the derived value. A derived value not based on a column is an expression whose source is not directly or indirectly based on any column. The CCSID of such an expression is the CCSID derived from its source.
  • For example, the CCSID of SUBSTR('ABDC', 1, length('AB")) is the CCSID of the string constant 'ABCD'. Note that the CCSID of column_1 has no influence on the CCSID of SUBSTR.
  • the CCSID of user_defined_function1(column1) is the output CCSID defined by user_defined_function1.
  • the CCSID of the cast function of distinct type, shape, is the CCSID of distinct type, shape.
  • the CCSID of CURRENT SQLID || UX'0041' is the UNICODE DBCS CCSID, derived from the rules described in Table 2 and Table 3.
  • the CCSID of CAST('abc' as CHAR(10) CCSID UNICODE) is the UNICODE CCSID.

The following table shows which operand supplies the target CCSID set when the comparison is part of an SQL statement involving multiple tables with different CCSID sets.

Table 2. Operand that supplies the CCSID for character conversion
First operand Second operand
Column value String constant Special register Host variable Derived value based on a column Derived value not based on a column
Column value 1 first operand first operand first operandStart of change, 2End of change 1 first operand
String constant second operand 1 1 1 second operand 1
Special register second operand 1 1 1 second operand 1
Host variable second operand 1 1 1 second operandStart of change, 2End of change 1
Derived value based on a column 1 first operand first operand first operandStart of change, 2End of change 1 first operand
Derived value not based on a column second operand 1 1 1 second operand 1
Note:  
  1. If the CCSID sets are different, both operands are converted, if necessary, to Unicode. SBCS and Mixed are converted to UTF-8. DBCS is converted to UTF-16. See the next table to determine which operand supplies the CCSID for character conversion.
  2. Start of change If the encoding scheme that is in effect for the statement is EBCDIC or ASCII, the host variable is Unicode graphic, and the value of the field MIXED DATA on installation panel DSNTIPF is NO, the column or the derived value that is based on a column supplies the target CCSID set.End of change
The following table shows which operand is selected for conversion when both operands are based on a column or are not based on a column as represented in the previous table.
Table 3. Operand that supplies the CCSID for character conversion when both operands are based or not based on a column
First operand Second operand
SBCS data Mixed data DBCS data
SBCS data   second operand1 second operand
Mixed data first operand1   second operand
DBCS data first operand first operand  
Note:  
  1. For ASCII and EBCDIC data, the conversion depends on the value of the field MIXED DATA on installation panel DSNTIPF at the DB2® that does the comparison. If MIXED DATA = YES, the SBCS operand is converted to MIXED. If MIXED DATA = NO, the MIXED operand is converted to SBCS
For example, assume a comparison of the form:
   string-constant-SBCS =derived-value-not-based-on-column-DBCS
Assume that the operands have different encoding schemes. First look at Table 2. The relevant table entry is in the third row and second column. The value for this entry shows that if the CCSID sets are different, the operands are converted to Unicode. The first operand (string-constant-SBCS) is converted to UTF-8 (Mixed) if it is not already Unicode. In addition, the second operand (derived-value-not-based-on-column-DBCS) is converted to UTF-16 (Unicode DBCS) if necessary. After the operands have been converted to Unicode, Table 3 is used to determine which operand supplies the specific CCSID for the conversion. The relevant table entry is in the second row and third column. It indicates that the second operand (derived-value-not-based-on-column-DBCS) determines the CCSID because DBCS data takes precedence over Mixed data.

An error occurs if a character of the string cannot be converted, the SYSSTRINGS table is used but contains no information about the pair of CCSIDs of the operands being compared, or DB2 cannot do the conversion through z/OS® support for Unicode. A warning occurs if a character of the string is converted to a substitution character.

A derived value based on a column is an expression that includes columns that affects the result CCSID of the expression. For example, in the expression COL1||'abc', COL1 determines the result CCSID. Therefore, the expression COL1||'abc' is considered to be a derived value based on a column that continues to give the column precedence in any further comparisons. The expression CASE WHEN COL1 > 1 THEN 'abc' ELSE 'def' END contains a column that does not affect the result CCSID of the expression and is therefore not considered to be a derived value based on a column.

The following table defines which expressions are considered to be a derived value based on a column.
Table 4. Derived values based on a column
Expression Condition
expression1 || expression2 expression1 or expression2 is a column or a derived value based on a column
CASE when-clause THEN result-expression ELSE result-expression END any result-expression is a string-expression that is a column or derived value based on a column
CAST(expression as data-type) expression is a string-expression that is a column or a derived value based on a column and data-type is a string data type
Scalar-fullselect: (SELECT expression FROM table) expression is a string-expression that is a column or a derived value based on a column and data-type is a string data type
When a statement contains multiple CCSID sets, if the length of one of the strings changes after CCSID conversion, the string becomes a varying-length string. That is, the data type becomes VARCHAR, CLOB, VARGRAPHIC, or DBCLOB. The following table shows the worse case resulting lengths of CCSID conversion, where X is length in bytes.
Table 5. Worst case result length of CCSID conversion, where X represents LENGTH(string in bytes)
From CCSID To CCSID
EBCDIC ASCII Unicode
SBCS Mixed DBCS SBCS Mixed DBCS SBCS UTF-8 UTF-16
EBCDIC SBCS X X X*21 X X X*21 X1 X*3 X*2
Mixed X X X*21 X X X*21 X1 X*3 X*2
DBCS X*0.51 X+2 X X*0.51 X X X*0.5 X*1.5 X
ASCII SBCS X X X*21 X X X*21 X1 X*3 X*2
Mixed X X*1.8 X*21 X X X*21 X1 X*3 X*2
DBCS X*0.51 X+2 X X*0.51 X X X*0.5 X*1.5 X
Unicode SBCS X X X*2 X X X*2 X X X*2
UTF-8 X X*1.25 X X X X X X X*2
UTF-16 X*0.5 X+2 X X*0.5 X X X*0.5 X*1.5 X
Note:
  1. Because of the high probability of data loss, IBM® does not provide conversion tables for this combination of two CCSIDs and data subtypes.