Determining the encoding scheme and CCSID of a string

An encoding scheme and a CCSID are attributes of strings, just as length is an attribute of strings. All values of the same string column have the same encoding scheme and CCSID.

Every string has an encoding scheme and a CCSID that identifies the manner in which the characters in the string are encoded. Strings can be encoded in ASCII, EBCDIC, or Unicode.

The CCSID that is associated with a string value depends on the SQL statement in which the data is referenced and the type of expression. Use the Type 1 rules when the SQL statement meets the following conditions:

  • The SQL statement operates with a single set of CCSIDs (SBCS, mixed, and graphic). An SQL statement that does not contain any of the following items operates with a single set of CCSIDs:
    • References to columns from multiple tables or views that are defined with CCSIDs from more than one set of CCSIDs (SBCS, mixed, and graphic)
    • References to an EBCDIC table that contains a Unicode column
    • Graphic hexadecimal (GX) or hexadecimal Unicode (UX) string constants
    • References to the XMLCLOB built-in function
    • Cast specifications with a CCSID clause
    • User-defined table functions
  • The SQL statement is not one of the following statements:
    • CALL statement
    • SET assignment statement
    • SET special register
    • VALUES statement
    • VALUES INTO statement
  • One of the following built-in functions is not referenced:
    • ASCII_CHR
    • ASCII_STR (or ASCIISTR)
    • Start of changeCHREnd of change
    • DECRYPT_BIT
    • DECRYPT_CHAR
    • Start of changeDECRYPT_DATAKEY_BITEnd of change
    • Start of changeDECRYPT_DATAKEY_CLOBEnd of change
    • Start of changeDECRYPT_DATAKEY_DBCLOBEnd of change
    • Start of changeDECRYPT_DATAKEY_VARCHAREnd of change
    • Start of changeDECRYPT_DATAKEY_VARGRAPHICEnd of change
    • DECRYPT_DB
    • EBCDIC_CHR
    • EBCDIC_STR
    • GETVARIABLE
    • Start of changeJSON_VALEnd of change
    • NORMALIZE_STRING
    • UNICODE_STR (or UNISTR)
    • XMLSERIALIZE
  • The SQL statement does not include a collection-derived table (UNNEST).

Use the Type 2 rules when the statement does not meet the conditions for Type 1 rules.

Start of changeFor those SQL statements and tools that use a SYSDUMMYx table, use the SYSDUMMYx table that has the same encoding scheme as the other objects in your SQL statement, to avoid conversions. A SYSDUMMYx table is available in each encoding scheme. For example, suppose that your SQL statement references a SYSDUMMYx table that is in a different encoding scheme from other objects in the statement. Db2 treats this statement as one that references objects with different CCSIDs (Type 2 rules), and conversion is likely to occur. To avoid this situation, reference the SYSDUMMYx table that has the same encoding scheme as the other objects in your SQL statement. See SYSDUMMYx tables for more information.End of change

Table 1 describes the rules for determining the CCSID of derived string data when the source data has a string type.
Table 1. Rules for determining the CCSID that is associated with string data derived from string data
Source of the string data Type 1 rules Type 2 rules
String constant If the statement references a table or view, the encoding scheme of that table or view determines the encoding scheme for the string constant.

The CCSID is the appropriate character string CCSID of the encoding scheme.

The CCSID is the appropriate character string CCSID of the application encoding scheme.1
Datetime constant If the statement references a table or view, the encoding scheme of that table or view determines the encoding scheme for the string constant.

Otherwise, the default EBCDIC encoding scheme is used for the string constant.

The CCSID is the appropriate character string CCSID of the encoding scheme.

The CCSID is the appropriate character string CCSID of the application encoding scheme.1
Hexadecimal string constant (X'...') If the statement references a table or view, the encoding scheme of that table or view determines the encoding scheme for the string constant.

Otherwise, the default EBCDIC encoding scheme is used for the string constant.

The CCSID is the appropriate graphic string CCSID of the encoding scheme.

The CCSID is the appropriate character string CCSID of the application encoding scheme.1
Graphic string
constant
(G'...')
If the statement references a table or view, the encoding scheme of that table or view determines the encoding scheme for the graphic string constant.

Otherwise, the default EBCDIC encoding scheme is used for the graphic string constant.

The CCSID is the graphic string CCSID of the encoding scheme.

The CCSID is the graphic string CCSID of the application encoding scheme.1
Graphic hexadecimal constant (GX'...') Not applicable. The CCSID is the graphic string CCSID of the application encoding scheme, which must be ASCII or EBCDIC.
Hexadecimal Unicode string constant (UX'....') Not applicable. The CCSID is 1200 (UTF-16).
Special register If the statement references a table or view, the encoding scheme of that table or view determines the encoding scheme for the special register.

Otherwise, the default EBCDIC encoding scheme is used for the special register.

The CCSID is the appropriate character string CCSID of the encoding scheme.

The CCSID is the appropriate CCSID of the application encoding scheme.1
Column of a table The CCSID is the CCSID that is associated with the column of the table. The CCSID is the CCSID that is associated with the column of the table.
Column of a view The CCSID is the CCSID of the column of the result table of the fullselect of the view definition. The CCSID is the CCSID of the column of the result table of the fullselect of the view definition.
Expression The CCSID is the CCSID of the result of the expression. The CCSID is the CCSID of the result of the expression.
Result of a built-in function If the description of the function, in Built-in functions, indicates what the CCSID of the result is, the CCSID is that CCSID.

Otherwise, if the description of the function refers to this table for the CCSID, the CCSID is the appropriate CCSID of the CCSID set that is used by the statement for the data type of the result.

If the description of the function, in Built-in functions, indicates what the CCSID of the result is, the CCSID is that CCSID.

Otherwise, if the description of the function refers to this table for the CCSID, the CCSID is the appropriate CCSID of the application encoding scheme for the data type of the result.1

Parameter of a user-defined routine The CCSID is the CCSID that was determined when the function or procedure was created. The CCSID is the CCSID that was determined when the function or procedure was created.
The expression in the RETURN statement of a CREATE statement for a user-defined SQL scalar function If the expression in the RETURN statement is string data, the encoding scheme is the same as for the parameters of the function. The CCSID is determined from the encoding scheme and the attributes of the data. The CCSID is determined from the CCSID of the result of the expression specified in the RETURN statement.
String host variable If the statement references a table or view, the encoding scheme of that table or view determines the encoding scheme for the host variable.

Graphic variables are an exception if the table or view is EBCDIC or ASCII and the value of the MIXED DATA field on the DSNTIPF panel is NO. In this case, the Unicode encoding scheme is used for the graphic host variable.

Otherwise, the default EBCDIC encoding scheme is used for the host variable.

The CCSID is the appropriate CCSID of the data type of the host variable.

At package prepare time, the CCSID is the appropriate CCSID of the data type of the host variable for the application encoding scheme.

Graphic variables are an exception if the application or encoding scheme is EBCDIC or ASCII and the value of the MIXED DATA field on the DSNTIPF panel is NO. In this case, the Unicode encoding scheme is used for the graphic host variable.

At run time, the CCSID specified in the declare variable statement, or as an override in the SQLDA. Otherwise, the CCSID is the appropriate CCSID of the application encoding scheme for the data type of the host variable.

Notes:
  1. Start of changeIf the context is within a check constraint or a package for a basic trigger, the CCSID is the appropriate CCSID for Unicode, instead of the application encoding scheme. If the context is within a package for an advanced trigger, the CCSID is determined from the implicitly or explicitly specified APPLICATION ENCODING SCHEME option; otherwise the CCSID is the appropriate CCSID for Unicode instead of the application encoding scheme.End of change

Determining the encoding scheme and CCSID of a string shows the rules for determining the CCSID of derived string data when the source data has a numeric type.

Table 2. Rules for determining the CCSID that is associated with string data derived from numeric data
Source of the numeric data Type 1 rules Type 2 rules
Numeric data If the statement references a table or view, the encoding scheme of that table or view determines the encoding scheme for the string data that is derived from a numeric value.

Otherwise, the default EBCDIC encoding scheme is used for the string data derived from a numeric value.

The CCSID is the appropriate character or graphic string CCSID of the application encoding scheme.

The CCSID is the appropriate character or graphic string CCSID of the application encoding scheme. 1
Notes:
  1. Start of changeIf the context is within a check constraint or a package for a basic trigger, the CCSID is the appropriate CCSID for Unicode, instead of the application encoding scheme. If the context is within a package for an advanced trigger, the CCSID is determined from the implicitly or explicitly specified APPLICATION ENCODING SCHEME option; otherwise the CCSID is the appropriate CCSID for Unicode instead of the application encoding scheme.End of change

The following examples show how these rules are applied.

Example 1: Assume that the default encoding scheme for the installation is EBCDIC and that the installation does not support mixed and graphic data. The following statement conforms to the rules for Type 1 in Table 1. Therefore, the X'40' is interpreted as EBCDIC SBCS data because the statement references a table that is in EBCDIC. The CCSID for X'40' is the default EBCDIC SBCS CCSID for the installation.
SELECT * FROM EBCDIC_TABLE WHERE COL1 = X'40';
the result of the query includes each row that has a value in column COL1 that is equal to a single EBCDIC blank.
Example 2: The following statement references data from two different tables that use different encoding schemes. This statement does not conform to the rules for Type 1 statements in Table 1. Therefore, the rules for Type 2 statements are used. The CCSID for X'40' is dependent on the current application encoding scheme. Assuming that the current application encoding scheme is EBCDIC, X'40' represents a single EBCDIC blank.
SELECT * FROM EBCDIC_TABLE, UNICODE_TABLE WHERE COL1 = X'40';
as with Example 1, the result of the query includes each row that has a value in column COL1 that is equal to a single EBCDIC blank. If the current application encoding scheme were ASCII or Unicode, X'40' would represent something different and the results of the query would be different.