Character conversion

A string is a sequence of bytes that might represent characters. All the characters within a string have a common coding representation. In some cases, it might be necessary to convert these characters to a different coding representation, a process known as character conversion.

Character conversion, when required, is automatic. The Db2® database server and client perform all necessary conversion, so applications never need to invoke it explicitly.

Character conversion can occur when an SQL statement is executed remotely. Consider, for example, the following scenarios in which the coding representations might be different at the sending and receiving systems:
  • The values of host variables are sent from the application requester to the application server.
  • The values of result columns are sent from the application server to the application requester.
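
Conceptually, converting between two coding representations amounts to decoding the bytes under the sender's code page and re-encoding the resulting characters under the receiver's code page. The following Python sketch illustrates the idea with the standard codecs. The choice of code pages 850 and 1252 and the function name `convert_code_page` are assumptions made for illustration; the sketch does not show how Db2 performs the conversion internally.

```python
def convert_code_page(data: bytes, source_cp: str, target_cp: str) -> bytes:
    """Re-encode bytes from one code page to another (illustrative only)."""
    # Decode the bytes into characters using the sender's code page,
    # then encode those characters using the receiver's code page.
    return data.decode(source_cp).encode(target_cp)


# "é" is code point X'82' in code page 850 but X'E9' in code page 1252.
sent = "café".encode("cp850")
received = convert_code_page(sent, "cp850", "cp1252")
print(sent.hex(), received.hex())
```

The characters are unchanged by the conversion; only the bytes that represent them differ between the two systems.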

The following terms are used when discussing character conversion:

character set
A defined set of characters. For example, the following character set appears in several code pages:
  • 26 non-accented letters A through Z
  • 26 non-accented letters a through z
  • digits 0 through 9
  • . , : ; ? ( ) ' " / - _ & + % * = < >
code page
A set of assignments of characters to code points. In the ASCII encoding scheme for code page 850, for example, "A" is assigned code point X'41', and "B" is assigned code point X'42'. Within a code page, each code point has only one specific meaning. A code page is an attribute of the database. When an application program connects to the database, the database manager determines the code page of the application.
code point
A unique bit pattern that represents a character.
encoding scheme
A set of rules used to represent character data, for example:
  • Single-Byte ASCII
  • Single-Byte EBCDIC
  • Double-Byte ASCII
  • Mixed single- and double-byte ASCII
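
To make these terms concrete, the following Python sketch prints the code points that two code pages assign to the same characters. It assumes Python's standard `cp850` codec (a single-byte ASCII code page) and `cp037` codec (a single-byte EBCDIC code page) as stand-ins for the corresponding code pages; the helper name `code_point` is hypothetical.

```python
def code_point(char: str, code_page: str) -> str:
    """Return the code point assigned to char in code_page, in X'..' notation."""
    return "X'" + char.encode(code_page).hex().upper() + "'"


# Same character set, two code pages from different encoding schemes.
for ch in "AB":
    print(ch, code_point(ch, "cp850"), code_point(ch, "cp037"))
```

Under these assumptions, "A" is X'41' in code page 850 and X'C1' in code page 037.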

The following figure shows how a typical character set might map to different code points in two different code pages. Even within the same encoding scheme, there are many different code pages, and the same code point can represent different characters in different code pages. Furthermore, a byte in a character string does not necessarily represent a character from a single-byte character set (SBCS). Character strings are also used for mixed and bit data. Mixed data is a mixture of single-byte, double-byte, and multibyte characters. Bit data (columns defined as FOR BIT DATA, BLOBs, or binary strings) is not associated with any character set.

Figure 1. Mapping a Character Set in Different Code Pages
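
Two of the points above can be checked with a short Python sketch: the same code point decodes to different characters under different code pages, and a mixed string occupies more bytes than it has characters. Code pages 437 and 850 and the Shift JIS codec are assumptions chosen for illustration only.

```python
# The same code point can represent different characters in different code pages.
byte = bytes([0x9B])
in_cp437 = byte.decode("cp437")
in_cp850 = byte.decode("cp850")
print(in_cp437, in_cp850)

# Mixed data: one single-byte character followed by one double-byte character.
text = "A日"
mixed = text.encode("shift_jis")
print(len(text), "characters,", len(mixed), "bytes")
```

Because the byte count and the character count of mixed data can differ, a single byte of such a string cannot be interpreted on its own.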

The database manager determines code page attributes for all character strings when an application is bound to a database. The possible code page attributes are:

Database code page
The database code page is stored in the database configuration file. The value is specified when the database is created and cannot be altered.
Application code page
The code page under which the application runs. This code page is not necessarily the same code page under which the application was bound.
Section code page
The code page under which the SQL statement runs. Typically, the section code page is the database code page. However, the Unicode code page (UTF-8) is used as the section code page if either of the following conditions is true:
  • The statement references a table that is created with the Unicode encoding scheme in a non-Unicode database.
  • The statement references a table function that is defined with PARAMETER CCSID UNICODE in a non-Unicode database.
Code page 0
This value represents a string that is derived from an expression that contains a FOR BIT DATA value, a binary data type value, or a BLOB value.
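
The rule for choosing the section code page can be modeled as a small decision function. This is a minimal sketch, not a Db2 interface: the function name, its parameters, and the use of Python codec names to identify code pages are all hypothetical.

```python
UNICODE_CODE_PAGE = "utf-8"


def section_code_page(database_cp: str,
                      references_unicode_table: bool = False,
                      references_unicode_table_function: bool = False) -> str:
    """Model the section code page rule (hypothetical helper)."""
    non_unicode_database = database_cp != UNICODE_CODE_PAGE
    if non_unicode_database and (references_unicode_table
                                 or references_unicode_table_function):
        # A Unicode table or table function in a non-Unicode database
        # switches the section to the Unicode code page.
        return UNICODE_CODE_PAGE
    # Otherwise the statement runs under the database code page.
    return database_cp


print(section_code_page("cp850"))
print(section_code_page("cp850", references_unicode_table=True))
```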

Character string code pages have the following attributes:

  • Columns can be in the database code page, the Unicode code page (UTF-8), or code page 0 (if defined as FOR BIT DATA, binary, or BLOB).
  • Constants and special registers (for example, USER, CURRENT SERVER) are in the section code page. Constants are converted, if necessary, from the application code page to the database code page, and then to the section code page when an SQL statement is bound to the database.
  • Input host variables are in the application code page. As of Version 8, string data in input host variables is converted, if necessary, from the application code page to the section code page before being used. The exception occurs when a host variable is used in a context where it is to be interpreted as bit data; for example, when the host variable is to be assigned to a column that is defined as FOR BIT DATA.
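
The handling of input host variables, including the bit data exception, can be sketched as follows. The function `prepare_host_variable` and its parameters are hypothetical and exist only to illustrate the rule; code page 1252 and UTF-8 are assumed as the application and section code pages.

```python
def prepare_host_variable(value: bytes, application_cp: str,
                          section_cp: str, as_bit_data: bool = False) -> bytes:
    """Model input host variable handling (hypothetical helper, not a Db2 API)."""
    if as_bit_data or application_cp == section_cp:
        # Bit data carries no character set, and matching code pages need
        # no conversion, so the bytes pass through unchanged.
        return value
    # Otherwise convert from the application code page to the section code page.
    return value.decode(application_cp).encode(section_cp)


name = "é".encode("cp1252")  # X'E9' in the application code page
as_text = prepare_host_variable(name, "cp1252", "utf-8")
as_bits = prepare_host_variable(name, "cp1252", "utf-8", as_bit_data=True)
print(as_text.hex(), as_bits.hex())
```

In this sketch the character value is re-encoded as two UTF-8 bytes, while the same bytes treated as bit data are left untouched.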

A set of rules is used to determine code page attributes for operations that combine string objects, such as scalar operations, set operations, or concatenation. Code page attributes are used to determine requirements for code page conversion of strings at run time.