Unicode in DB2 for i CLI

DB2® for i CLI provides several ways for applications to take advantage of Unicode.

This support is available for two different Unicode encodings, UTF-8 and UTF-16. Additional support exists for specifying a UCS-2 encoded character string only when preparing an SQL statement.

UTF-16 encoding support

Support for UTF-16 encoded character data is provided through a set of APIs called the "Wide" APIs. These APIs accept UTF-16 data as input and return UTF-16 data as output. This allows applications to run with a Unicode coded character set identifier (CCSID) of 1200, instead of depending on the default CCSID of the job running the DB2 for i CLI work. In most cases, the default CCSID of the job is an EBCDIC CCSID. Because the UTF-16 encoded character set is a superset of the UCS-2 encoded character set (CCSID 13488), applications can encode their character data in UCS-2 as well.

CLI API functions have suffixes that indicate the format of their string arguments: those that accept Unicode end in W, and those that accept EBCDIC have no suffix. The following functions are available in DB2 for i CLI in both EBCDIC and Unicode versions:

Table 1. List of functions with both EBCDIC and Unicode versions
SQLColAttributeW	SQLColAttributesW	SQLColumnPrivilegesW
SQLColumnsW	SQLConnectW	SQLDataSourcesW
SQLDescribeColW	SQLDriverConnectW	SQLErrorW
SQLExecDirectW	SQLForeignKeysW	SQLGetConnectAttrW
SQLGetConnectOptionW	SQLGetCursorNameW	SQLGetDescFieldW
SQLGetDescRecW	SQLGetDiagFieldW	SQLGetDiagRecW
SQLGetInfoW	SQLGetPositionW	SQLGetStmtAttrW
SQLGetStmtOptionW	SQLGetSubStringW	SQLGetTypeInfoW
SQLNativeSQLW	SQLPrepareW	SQLPrimaryKeysW
SQLProcedureColumnsW	SQLProceduresW	SQLSetConnectAttrW
SQLSetConnectOptionW	SQLSetCursorNameW	SQLSetDescFieldW
SQLSetStmtAttrW	SQLSetStmtOptionW	SQLSpecialColumnsW
SQLStatisticsW	SQLTablePrivilegesW	SQLTablesW

The syntax for a DB2 for i CLI Wide function is the same as the syntax for its corresponding EBCDIC function, except that SQLCHAR parameters are defined as SQLWCHAR. Character buffers defined as SQLPOINTER in the EBCDIC syntax can be defined as either SQLCHAR or SQLWCHAR in the Unicode function. Refer to the EBCDIC version of each CLI Unicode function for EBCDIC syntax details.

The SQL types SQL_WCHAR and SQL_WVARCHAR can be used to specify a buffer that contains Unicode data. To indicate that a particular column or parameter marker contains Unicode data, the application can bind it as SQL_WCHAR for fixed-length character data or as SQL_WVARCHAR for varying-length character data. Because UTF-16 data is double-byte character data, the input and output lengths must take this into account. Unicode functions whose length arguments always refer to character strings interpret those lengths as the number of double-byte characters. When the length might refer to either string or non-string data, the length is interpreted as the number of bytes needed to store the data. For example, the SQLGetInfoW() API accepts the input length as the number of bytes, while SQLPrepareW() accepts the number of double-byte characters.
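As a rough illustration of the two length conventions described above, the following C sketch computes both values for a null-terminated UTF-16 buffer. The helper names utf16_units and utf16_bytes are hypothetical and are not part of DB2 for i CLI. The sketch also assumes that a "double byte character" corresponds to one 16-bit code unit, so a surrogate pair would count as two.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count the 16-bit code units that precede the
 * terminating zero, much like strlen() does for single-byte strings.
 * Assumption: this is the "number of double-byte characters" that
 * string-only length arguments expect. */
size_t utf16_units(const uint16_t *s)
{
    size_t n = 0;
    while (s[n] != 0) {
        n++;
    }
    return n;
}

/* Hypothetical helper: number of bytes occupied by the same string
 * (terminator excluded), for length arguments that are expressed in
 * bytes because they may describe non-string data. */
size_t utf16_bytes(const uint16_t *s)
{
    return utf16_units(s) * sizeof(uint16_t);
}
```

Under these assumptions, an application might pass the result of utf16_units() as the statement length on SQLPrepareW(), and a byte count such as the size of the output buffer on SQLGetInfoW().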

DB2 for i CLI allows Wide character APIs and non-Wide character APIs to be mixed. Applications must take into account that Unicode data can be specified only on Wide API calls, not on non-Wide API calls. Because most data will be in a consistent encoding, most applications will probably commit to running with either Unicode encoding or a non-Unicode character encoding. However, support does exist for mixing Unicode and non-Unicode calls in the same CLI environment. DB2 for i CLI does restrict the use of Wide character APIs in an environment that has UTF-8 support enabled. Enabling UTF-8 support is discussed in the next section.

UTF-8 encoding support

Support for UTF-8 encoded character data is provided by setting an environment or connection attribute, SQL_ATTR_UTF8. Setting the attribute to SQL_TRUE indicates that all input and output data is to be treated as Unicode character data. This support allows applications to run with a Unicode coded character set identifier (CCSID) of 1208, instead of depending on the default CCSID of the job running the DB2 for i CLI work.

The UTF-8 support does not require any new data type bindings by the application. When binding, applications can continue to use SQL_CHAR for fixed-length character data and SQL_VARCHAR for varying-length character data. When an application binds as any character SQL type, DB2 for i CLI tags the data with the UTF-8 CCSID so that DB2 for i translates the data properly. UTF-8 data is handled on every DB2 for i CLI API that takes character data as input or returns character data as output. Each API that has a matching Wide character version also supports UTF-8 character data. See the list of APIs in the previous section to identify which functions support both UTF-16 and UTF-8 Unicode character data.

Functions that accept both a UTF-8 string and a length expect the length to be in bytes, not in characters. This is in contrast to the Wide APIs, which in most cases expect the length to be the number of double-byte characters. As discussed in the previous section, mixing a UTF-8 environment with calls to the Wide character APIs is restricted. Additionally, unlike the Wide character APIs, which allow alternating calls between Unicode and non-Unicode APIs, once the UTF-8 environment is set up, DB2 for i CLI expects all input and output character data to be in the UTF-8 encoding.
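Because UTF-8 lengths are given in bytes rather than characters, the two counts differ as soon as a string contains non-ASCII text. The following C sketch shows the difference. The helper names utf8_byte_length and utf8_code_points are hypothetical and are not part of DB2 for i CLI, and the sketch assumes the input is well-formed, null-terminated UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length in bytes, which is the unit that a
 * UTF-8 string length argument is expected to use. */
size_t utf8_byte_length(const char *s)
{
    return strlen(s);
}

/* Hypothetical helper: number of Unicode code points, shown only for
 * contrast. Continuation bytes have the bit pattern 10xxxxxx, so every
 * byte that does not match that pattern starts a new code point.
 * Assumption: the input is well-formed UTF-8. */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            n++;
        }
    }
    return n;
}
```

For a string such as "café" encoded in UTF-8, the final letter occupies two bytes, so the byte length is one greater than the number of characters. It is the byte length that would be supplied alongside the string.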

UCS-2 encoding support

DB2 for i CLI provides some specific support for UCS-2 encoded character strings. This support was added before the Wide API support and is therefore not a complete solution for applications that want to enable full Unicode support in DB2 for i CLI. Because the UTF-16 encoded character set is a superset of the UCS-2 character set, applications can get full UCS-2 support through the Wide APIs discussed earlier in the "UTF-16 encoding support" section. To enable this limited UCS-2 support, set the connection attribute SQL_ATTR_UCS2 to SQL_TRUE. This tells DB2 for i CLI to treat input strings as UCS-2 character data at prepare time. SQL statements can be prepared using either the SQLPrepare() or SQLExecDirect() API. This support does not allow UCS-2 character strings on input or output for any other DB2 for i CLI API.
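The superset relationship between UTF-16 and UCS-2 can be made concrete with a small check. UCS-2 has no surrogate pairs, so a UTF-16 string is also valid UCS-2 only when none of its code units fall in the surrogate range 0xD800 through 0xDFFF. The following C sketch tests for that condition. The function name utf16_is_ucs2 is hypothetical and is not part of DB2 for i CLI.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 when the first `units` code units of
 * `s` contain no surrogates, meaning the data is representable in
 * UCS-2 as well as UTF-16; returns 0 otherwise. */
int utf16_is_ucs2(const uint16_t *s, size_t units)
{
    for (size_t i = 0; i < units; i++) {
        if (s[i] >= 0xD800 && s[i] <= 0xDFFF) {
            return 0;
        }
    }
    return 1;
}
```

An application that must interoperate with UCS-2 data (CCSID 13488) might run a check of this kind before assuming that its UTF-16 strings fit that encoding.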