Unicode in DB2 for i CLI
DB2® for i CLI provides several ways for applications to take advantage of Unicode.
This support is available for two different Unicode encodings, UTF-8 and UTF-16. Additional support exists for specifying a UCS-2 encoded character string only when preparing an SQL statement.
UTF-16 encoding support
Support for UTF-16 encoded character data is provided through a set of APIs called the "Wide" APIs. These APIs accept UTF-16 data as input and return UTF-16 data as output. This allows applications to run with a Unicode coded character set identifier (CCSID) of 1200, instead of depending on the default CCSID of the job running the DB2 for i CLI work. In most cases the default CCSID of the job is an EBCDIC CCSID. Because the UTF-16 encoded character set is a superset of the UCS-2 encoded character set (CCSID 13488), applications can encode their character data in UCS-2 as well.
CLI API functions have suffixes to indicate the format of their string arguments: those that accept Unicode end in W, and those that accept EBCDIC have no suffix. The following DB2 for i CLI functions have both EBCDIC and Unicode versions (the Unicode names are listed):
| SQLColAttributeW | SQLColAttributesW | SQLColumnPrivilegesW |
| SQLColumnsW | SQLConnectW | SQLDataSourcesW |
| SQLDescribeColW | SQLDriverConnectW | SQLErrorW |
| SQLExecDirectW | SQLForeignKeysW | SQLGetConnectAttrW |
| SQLGetConnectOptionW | SQLGetCursorNameW | SQLGetDescFieldW |
| SQLGetDescRecW | SQLGetDiagFieldW | SQLGetDiagRecW |
| SQLGetInfoW | SQLGetPositionW | SQLGetStmtAttrW |
| SQLGetStmtOptionW | SQLGetSubStringW | SQLGetTypeInfoW |
| SQLNativeSQLW | SQLPrepareW | SQLPrimaryKeysW |
| SQLProcedureColumnsW | SQLProceduresW | SQLSetConnectAttrW |
| SQLSetConnectOptionW | SQLSetCursorNameW | SQLSetDescFieldW |
| SQLSetStmtAttrW | SQLSetStmtOptionW | SQLSpecialColumnsW |
| SQLStatisticsW | SQLTablePrivilegesW | SQLTablesW |
The syntax for a DB2 for i CLI Wide function is the same as the syntax for its corresponding EBCDIC function, except that SQLCHAR parameters are defined as SQLWCHAR. Character buffers defined as SQLPOINTER in the EBCDIC syntax can be defined as either SQLCHAR or SQLWCHAR in the Unicode function. Refer to the EBCDIC version of each Unicode function for syntax details.
The SQL types SQL_WCHAR and SQL_WVARCHAR can be used to specify a buffer that contains Unicode data. To indicate that a particular column or parameter marker contains Unicode data, the application can bind it as SQL_WCHAR for fixed-length character data or as SQL_WVARCHAR for varying-length character data. Because UTF-16 data is double-byte character data, input and output lengths must take this into account. For Unicode functions whose arguments are always character strings, the associated lengths are interpreted as a number of double-byte characters. When the length might refer to string or non-string data, it is interpreted as the number of bytes needed to store the data. For example, the SQLGetInfoW() API accepts the input length as a number of bytes, while SQLPrepareW() accepts a number of double-byte characters.
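The two length conventions described above can be illustrated with a minimal, self-contained C sketch. It does not call DB2 for i CLI; the helper names utf16_char_count and utf16_byte_count and the SAMPLE_UTF16 buffer are hypothetical and exist only for illustration. The text is written as hexadecimal code units so that it does not depend on the CCSID used to compile the source.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample: the text "VALUES 1" as UTF-16 code units,
   followed by a zero terminator. */
static const uint16_t SAMPLE_UTF16[] = {
    0x0056, 0x0041, 0x004C, 0x0055, 0x0045, 0x0053, /* V A L U E S */
    0x0020, 0x0031,                                 /* space, 1    */
    0x0000                                          /* terminator  */
};

/* Count the 16-bit code units before the terminator. For text without
   surrogate pairs this is the number of double-byte characters, the
   unit used for lengths that always refer to character strings. */
size_t utf16_char_count(const uint16_t *s)
{
    size_t n = 0;
    while (s[n] != 0) {
        n++;
    }
    return n;
}

/* Length of the same text in bytes (terminator excluded), the unit
   used when a length might refer to string or non-string data. */
size_t utf16_byte_count(const uint16_t *s)
{
    return utf16_char_count(s) * sizeof(uint16_t);
}
```

For SAMPLE_UTF16 the character count is 8 and the byte count is 16. An application would presumably pass the first kind of value to a function such as SQLPrepareW() and size buffers for a function such as SQLGetInfoW() with the second.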
DB2 for i CLI allows the mixing of Wide character APIs and non-Wide character APIs. Applications must take into account that Unicode data can be specified only on the Wide API calls, not on the non-Wide API calls. Because most data will be in a consistent encoding, most applications will probably commit to running with either a Unicode encoding or a non-Unicode character encoding. However, support does exist for mixing Unicode and non-Unicode calls in the same CLI environment. DB2 for i CLI does restrict the mixing of Wide character APIs with an environment that has UTF-8 support enabled. Enabling UTF-8 support is discussed in the next section.
UTF-8 encoding support
Support for UTF-8 encoded character data is provided through an environment or connection attribute, SQL_ATTR_UTF8. Setting the attribute to SQL_TRUE indicates that all input and output character data is to be treated as Unicode character data. This support allows applications to run with a Unicode coded character set identifier (CCSID) of 1208, instead of depending on the default CCSID of the job running the DB2 for i CLI work.
The UTF-8 support does not require any new data type bindings by the application. When binding, applications can continue to use SQL_CHAR for fixed-length character data and SQL_VARCHAR for varying-length character data. When an application binds as any character SQL type, DB2 for i CLI tags the data with the UTF-8 CCSID so that DB2 for i translates the data properly.
UTF-8 data is handled on every DB2 for i CLI API that takes character data as input or returns character data as output. Each API that has a matching wide character version also supports UTF-8 character data. See the list of APIs in the previous section to identify which functions support both UTF-16 and UTF-8 Unicode character data. Functions that accept both a UTF-8 string and a length expect the length in bytes, not in characters. This is in contrast to the Wide APIs, which in most cases expect the length as a number of double-byte characters.
As discussed in the previous section, mixing a UTF-8 environment with calls to the Wide character APIs is restricted. Additionally, unlike the Wide character APIs, which allow alternating calls between Unicode and non-Unicode APIs, once the UTF-8 environment is set up, DB2 for i CLI expects all input and output character data to be in the UTF-8 encoding.
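Because UTF-8 lengths are given in bytes rather than characters, the two values differ as soon as the text contains a non-ASCII character. The following C sketch shows the difference without calling DB2 for i CLI. The helper names utf8_byte_count and utf8_char_count and the SAMPLE_UTF8 buffer are hypothetical, and the text is written as hexadecimal bytes so that it does not depend on the CCSID used to compile the source.

```c
#include <stddef.h>

/* Hypothetical sample: the word "naive" spelled with U+00EF
   (i with diaeresis), which occupies two bytes (0xC3 0xAF) in UTF-8. */
static const unsigned char SAMPLE_UTF8[] = {
    0x6E, 0x61, 0xC3, 0xAF, 0x76, 0x65, 0x00
};

/* Length in bytes before the terminator. This is the unit expected by
   functions that accept a UTF-8 string together with a length. */
size_t utf8_byte_count(const unsigned char *s)
{
    size_t n = 0;
    while (s[n] != 0) {
        n++;
    }
    return n;
}

/* Number of characters (code points): every byte except a
   continuation byte (bit pattern 10xxxxxx) starts a new character. */
size_t utf8_char_count(const unsigned char *s)
{
    size_t n = 0;
    for (; *s != 0; s++) {
        if ((*s & 0xC0) != 0x80) {
            n++;
        }
    }
    return n;
}
```

For SAMPLE_UTF8 the byte count is 6 while the character count is 5. Under the rule above, an application would pass the byte count as the length.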
UCS-2 encoding support
DB2 for i CLI provides some specific support for UCS-2 encoded character strings. This support was added before the Wide API support, and therefore is not a complete solution for applications that want to enable full Unicode support in DB2 for i CLI. Because the UTF-16 encoded character set is a superset of the UCS-2 character set, applications can get full UCS-2 support through the Wide APIs discussed earlier in the "UTF-16 encoding support" section. To enable this limited UCS-2 support, set the connection attribute SQL_ATTR_UCS2 to SQL_TRUE. This tells DB2 for i CLI to treat input strings as UCS-2 character data at prepare time. SQL statements can be prepared using either the SQLPrepare() or SQLExecDirect() APIs. This support does not allow UCS-2 character strings on input or output for any other DB2 for i CLI APIs.
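The statement that UTF-16 is a superset of UCS-2 can be made concrete with a small C sketch of the standard UTF-16 encoding rule. It is independent of DB2 for i CLI, and the function name utf16_encode is hypothetical. A code point up to U+FFFF is stored as one 16-bit unit, exactly as in UCS-2. A code point above U+FFFF, which UCS-2 cannot represent, is stored as a surrogate pair of two units.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-16.
   Returns the number of 16-bit units written to out (1 or 2),
   or 0 if cp is not a valid Unicode scalar value. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return 0;                 /* surrogate values are not characters */
    }
    if (cp <= 0xFFFF) {
        out[0] = (uint16_t)cp;    /* same single unit as in UCS-2 */
        return 1;
    }
    if (cp <= 0x10FFFF) {
        cp -= 0x10000;            /* 20 bits remain, split 10 and 10 */
        out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
        out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate  */
        return 2;
    }
    return 0;                     /* beyond the Unicode range */
}
```

For example, U+00E9 encodes to the single unit 0x00E9, while U+1F600 encodes to the pair 0xD83D 0xDE00.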