Generating escaped Unicode data

If you pass Unicode characters to an application or object that is not intended to handle Unicode data, data might be lost unless you escape certain characters. For example, you might need to pass Unicode data through an application that has EBCDIC host variables. Or you might want to store Unicode data in a non-Unicode table.

About this task

You might also want to select Unicode characters from an application that runs on a 3270 terminal emulator, such as SPUFI. If the CCSID setting of the emulator does not include those Unicode characters, those characters do not display properly in the output.

In these situations, those Unicode characters that cannot be represented in the encoding scheme of the application or object are lost unless you escape them. Escaped data is one or more characters that cannot be represented in the target CCSID and is instead represented by the encoding value. This representation preserves the data. For example, the escaped version of the Unicode character д is \0434. Thus, the following ASCII string contains the escaped character д: 'The escaped character is \0434'

If you insert escaped data into a Unicode table, Db2 does not interpret your data and modify it to be un-escaped. Escaped data is stored as is in a Db2 table, regardless of whether the table is an ASCII, EBCDIC, or Unicode table.

Procedure

To generate escaped Unicode data:

  1. Use the ASCII_STR function or the EBCDIC_STR function.

    These functions convert a Unicode string to an ASCII or EBCDIC string. Characters that do not exist in ASCII or EBCDIC are converted to the form \xxxx, where xxxx represents a UTF-16 code unit.

    For more information about how to convert characters to UTF-16 format, see step 2 under the instructions for the INSERT statement in Inserting data into a Unicode table.

  2. If you later need to convert the EBCDIC or ASCII string with escaped data back to Unicode, use the UNICODE_STR function.

    The short form of the function name is UNISTR. This function interprets escaped data in the source string. Values that are preceded by a backslash ('\') are treated as Unicode UTF-16 characters. For example '\0041' is the Unicode UTF-16 representation for 'A'.

Examples

Begin general-use programming interface information.
Example: Escaping data
Assume that T1.C1 contains 'Hi, my name is Андрей '. Notice that the characters in Андрей are all Cyrillic characters, even though some of them do resemble Latin characters. Suppose that you issue the following query in SPUFI:
SELECT C1 FROM T1;
The result of this query is displayed as follows on a 3270 terminal emulator with the CCSID set to 37:
‘Hi, my name is ......'
Because the characters in Андрей do not exist in CCSID 37, this name is instead displayed as ....... To solve this problem, you can add the EBCDIC_STR function, as shown in the following example:
SELECT EBCDIC_STR(C1)FROM T1;
Db2 returns the following output with escaped data:
‘Hi, my name is \0410\043D\0434\0440\0435\0439'
Notice that 0410 is the UTF-16 value for А, 043D is the UTF-16 value for н and so on.
Example: Un-escaping data
Assume that T1.C1 contains 'Андрей '. Suppose that you issue the following query:
SELECT HEX(UNISTR(ASCII_STR(C1))) FROM T1;
Db2 interprets this query as follows:
Table 1. How Db2 interprets query with UNISTR
Part of SELECT statements Result Explanation
ASCII_STR(C1) \0410\043D\0434\0440\0435\0439 Db2 returns the value in C1 (Андрей ) as an ASCII string. Because these characters cannot be represented in ASCII, they are escaped.
UNISTR(ASCII_STR(C1)) Андрей Db2 then converts the escaped ASCII string to a Unicode UTF-8 string. UTF-8 includes all of the characters, so they no longer have to be escaped.
HEX(UNISTR(ASCII_STR(C1))) D090D0BDD0B4D180D0B5D0B9 Db2 then returns the hexadecimal value of the UTF-8 string.
Thus, the final result of this query is:
D090D0BDD0B4D180D0B5D0B9
Suppose that you issue the following similar query:
SELECT HEX(UNISTR(ASCII_STR(C1),UTF16)) FROM T1;
Db2 interprets this query as follows:
Table 2. How Db2 interprets query with UNISTR and UTF16 parameter
Part of SELECT statements Result Explanation
ASCII_STR(C1) \0410\043D\0434\0440\0435\0439 Db2 returns the value in C1 (Андрей ) as an ASCII string. Because these characters cannot be represented in ASCII, they are escaped.
UNISTR(ASCII_STR(C1),UTF16) Андрей Db2 then converts the escaped ASCII string to a Unicode UTF-16 string. UTF_16 includes all of the characters, so they no longer have to be escaped.
HEX(UNISTR(ASCII_STR(C1))) D090D0BDD0B4D180D0B5D0B9 Db2 then returns the hexadecimal value of the UTF-16 string.
Thus, the final result of this query is:
0410043D0434044004350439
End general-use programming interface information.