Inserting data into a Unicode table

Unicode tables can store any character. For characters that you can type on your keyboard, INSERT statements are straightforward. But suppose that you want to insert a character that is not on your keyboard, such as the yen sign (¥), which is not on a U.S. keyboard. That process requires some extra steps.

Procedure

To insert data into a Unicode table, use one of the following methods:

  • Load the data from a data set by using the LOAD utility. If the input data set is already in Unicode, specify the UNICODE option. If the data is not in Unicode, ensure that you specify the appropriate encoding scheme keyword (ASCII, EBCDIC, or CCSID) in the LOAD utility statement. The default is EBCDIC. Db2 converts ASCII and EBCDIC data to Unicode when it is loaded into a Unicode table. Be aware that this conversion might cause the data to expand.
  • Load the data from another table by using the cross-loader function. If the data is from an EBCDIC or ASCII table, Db2 converts it to Unicode when it is loaded into the target Unicode table. Be aware that this conversion might cause the data to expand.
  • Insert individual rows by using the INSERT statement. For characters that cannot be typed on your keyboard, use the Unicode constant UX'xxxx'.
    This constant is always in UTF-16, so you must specify the value as UTF-16 code units. To determine the Unicode constant for a particular character, perform the following steps:
    1. Look up the Unicode code point. Use the Unicode character code charts on the Unicode Consortium web site. For example, the yen sign (¥) is U+00A5.
    2. Convert the Unicode code point to UTF-16 format by performing one of the following actions:
      • If the Unicode code point U+yyyy is U+FFFF or less, encoding it in UTF-16 is simple. Just copy the value. For example, the following Unicode code points can be specified as the following Unicode constants:
        Table 1. Unicode code points and their corresponding Unicode constants for Unicode code points that are U+FFFF or less
        Character   Unicode code point   UTF-16 format   Unicode constant
        ¥           U+00A5               X'00A5'         UX'00A5'
        ĸ           U+0138               X'0138'         UX'0138'
        ✎           U+270E               X'270E'         UX'270E'
      • If the Unicode code point U+yyyy is greater than U+FFFF, encode that character in UTF-16 format, and use that encoded value. For example, Unicode code point U+200D0 can be encoded in UTF-16 as X'D840DCD0'. Thus, the Unicode constant is UX'D840DCD0'.

        You can find the steps for how to manually encode and decode Unicode data on the Unicode Consortium web site. Alternatively, you can use a converter tool to do the conversion for you.
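
The conversion in step 2 can be sketched in a few lines of Python. This is an illustration of the standard UTF-16 encoding rules, not a Db2-supplied tool; the function name ux_constant is hypothetical and exists only for this example.

```python
def ux_constant(code_point: int) -> str:
    """Return the UX'....' constant text for one Unicode code point.

    Hypothetical helper for illustration; it is not part of Db2.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if code_point <= 0xFFFF:
        # One UTF-16 code unit whose value equals the code point.
        units = [code_point]
    else:
        # Above U+FFFF: subtract 0x10000, then split the remaining
        # 20 bits into a high and a low surrogate code unit.
        offset = code_point - 0x10000
        units = [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]
    return "UX'" + "".join(f"{unit:04X}" for unit in units) + "'"


for cp in (0x00A5, 0x0138, 0x270E, 0x200D0):
    print(f"U+{cp:04X} -> {ux_constant(cp)}")
```

For the four code points used in this topic, the sketch prints UX'00A5', UX'0138', UX'270E', and UX'D840DCD0'.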

Example

The following INSERT statement inserts a row that contains Unicode character U+200D0 in the second column.
INSERT INTO UNITAB VALUES ('7A907',UX'D840DCD0','A');
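
Before you run a statement like this one, you might want to confirm that the constant decodes to the code point that you intended. The following Python sketch shows one way to do that check outside of Db2; the function name decode_ux is hypothetical, and the check assumes that the constant contains complete UTF-16 code units.

```python
def decode_ux(constant: str) -> str:
    """Decode the text of a UX'....' constant into a Python string.

    Hypothetical helper for illustration; it is not part of Db2.
    """
    if not (constant.upper().startswith("UX'") and constant.endswith("'")):
        raise ValueError("expected a constant of the form UX'xxxx'")
    hex_digits = constant[3:-1]
    # UX constants hold UTF-16 code units, most significant byte first.
    return bytes.fromhex(hex_digits).decode("utf-16-be")


text = decode_ux("UX'D840DCD0'")
print([f"U+{ord(ch):04X}" for ch in text])
```

For the constant in the example, the sketch reports the single code point U+200D0.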