Unicode considerations for data movement

The export, import, and load utilities are not supported when they are used with a Unicode client connected to a non-Unicode database.

The DEL, ASC, and PC/IXF file formats are supported for a Unicode database, as described in this section.

When exporting from a Unicode database to an ASCII delimited (DEL) file, all character data is converted to the application code page. Both character string and graphic string data are converted to the same SBCS or MBCS code page of the client. This is expected behavior for the export of any database, and cannot be changed, because the entire delimited ASCII file can have only one code page. Therefore, if you export to a delimited ASCII file, only those UCS-2 characters that exist in your application code page will be saved. Other characters are replaced with the default substitution character for the application code page. For UTF-8 clients (code page 1208), there is no data loss, because all UCS-2 characters are supported by UTF-8 clients.
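The substitution behavior described above can be illustrated outside of Db2. The following is a minimal sketch, not the export utility itself: it assumes that Python's cp1252 and utf-8 codecs are acceptable stand-ins for a Western European SBCS client code page and for code page 1208, and the helper name export_field is hypothetical. Python substitutes a question mark, whereas Db2 uses the default substitution character of the application code page, which might differ.

```python
# Illustrative only: Python codecs stand in for Db2's conversion tables.
def export_field(text: str, client_codec: str) -> bytes:
    """Convert one character field to the client code page.

    Characters that the code page cannot represent are substituted.
    errors="replace" writes b"?"; Db2 uses the code page's own default
    substitution character, which may be a different byte.
    """
    return text.encode(client_codec, errors="replace")


row = "Zoë 東京"

# SBCS client: the two Japanese characters are not in cp1252 and are lost.
sbcs_bytes = export_field(row, "cp1252")

# UTF-8 client (code page 1208): every character survives the round trip.
utf8_bytes = export_field(row, "utf-8")

print(sbcs_bytes.decode("cp1252"))
print(utf8_bytes.decode("utf-8") == row)
```

Because the substitution is applied at export time, the lost characters cannot be recovered by importing the file again.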

When importing from an ASCII file (DEL or ASC) to a Unicode database, character string data is converted from the application code page to UTF-8, and graphic string data is converted from the application code page to UCS-2. There is no data loss. If you want to import ASCII data that was saved under a different code page, identify the code page of the data file before issuing the IMPORT command. You can do this either by setting the DB2CODEPAGE registry variable to the code page of the ASCII data file or by using the codepage file type modifier.
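The following sketch shows why the code page of the data file must be identified correctly. It is an illustration under stated assumptions, not the import utility: Python's cp850 and cp1252 codecs stand in for two different client code pages, and the helper name import_field is hypothetical.

```python
# Illustrative only: Python codecs stand in for Db2's conversion tables.
def import_field(raw: bytes, file_codec: str) -> bytes:
    """Interpret a character field in the stated code page and
    convert it to UTF-8, as for a character string column."""
    return raw.decode(file_codec).encode("utf-8")


# A data file that was written under code page 850.
raw = "café".encode("cp850")

# The code page of the data file is identified correctly.
right = import_field(raw, "cp850")

# The data is interpreted in a different application code page instead.
wrong = import_field(raw, "cp1252")

print(right.decode("utf-8"))
print(wrong.decode("utf-8"))
```

In the second case the conversion succeeds without an error, but the byte X'82' is interpreted as a different character, so the stored value is not the one that was exported.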

The range of valid ASCII delimiters for SBCS and MBCS clients is identical to what is currently supported by Db2® for those clients. The range of valid delimiters for UTF-8 clients is X'01' to X'7F', with the usual restrictions.
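The range for UTF-8 clients can be expressed as a simple check. The sketch below is an assumption-laden illustration: the function names are hypothetical, the "usual restrictions" are not modeled, and the second function only demonstrates a general property of UTF-8 (bytes X'01' to X'7F' never occur inside a multibyte sequence). Whether that property is the reason for the Db2 range is an assumption.

```python
def is_valid_utf8_client_delimiter(byte_value: int) -> bool:
    """Range check only; the usual delimiter restrictions are not modeled."""
    return 0x01 <= byte_value <= 0x7F


def delimiter_can_split_character(delimiter: int, text: str) -> bool:
    """Return True if the delimiter byte occurs inside the UTF-8
    encoding of any multibyte character in text."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1 and delimiter in encoded:
            return True
    return False


sample = "東京,Zoë"

print(is_valid_utf8_client_delimiter(0x2C))         # comma
print(is_valid_utf8_client_delimiter(0x80))         # outside the range
print(delimiter_can_split_character(0x2C, sample))  # comma
print(delimiter_can_split_character(0xA9, sample))  # second byte of "ë"? see note
```

The byte X'A9' is used here only as an example of a value above X'7F'; the last call returns True only if that byte appears in the encoding of one of the sample characters.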

When exporting from a Unicode database to a PC/IXF file, character string data is converted to the SBCS/MBCS code page of the client. Graphic string data is not converted, and is stored in UCS-2 (code page 1200). There is no data loss.

When importing from a PC/IXF file to a Unicode database, character string data is assumed to be in the SBCS/MBCS code page stored in the PC/IXF header, and graphic string data is assumed to be in the DBCS code page stored in the PC/IXF header. Character string data is converted by the import utility from the code page specified in the PC/IXF header to the code page of the client, and then from the client code page to UTF-8 (by the INSERT statement). Graphic string data is converted by the import utility from the DBCS code page specified in the PC/IXF header directly to UCS-2 (code page 1200).

The load utility places the data directly into the database and, by default, assumes data in ASC or DEL files to be in the code page of the database. Therefore, by default, no code page conversion takes place for ASCII files. When the code page for the data file has been explicitly specified (using the codepage file type modifier), the load utility uses this information to convert from the specified code page to the database code page before loading the data. For PC/IXF files, the load utility always converts from the code pages specified in the IXF header to the database code page (1208 for CHAR, and 1200 for GRAPHIC).
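The rules in the preceding paragraph can be summarized as a small decision function. This is a sketch of the documented behavior for character string data only; the function and parameter names are hypothetical and do not correspond to any Db2 API.

```python
from typing import Optional

DB_CHAR_CODEPAGE = 1208  # UTF-8, used for CHAR data in a Unicode database


def load_source_codepage(file_format: str,
                         codepage_modifier: Optional[int] = None,
                         ixf_header_codepage: Optional[int] = None) -> int:
    """Return the code page in which load assumes character data to be."""
    fmt = file_format.upper()
    if fmt == "IXF":
        # PC/IXF: the code page always comes from the file header.
        if ixf_header_codepage is None:
            raise ValueError("a PC/IXF file carries its code page in the header")
        return ixf_header_codepage
    if fmt in ("ASC", "DEL"):
        # ASC or DEL: the database code page, unless the codepage
        # file type modifier was specified.
        if codepage_modifier is not None:
            return codepage_modifier
        return DB_CHAR_CODEPAGE
    raise ValueError(f"unsupported file format: {file_format}")


def load_converts_char_data(source_codepage: int) -> bool:
    """Conversion is needed only when the source differs from the database."""
    return source_codepage != DB_CHAR_CODEPAGE


default_del = load_source_codepage("del")
sjis_del = load_source_codepage("del", codepage_modifier=1394)
print(default_del, load_converts_char_data(default_del))
print(sjis_del, load_converts_char_data(sjis_del))
```

The first call corresponds to the default case, in which the data is placed into the database without conversion.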

The code page for DBCLOB files is always 1200 for UCS-2. The code page for CLOB files is the same as the code page for the data files being imported, loaded, or exported. For example, when loading or importing data using the PC/IXF format, the CLOB file is assumed to be in the code page specified by the PC/IXF header. If the data file is in ASC or DEL format, the load utility assumes that CLOB data is in the code page of the database, while the import utility assumes it to be in the code page of the client application.

The nochecklengths modifier is always specified for a Unicode database, because:

Any SBCS client can be connected to a database for which there is no DBCS code page.
Character strings in UTF-8 format usually have different lengths than those in client code pages.
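The difference in length between a client code page and UTF-8 is easy to see with a few sample strings. The sketch below assumes that Python's cp1252 and cp932 codecs are reasonable stand-ins for an SBCS client code page and for a Japanese client code page; the helper name byte_lengths is hypothetical.

```python
def byte_lengths(text: str, client_codec: str) -> tuple:
    """Return (bytes in the client code page, bytes in UTF-8)."""
    return len(text.encode(client_codec)), len(text.encode("utf-8"))


# Plain ASCII: same length in both encodings.
print(byte_lengths("abc", "cp1252"))

# One accented character: 1 byte in the SBCS code page, 2 bytes in UTF-8.
print(byte_lengths("Zoë", "cp1252"))

# Two Japanese characters: 2 bytes each in cp932, 3 bytes each in UTF-8.
print(byte_lengths("東京", "cp932"))
```

A length check made against the client code page would therefore not predict whether the converted value fits the target column.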

Considerations for code pages 1394, 1392, and 5488

The import, export, and load utilities can be used to transfer data from the Chinese code page GB18030 (code page identifiers 1392 and 5488) and the Japanese code page Shift JIS X0213 (code page identifier 1394) to Db2 Unicode databases. In addition, the export utility can be used to transfer data from Db2 Unicode databases to GB18030 or Shift JIS X0213 code page data.

For example, the following command loads the Shift JIS X0213 data file /u/jp/user/x0213/data.del, which resides on a remotely connected client, into MYTABLE:


   db2 load client from /u/jp/user/x0213/data.del \
      of del modified by codepage=1394 insert into mytable

where MYTABLE is located on a Db2 Unicode database.

Since only connections between a Unicode client and a Unicode server are supported, you need to use either a Unicode client or set the Db2 registry variable DB2CODEPAGE to 1208 before using the load, import, or export utilities.

Conversion from code page 1394 to Unicode can result in expansion. For example, a 2-byte character can be stored as two 16-bit Unicode characters in a GRAPHIC column. Ensure that the target columns in the Unicode database are wide enough to contain the expanded Unicode data.
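One way that this expansion can arise is illustrated below. The sketch assumes that Python's shift_jisx0213 codec is an acceptable stand-in for code page 1394; the Db2 conversion tables might map individual characters differently. The sample is a single Shift JIS X0213 character that corresponds to a base character followed by a combining mark in Unicode. The helper name utf16_units is hypothetical.

```python
def utf16_units(text: str) -> int:
    """Number of 16-bit units needed to store text as UTF-16."""
    return len(text.encode("utf-16-be")) // 2


# Hiragana KA followed by the combining semi-voiced sound mark.
# In Shift JIS X0213 this sequence is encoded as one 2-byte character.
source = "\u304b\u309a"

sjis = source.encode("shift_jisx0213")
decoded = sjis.decode("shift_jisx0213")

print(len(sjis))             # bytes in the source data file
print(utf16_units(decoded))  # 16-bit units after conversion
```

A GRAPHIC column that was sized by counting double-byte characters in the source file would be too short for this value, because the converted data needs two 16-bit units rather than one.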

Incompatibilities

For applications connected to a Unicode database, graphic string data is always in UCS-2 (code page 1200). For applications connected to non-Unicode databases, the graphic string data is in the DBCS code page of the application, or is not allowed if the application code page is SBCS. For example, when a 932 client is connected to a Japanese non-Unicode database, the graphic string data is in code page 301. For 932 client applications connected to a Unicode database, the graphic string data is in UCS-2 encoding.