DB2® Universal DatabaseTM (UDB) is used worldwide and is capable of handling many national languages. If you're using DB2 UDB in a non-English environment, you must make sure you are set up correctly. Without the necessary environmental setup, you may not see your preferred national langage when you interface with DB2. In addition, if code page issues are not considered, characters within the database may be corrupted when you use certain utilities. This article describes how you can interface with DB2 UDB in a specific national language and how code page affects the data in a database.
DB2 UDB Command Line Processor
DB2 Command Line Processor (CLP) is the command line interface to execute database utilities, SQL statements, and online help. Except for the reserved DB2 commands and keywords, which are always in English, CLP interacts with you in the national language of your operating system command prompt.
DB2 Version 8 is available in the following national languages:
- Brazilian Portuguese
- Simplified Chinese
- Traditional Chinese
- Continental Portuguese
DB2 speaks to you in your language of choice if that language is available. Otherwise, the CLP will display in English. Let's take a look at what makes a particular national language available for DB2's use with the Command Line Processor.
On UNIX® platforms, CLP interacts with you in the language specified
by the environment variable LANG, if that language is available in DB2 and
installed. Otherwise English is used. For example, if LANG is set to
"Ja_JP" on AIX®, then CLP will retrieve the Shift JIS Japanese
messages from the
If you see garbage characters, you probably do not have the font or locale
installed correctly on your system. For information on installing font and
locale, see the article Setting Up a Mixed-Byte Character Set (MBCS) Database on an English
OS in DB2 UDB Version 8.
To check your current LANG setting, type
CLP can be invoked by typing
db2. The following
example shows how you would issue the
command, begin the CLP, and then issue a few commands in a Japanese
Windows and OS/2
On Windows® and OS/2®, only one national language version of DB2 Version 7 can be installed. CLP will use that language. However, on DB2 UDB Version 8, support for multiple national languages can be installed on Windows. (NOTE: DB2 UDB Version 8 does not support OS/2). Regardless of the national language version of DB2, CLP will use the language of Windows. The National Language Support Appendix in the DB2 UDB Administration Guide - Planning manual contains important additional information about DB2's national language support capabilities.
DB2 UDB EXPORT, IMPORT and LOAD utilities
The DB2 EXPORT utility allows users to write data from a DB2 database to one or more files stored outside of the database. The DB2 UDB IMPORT and LOAD utilities allow users to insert data from a file into a database table or updatable view.
In the following sections, we will discuss the code page related issue for the DB2 UDB EXPORT, IMPORT, and LOAD utilities. For further information about these utilities, consult the DB2 UDB Data Movement Utilities Guide and Reference.
DB2 allows you to create a database in one of the supported code sets in any supported system. For example, you can create an IBM-932, IBM-943, IBM-EUCJP/EUCJP or UTF-8 database on a Linux machine, although Linux may not support the code set IBM-932 or IBM-943. Officially Linus does not support IBM-932 or IBM-943, but since Linux is open source, the Linux user can add that support.
A character may be represented by different code points in different code pages. For example, the following table lists the code points for the same two Japanese characters "" in the various available code sets:
|Code set||Code points|
|Unicode UTF-8||X'E697A5 E69982'|
|Unicode UCS-2 (big-endian)||X'65E5 6642'|
Code page considerations when using IMPORT
The DB2 IMPORT utility, by default, assumes the data in the input file is encoded in the current system code page. For example on AIX, if the environment variable LANG is set to "Ja_JP.IBM943", then IMPORT will assume your input data file is encoded in code page 943. When you import a data file into a database, DB2 will automatically convert the data file from the current system code page to the database code page.
The IMPORT utility CODEPAGE modifier provides an additional capability. Introduced in Version 7 Fixpak 4, the CODEPAGE modifier will force DB2 to first convert the data file from the value of the modifier to the current system code page, and then convert from the current system code page to the database code page. Note that if the data file contains a character that is not present in the current system code page, that character will be lost when imported or loaded into the database. Truncation may also occur when data expands during code page conversion.
For example, suppose you are working on Linux running code page 954 (EUC-JP). You are given a data file, data.954, encoded in code page 954. The file data.954 contains the two characters "", represented by X'C6FC BBFE'. You want to insert the data in the file data.954 into a Unicode database.You can find the database code page by issuing the following command:
db2 get database configuration for <databaseName>
Now for a Unicode database, DB2 encodes CHAR, VARCHAR, LONG VARCHAR, CLOB columns in UTF-8, and GRAPHIC, VARGRAPHIC, LONG VARGRAPHIC, DBCLOB columns in UCS-2 big-endian format. When you import the file data.954 without specifying the CODEPAGE modifier, the two characters "" will be converted to X'E697A5 E69982' in the Unicode character columns, or X'65E5 6642' in Unicode graphic columns. The characters "" expand from 4 bytes in EUC-JP to 6 bytes in UTF-8. If the Unicode character column is defined as varchar(4), then import fails. Expansion (or contraction) may occur during code page conversion. We must have a column size that is equal to or greater than varchar(6) in this example to fit the two characters "".
However, you may want to import some data from another operating system that is encoded in a different code page. For example, you may have copied a file, data.939, from a zSeries machine that is running Japanese code page 939 to your Shift JIS 943 Japanese AIX machine. Suppose the file data.939 contains two Japanese characters: "". You will probably see "EbFQ" instead of "" on your AIX machine. The reason is your AIX environment assumes the characters are encoded in 943 and not 939. However, if you look at the file in hexadecimal, you will see the two characters are represented by the code points X'4562 4651'.
If you just naively IMPORT the
directly into a database with LANG set to "Ja_JP.IBM943" on AIX, the
inserted data will be corrupted. There are two methods to resolve this
- Convert the data file from code page 939 to code page 943 using a
third-party tool first, then IMPORT the converted file into the database.
On UNIX platforms, you can use the system utility iconv to convert the data file from one code page to another. For example, the command
iconv -f IBM-939 -t IBM-943 data.939 > data.943
will convert the file
data.939from code page 939 to code page 943 and redirect the output to the file
Note that iconv supports different code sets on different UNIX platforms. On AIX, all the iconv supported code page conversion tables are located in
/usr/lib/nls/loc/iconv/*, while on Solaris, they are in
If you import the file
data.943into a Japanese or Unicode database, no characters will be corrupted. For example,
db2 import from data.943 of asc method L (1 4) insert into <i>tableName</i>
will insert the two characters ," ", into your database without any loss.
- Use the CODEPAGE modifier of the DB2 IMPORT utility. For example:
You will find that the CODEPAGE modifier method is usually a better choice than the third-party conversion tool method because DB2 will convert the data for you in one step. The CODEPAGE modifier method does not depend on the availability of any third-party tool.
Code page considerations when using LOAD
The DB2 LOAD utility, by default, assumes the data in the input file is encoded in the database code page.
However, you may want to load some data that is not encoded in the database
code page. For example, you may have a Unicode database and your data
data.943, is encoded in the system code
page IBM-943, Shift-JIS Japanese. If you just LOAD the
data.943 file directly into the Unicode
database, the inserted data will be corrupted. The character string in a
Unicode database is encoded in UTF-8, code page 1208. There are two
methods to resolve this problem:
- Convert the data file from code page 943 to code page 1208 using a
third-party tool first, then LOAD the converted file into the
database. On UNIX platforms, you can use the system utility
iconv to convert the data file from one code page to
For example, the command
iconv -f IBM-943 -t UTF-8 data.943 > data.1208
will convert the file
data.943from code page 943 to code page 1208 and redirect the output to the file,
data.1208. If you load the file
data.1208into a Unicode database, no character will be corrupted.
- Use the CODEPAGE modifier of the DB2 UDB LOAD utility
>> "db2 load from data.943 of asc modified by codepage=943 method L (1 4) insert into tableName"
As with the IMPORT utility, the CODEPAGE modifier method is more efficient than the third-party tool method because DB2 will convert the data for you in one step, and you will not need a third-party tool.
Code page considerations when using EXPORT
The DB2 EXPORT utility by default writes the data out to an output file in the current system code page. For example, if your AIX LANG environment variable is "ja_JP", the output file from EXPORT will be encoded in code page 954.
When you export data from a database to a file, DB2 will automatically convert the data from the database code page to the current system code page. If the CODEPAGE modifier is present in the EXPORT command, DB2 will convert the data from the database code page to the current system code page, and then from the current system code page to the code page specified by the CODEPAGE modifier.
For example, suppose you have a character column containing the two characters "" in a Unicode database. The characters are stored in UTF-8 as X'E697A5 E69982'. When you export the column on a Linux machine running code page 954, the output file will contain the two characters encoded in code page 954, namely X'C6FC BBFE'.
However, you may want to prepare a data file for use by another operating system that is running in a different code page. For example, you may want to export your data from an EUC-JP database on AIX and import that file into a spreadsheet program in Windows. Your database has a column that contains two Japanese characters, "". When you export the data using a normal export command, the output file will be encoded in code page 954, with the two characters "" represented by the code points X'C6FC BBFE'.
>> "db2 export to data.954 of del select * from tableName"
With the default ANSI code page for Japanese Windows being 943 and not 954,
simply importing the file
data.954 into a
spreadsheet program on the Japanese Windows system will corrupt the
Again, there are two methods to avoid this problem:
- Convert the data file from code page 954 to code page 943 using a third-party application first prior to importing the file into the spreadsheet program.
- Use the CODEPAGE file type modifier of the DB2 EXPORT utility. Here's an example of the use of the CODEPAGE modifier with EXPORT:
DB2 UDB has the flexibility to work with different national languages and different operating systems. Even if a code page is not supported by a operating system, DB2 may support that code page for the data itself. With proper setup, you can interface with DB2 using the language of your choice and avoid data corruption.
The author would like to thank Alexis Cheng for his technical review of this material.