DB2 UDB National Language Support for the Command Line Processor and Utilities

The DB2 UDB Command Line Processor (CLP) provides an interface for the execution of DB2 commands, SQL commands, and DB2 utilities. This article explains how to use the CLP in the national language of your choice on both Windows and AIX environments. It also covers national language code page considerations for IMPORT, EXPORT and LOAD.

Share:

Sherman Lau, Database Developer , IBM Toronto Lab

Sherman Lau is a database developer in the IBM Toronto Lab and has worked primarily on DB2 globalization for the past seven years. He can be reached at sherman@ca.ibm.com. You can reach Sherman Lau at sherman@ca.ibm.com.



03 October 2002

Introduction

DB2® Universal DatabaseTM (UDB) is used worldwide and is capable of handling many national languages. If you're using DB2 UDB in a non-English environment, you must make sure you are set up correctly. Without the necessary environmental setup, you may not see your preferred national langage when you interface with DB2. In addition, if code page issues are not considered, characters within the database may be corrupted when you use certain utilities. This article describes how you can interface with DB2 UDB in a specific national language and how code page affects the data in a database.


DB2 UDB Command Line Processor

DB2 Command Line Processor (CLP) is the command line interface to execute database utilities, SQL statements, and online help. Except for the reserved DB2 commands and keywords, which are always in English, CLP interacts with you in the national language of your operating system command prompt.

DB2 Version 8 is available in the following national languages:

  • English
  • French
  • German
  • Italian
  • Spanish
  • Brazilian Portuguese
  • Japanese
  • Korean
  • Simplified Chinese
  • Traditional Chinese
  • Danish
  • Finnish
  • Norwegian
  • Swedish
  • Russian
  • Polish
  • Arabic
  • Croatian
  • Czech
  • Dutch
  • Hebrew
  • Hungarian
  • Continental Portuguese
  • Romanian
  • Slovakian
  • Turkish
  • Bulgarian
  • Slovenian

DB2 speaks to you in your language of choice if that language is available. Otherwise, the CLP will display in English. Let's take a look at what makes a particular national language available for DB2's use with the Command Line Processor.

UNIX

On UNIX® platforms, CLP interacts with you in the language specified by the environment variable LANG, if that language is available in DB2 and installed. Otherwise English is used. For example, if LANG is set to "Ja_JP" on AIX®, then CLP will retrieve the Shift JIS Japanese messages from the sqllib/msg/Ja_JP directory. If you see garbage characters, you probably do not have the font or locale installed correctly on your system. For information on installing font and locale, see the article Setting Up a Mixed-Byte Character Set (MBCS) Database on an English OS in DB2 UDB Version 8.

To check your current LANG setting, type locale. CLP can be invoked by typing db2. The following example shows how you would issue the locale command, begin the CLP, and then issue a few commands in a Japanese environment.

Command Line Processor with Japanese

Windows and OS/2

On Windows® and OS/2®, only one national language version of DB2 Version 7 can be installed. CLP will use that language. However, on DB2 UDB Version 8, support for multiple national languages can be installed on Windows. (NOTE: DB2 UDB Version 8 does not support OS/2). Regardless of the national language version of DB2, CLP will use the language of Windows. The National Language Support Appendix in the DB2 UDB Administration Guide - Planning manual contains important additional information about DB2's national language support capabilities.


DB2 UDB EXPORT, IMPORT and LOAD utilities

The DB2 EXPORT utility allows users to write data from a DB2 database to one or more files stored outside of the database. The DB2 UDB IMPORT and LOAD utilities allow users to insert data from a file into a database table or updatable view.

In the following sections, we will discuss the code page related issue for the DB2 UDB EXPORT, IMPORT, and LOAD utilities. For further information about these utilities, consult the DB2 UDB Data Movement Utilities Guide and Reference.

DB2 allows you to create a database in one of the supported code sets in any supported system. For example, you can create an IBM-932, IBM-943, IBM-EUCJP/EUCJP or UTF-8 database on a Linux machine, although Linux may not support the code set IBM-932 or IBM-943. Officially Linus does not support IBM-932 or IBM-943, but since Linux is open source, the Linux user can add that support.

A character may be represented by different code points in different code pages. For example, the following table lists the code points for the same two Japanese characters "2 Japanese characters" in the various available code sets:

Code setCode points
IBM-932X'93FA 8E9E'
IBM-943X'93FA 8E9E'
IBM-EUCJP/EUCJPX'C6FC BBFE'
Unicode UTF-8X'E697A5 E69982'
Unicode UCS-2 (big-endian)X'65E5 6642'

Code page considerations when using IMPORT

The DB2 IMPORT utility, by default, assumes the data in the input file is encoded in the current system code page. For example on AIX, if the environment variable LANG is set to "Ja_JP.IBM943", then IMPORT will assume your input data file is encoded in code page 943. When you import a data file into a database, DB2 will automatically convert the data file from the current system code page to the database code page.

The IMPORT utility CODEPAGE modifier provides an additional capability. Introduced in Version 7 Fixpak 4, the CODEPAGE modifier will force DB2 to first convert the data file from the value of the modifier to the current system code page, and then convert from the current system code page to the database code page. Note that if the data file contains a character that is not present in the current system code page, that character will be lost when imported or loaded into the database. Truncation may also occur when data expands during code page conversion.

For example, suppose you are working on Linux running code page 954 (EUC-JP). You are given a data file, data.954, encoded in code page 954. The file data.954 contains the two characters "2 Japanese characters", represented by X'C6FC BBFE'. You want to insert the data in the file data.954 into a Unicode database.You can find the database code page by issuing the following command:

db2 get database configuration for <databaseName>

Now for a Unicode database, DB2 encodes CHAR, VARCHAR, LONG VARCHAR, CLOB columns in UTF-8, and GRAPHIC, VARGRAPHIC, LONG VARGRAPHIC, DBCLOB columns in UCS-2 big-endian format. When you import the file data.954 without specifying the CODEPAGE modifier, the two characters "2 Japanese characters" will be converted to X'E697A5 E69982' in the Unicode character columns, or X'65E5 6642' in Unicode graphic columns. The characters "2 Japanese characters" expand from 4 bytes in EUC-JP to 6 bytes in UTF-8. If the Unicode character column is defined as varchar(4), then import fails. Expansion (or contraction) may occur during code page conversion. We must have a column size that is equal to or greater than varchar(6) in this example to fit the two characters "2 Japanese characters".

However, you may want to import some data from another operating system that is encoded in a different code page. For example, you may have copied a file, data.939, from a zSeries machine that is running Japanese code page 939 to your Shift JIS 943 Japanese AIX machine. Suppose the file data.939 contains two Japanese characters: "2 Japanese characters". You will probably see "EbFQ" instead of "2 Japanese characters" on your AIX machine. The reason is your AIX environment assumes the characters are encoded in 943 and not 939. However, if you look at the file in hexadecimal, you will see the two characters are represented by the code points X'4562 4651'.

If you just naively IMPORT the data.939 file directly into a database with LANG set to "Ja_JP.IBM943" on AIX, the inserted data will be corrupted. There are two methods to resolve this problem:

  • Convert the data file from code page 939 to code page 943 using a third-party tool first, then IMPORT the converted file into the database.
    On UNIX platforms, you can use the system utility iconv to convert the data file from one code page to another. For example, the command
    iconv -f  IBM-939 -t IBM-943 data.939 > data.943

    will convert the file data.939 from code page 939 to code page 943 and redirect the output to the file data.943.

    Note that iconv supports different code sets on different UNIX platforms. On AIX, all the iconv supported code page conversion tables are located in /usr/lib/nls/loc/iconv/*, while on Solaris, they are in /usr/lib/iconv/*.

    If you import the file data.943 into a Japanese or Unicode database, no characters will be corrupted. For example,

      db2 import from data.943 of asc method L (1 4) insert into <i>tableName</i>

    will insert the two characters ,"2 Japanese characters ", into your database without any loss.
  • Use the CODEPAGE modifier of the DB2 IMPORT utility. For example:
    Importing with Japanese characters

You will find that the CODEPAGE modifier method is usually a better choice than the third-party conversion tool method because DB2 will convert the data for you in one step. The CODEPAGE modifier method does not depend on the availability of any third-party tool.

Code page considerations when using LOAD

The DB2 LOAD utility, by default, assumes the data in the input file is encoded in the database code page.

However, you may want to load some data that is not encoded in the database code page. For example, you may have a Unicode database and your data file, data.943, is encoded in the system code page IBM-943, Shift-JIS Japanese. If you just LOAD the data.943 file directly into the Unicode database, the inserted data will be corrupted. The character string in a Unicode database is encoded in UTF-8, code page 1208. There are two methods to resolve this problem:

  • Convert the data file from code page 943 to code page 1208 using a third-party tool first, then LOAD the converted file into the database. On UNIX platforms, you can use the system utility iconv to convert the data file from one code page to another.

    For example, the command

      iconv -f IBM-943 -t UTF-8 data.943 > data.1208

    will convert the file data.943 from code page 943 to code page 1208 and redirect the output to the file, data.1208. If you load the file data.1208 into a Unicode database, no character will be corrupted.
  • Use the CODEPAGE modifier of the DB2 UDB LOAD utility
      >> "db2 load from data.943 of asc modified by codepage=943 
       method L (1 4) insert into tableName"

As with the IMPORT utility, the CODEPAGE modifier method is more efficient than the third-party tool method because DB2 will convert the data for you in one step, and you will not need a third-party tool.

Code page considerations when using EXPORT

The DB2 EXPORT utility by default writes the data out to an output file in the current system code page. For example, if your AIX LANG environment variable is "ja_JP", the output file from EXPORT will be encoded in code page 954.

When you export data from a database to a file, DB2 will automatically convert the data from the database code page to the current system code page. If the CODEPAGE modifier is present in the EXPORT command, DB2 will convert the data from the database code page to the current system code page, and then from the current system code page to the code page specified by the CODEPAGE modifier.

For example, suppose you have a character column containing the two characters "2 Japanese characters" in a Unicode database. The characters are stored in UTF-8 as X'E697A5 E69982'. When you export the column on a Linux machine running code page 954, the output file will contain the two characters encoded in code page 954, namely X'C6FC BBFE'.

However, you may want to prepare a data file for use by another operating system that is running in a different code page. For example, you may want to export your data from an EUC-JP database on AIX and import that file into a spreadsheet program in Windows. Your database has a column that contains two Japanese characters, "2 Japanese characters". When you export the data using a normal export command, the output file will be encoded in code page 954, with the two characters "2 Japanese characters" represented by the code points X'C6FC BBFE'.

>> "db2 export to data.954 of del select * from tableName"

With the default ANSI code page for Japanese Windows being 943 and not 954, simply importing the file data.954 into a spreadsheet program on the Japanese Windows system will corrupt the data.

Again, there are two methods to avoid this problem:

  • Convert the data file from code page 954 to code page 943 using a third-party application first prior to importing the file into the spreadsheet program.
  • Use the CODEPAGE file type modifier of the DB2 EXPORT utility. Here's an example of the use of the CODEPAGE modifier with EXPORT:
    export example with Japanese characters

Conclusion

DB2 UDB has the flexibility to work with different national languages and different operating systems. Even if a code page is not supported by a operating system, DB2 may support that code page for the data itself. With proper setup, you can interface with DB2 using the language of your choice and avoid data corruption.

Acknowledgements

The author would like to thank Alexis Cheng for his technical review of this material.

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=14052
ArticleTitle=DB2 UDB National Language Support for the Command Line Processor and Utilities
publish-date=10032002