Setting Up a Mixed-Byte Character Set (MBCS) Database on an English OS in DB2 UDB Version 8

This article provides step-by-step instructions for those who need to set up an environment and create a mixed-byte character set (MBCS) database on DB2 Universal Database Version 8 in an English operating system environment. You will be able to create, connect and perform SQL tests against a MBCS database from a DB2 Command Window in both Windows and AIX environments.

Share:

David KlineIBM Developer Technical Support (DTS) Center - Dallas

David Kline and Gabor Wieser work as DB2 Technical Support representatives for PartnerWorld for Developers. Along with 10 other team members, David and Gabor help Independent Software Vendors (ISVs) solve a wide range of development and administration issues. David has DB2 certifications in both application development and administration. He focuses most of his time helping ISVs with DBA (database administration) related problems. He can be reached at djkline@us.ibm.com.



Gabor WieserIBM Developer Technical Support (DTS) Center - Dallas

Gabor Wieser has over 10 years of experience with DB2 and works on a variety of DB2 issues including development with C and JavaTM programs, connectivity between client and host databases, and a wide array of database administration topics. He can be reached at gaborw@us.ibm.com. If you would like to learn more about PartnerWorld for Developers, please visit http://www.developer.ibm.com.



17 September 2002

Also available in Japanese

© 2002 International Business Machines Corporation. All rights reserved.

Introduction

This article will explain how a developer can test applications running against a mixed-byte character set (MBCS) database on an English language Windows operating system or AIX operating system. Today, many developers face the need to run database tests in a variety of languages, but do not have or want more hardware resources in order to test each language on a dedicated machine. By enabling an English operating system to perform tests against multiple, heterogeneous databases, minimal hardware is required thereby lowering costs and increasing efficiency.

Here at the Developer Technical Support Center in Dallas, we find that most of the Independent Software Vendors (ISVs) who need help with using multiple language support on DB2 ask how to set up their database and operating system environments for MBCS support. So this article is intended for developers and others who, like the ISVs we work with, need to set up a MBCS database on DB2 Universal DatabaseTM Version 8 in an English operating system environment. The basic guidelines in this article can also be applied to any single-byte character set (SBCS) language environment.

We will give you the details to allow you to create, connect and perform SQL tests against a MBCS database from a DB2 Command Window in both Windows® and AIX® environments. UTF-8 databases are also used for multi-language environments, but we will not be addressing that topic here since it falls outside of the scope of this article. Note that UTF-8 has advantages worth looking in to. At the end of this paper, you will find links to UTF-8 information and other code-page-related topics.

Throughout this paper, we will be using a Japanese example for setting up the MBCS environment, creating the database, and performing SQL tests.


Brief synopsis of character set encoding

European languages are written using alphabetic scripts which are displayed as symbols to denote the pronounced sounds. Far Eastern languages are written using ideographic symbols which refer to the meaning of the word in an abstract way. Ideographic character sets are much larger than alphabetic character sets and as a result, mixed-byte character set (MBCS) support was introduced to overcome the 256 character limitation of single-byte character sets (SBCS).

Two methods are employed to address the character limitation issue: double-byte character sets (DBCS) and mixed-byte character sets (MBCS). These character sets fall into two categories:

  • Universal language support - character sets that provide support for many languages (UTF and UCS)
  • Single language support - each character set dedicated to a specific language and platform.

This article describes single language support. We will use Japanese language support as an illustration.

The Japanese ideographic characters can be encoded in one of two ways:

  • Shift-JIS (SJIS) encoding scheme
    SJIS is used for IBM-932 and the newer IBM-943 code sets. These code sets are available on most platforms. The first byte of each character is used to determine the number of bytes for that character. The values from 0x20 to 0x7F and from 0xA0 to 0xDF are used to encode ASCII and Katakana characters, with exceptions. The positions from 0x81 to 0x9f and from 0xe0 to 0xfc are reserved for use as the first byte of a multibyte character. The JISX0208 characters are mapped to the multibyte values starting at 0x8140. The second byte of a multibyte character can have any value.
  • Extended UNIX® Code (EUC) encoding scheme The IBM-eucJP code set is available on UNIX platforms. The EUC encoding scheme defines a set of encoding rules that can support one to four character sets. This support for more than one character set may be necessary if you plan on storing certain graphic characters in your database. A code set based on EUC conforms to the EUC encoding rules but also identifies the specific character sets associated with the specific instances. For example, IBM-eucJP for Japanese refers to the encoding of the Japanese Industrial Standard characters according to the EUC encoding rules.

The first set, character set 0 (CS0), always contains an ASCII character set. All of the other sets must have the most significant bit (MSB) set to 1 and can use any number of bytes to encode the characters. In addition, all characters within a set must have:

  • The same number of bytes to encode all characters
  • The same column display width (number of columns on a fixed-width terminal).

All characters in the third set (CS2) are always preceded with the control character SS2 (single-shift 2, 0x8E). Code sets that conform to EUC do not use the SS2 control character other than to identify the third set. All characters in the fourth set (CS3) are always preceded with the control character SS3 (single-shift 3, 0x8F). Code sets that conform to EUC do not use the SS3 control character other than to identify the fourth set.

The following is the format of the different code sets supported by EUC:

<font size="1"> 
CS0 - 0xxxxxxx 
CS1 - 1xxxxxxx or 1xxxxxxx 1xxxxxxx or 1xxxxxxx 1xxxxxxx 1xxxxxxx 
CS2 - 10001110 1xxxxxxx or 10001110 1xxxxxxx 1xxxxxxx or 10001110 
      1xxxxxxx 1xxxxxxx 1xxxxxxx 
CS3 - 10001111 1xxxxxxx or 10001111 1xxxxxxx 1xxxxxxx or 10001111 
      1xxxxxxx 1xxxxxxx 1xxxxxxx

Setting up the environment

Assuming you have installed a DB2 UDB edition that has server capabilities, we can continue with the environment settings. Setting up your environment will be different depending on your choice of operating systems. We will first explain the setup process on Windows and then provide instructions for AIX.

Note: The instructions below apply to setting up your environment for testing from a DB2 Command Window. If you are only interested in setting up your environment to test from an application rather than from a DB2 Command Window session, the steps below for setting up your environment will not apply. Instead, you will need to perform the following command from a DB2 command window:

        db2set DB2CODEPAGE=<codepage of your application>

We strongly recommend that the DB2CODEPAGE environment variable be reset to its original state upon completing the test. Leaving this variable enabled may cause problems. To disable the variable, type the following:

db2set DB2CODEPAGE=

You can skip to Creating an MBCS database if you plan to use DB2CODEPAGE.

Configuring on Windows

To configure on Windows, you first must install the DB2 message file sets and then set up the code page conversion file sets.

Installing DB2 messages

In order for DB2 to display error, warning and informational messages after a DB2 command or statement is executed, the DB2 message file sets for your desired language must be installed. Since there are different distributions of DB2 based upon language groupings, you may have to verify that your installation CD-ROM has the desired language. On Windows, you have the option to add a particular set of messages for a language at DB2 installation time. For Version 7, you can install only one language, but on Version 8, you have the option of installing multiple languages. This new feature is particularly nice if you plan to switch between language environments. Installation time is the only opportunity you have to install DB2 messages for a particular language. Note, you must choose Custom Install rather than Typical Install during the installation process.

After you have installed the message file set for DB2, you will be able to see DB2 error, warning and informational messages after executing a DB2 command or SQL statement. Here is an example of an English message:

C:\PROGRA~1\SQLLIB\BIN>db2 "connect to jpdb" 
 
SQL0332N  There is no available conversion for the source 
code page "1252" to the target code page "943". Reason Code "1".
SQLSTATE=57017

It is not necessary to install the DB2 messages for a particular language in order to create and run a MBCS database. It just provides the convenience of displaying the messages in your preferred language.

Setting up the code page conversion file sets

The next step in setting up your environment will be based upon your choice of language. In our example, we will be setting up a Japanese environment. To set up the Windows environment to handle code page conversions to Japanese, we must change some settings in the Regional Options on the Windows machine.

  1. Click on Start--> Settings-> Control Panel, then click on Regional Options, which brings up the pop-up dialog box shown in Figure 1.
    Figure 1. Regional options in the Windows environment
    Figure 1. Regional options in the Windows environment
  2. In the top portion of the dialog box, select Japanese for your locale. The locale affects numbers, currency, time, date and the language you will use to input your text.
  3. In the bottom of the dialog box, click Set default and then click OK. (See Figure 2.) This specification allows applications to display menus and dialogs in their native language without affecting Windows menus and dialogs.
    Figure 2. Locale dialog
    Figure 2. Locale dialog
  4. Click OK for the Regional Options dialog box, which brings up the dialog shown in Figure 3.
    Figure 3. Installation dialog
    Figure 3. Installation dialog
  5. Most likely, you will already have the files on your machine, so click Yes to install them from your hard drive. If you don't have the files on your drive, you will be prompted to insert your Windows installation CD-ROM in order to acquire them.
  6. Because registry changes will be made to the operating system after the files are installed, you must reboot your machine before you're ready to create the database.

Configuring on AIX

To configure on AIX, you first must install the DB2 message file sets and then set up the code page conversion file sets.

Installing DB2 messages
On AIX, DB2 messages may be installed either through db2setup on the installation CD-ROM or through SMIT. db2setup is an executable that allows you to install DB2 components, such as the DB2 messages file sets. DB2 provides these messages on AIX to display error, warning and informational messages after the execution of a DB2 command or SQL statement, just as it does on Windows. You can add additional language file sets at any time after DB2 is installed, allowing you display the DB2 messages correlating to the current language environment.

Both Version 7 and Version 8 allow you to install multiple language message file sets, providing flexibility when you need to switch from one language environment to another. DB2 switches message file sets automatically when it detects a change in the LANG environment variable. By default, the English messages are installed.

It is not necessary to coordinate the specific language of the database to the environment you wish to have. For instance, the Japanese message file sets for DB2 do not have to be installed to run a Japanese database. DB2 will use the English messages if it cannot find the Japanese message file sets.

Setting up the code page conversion file sets

The next step is to check to see if you have the operating system-specific file sets to enable code page conversions from the source code page to the MBCS code page. You can check to see if those file sets are installed by performing the following command to see the resulting output:

lslpp -l | grep bos.loc.pc.Ja_JP (for SJIS) 
bos.loc.pc.Ja_JP      4.3.3.0  COMMITTED  Base System Locale PC Code Set 
 
lslpp -l | grep bos.loc.iso.ja_JP 
bos.loc.iso.ja_JP      4.3.3.0  COMMITTED  Base System Locale ISO Code

As you can see, the Japanese file sets are already installed and so no further file set installation is required. Here is an example command for installing file sets from the AIX 4.3.3 installation CD-ROM set:

installp -acgNqwX -d /dev/cd0 -f File 2>&&;1 
 
File: 
 
    bos.loc.iso.ja_JP         4.3.3.0 
    bos.loc.pc.Ja_JP          4.3.3.0

After you have applied the file sets, the LANG environment variable must be set to the language type. Here is the command to use for a specific Japanese language environment:

$export LANG=Ja_JP

Make sure your file sets are installed and the LANG environment variable is properly set. Not meeting these requirements will trigger the following error when you attempt to connect to a MBCS database:

$ db2 "connect to jpdb" 
SQL0332N  There is no available conversion for the source code page "819" to 
the target code page "943".  Reason Code "1".  SQLSTATE=57017

If you will be constantly switching language environments, you will need to perform a db2 terminate to stop existing backend processes that may remember the old LANG environment, and then connect to the database.


Creating an MBCS database

You have the choice of creating your MBCS database before or after you enable your environment for the specific language. If you want to create the database first, you must provide additional information when executing the CREATE DATABASE command. Otherwise DB2 will take the default code page from the language environment.

Below we will create a Japanese database before enabling the Japanese language environment:

db2 "create database jpdb using codeset IBM-943 territory JP"

The CODESET keyword tells DB2 to create a database with the code page 943, a Japanese code page.

If you create the database after the language environment is set up, or you used db2set to set the DB2CODEPAGE environment variable to the specified code page (see Setting up the environment), all you need to do is enter the following command:

db2 "create database jpdb"

DB2 will know to create a Japanese database with a code set of IBM-943 on Windows since the Japanese environment is set up. On AIX, DB2 refers to the LANG environment variable to determine the code page for the database. If DB2CODEPAGE is set, DB2 will create the database with a code page based on this DB2 environment variable rather than checking the LANG environment variable (on AIX) or the operating system code page (on Windows).

Numerous code sets are provided by DB2 so that you can create your database with the correct code page on your chosen platform. Table 1 shows the supported Japanese code sets.

Table 1. Supported Japanese code sets
Code PageGroupCode-SetTerritory IdentifierCountry/Region CodeLocaleOSCountry/Region Name
932D-1IBM-932JP81-OS2Japan
942D-1IBM-942JP81-OS2Japan
943D-1IBM-943JP81-OS2Japan
954D-1IBM-9eucJPJP81ja_JPAIXJapan
943*D-1IBM-943JP81Ja_JPAIXJapan
954D-1eucJPJP81ja_JP.eucJPHPJapan
5,039D-1SJISJP81ja_JP.SJISHPJapan
954D-1eucJPJP81ja_JP.eucJPHPJapan
954D-1eucJPJP81jaSCOJapan
954D-1eucJPJP81ja_JPSCOJapan
954D-1eucJPJP81ja_JP.EUCSCOJapan
954D-1eucJPJP81ja_JP.eucJPSCOJapan
954D-1eucJPJP81jaSunJapan
943D-1IBM-943JP81Ja_JP.PCKSunJapan
954D-1EUC_JPJP81ja_JPLinuxJapan
943D-1IBM-943JP81-WINJapan
930D-1IBM-930JP81-HOSTJapan
939D-1IBM-939JP81-HOSTJapan
5,026D-1IBM-5,026JP81-HOSTJapan
5,035D-1IBM-5,035JP81-HOSTJapan
1,390D-1-JP81-HOSTJapan
1,399D-1-JP81-HOSTJapan
1394**D-1-JP81--Japan

* On AIX 4.3 or later the code page is 943. If you are using AIX 4.2 or earlier, the code page is 932.
** Code page 1394 can only be used with the LOAD or IMPORT utilities to move data from code page 1394 to a DB2 Unicode database, or to EXPORT from a DB2 Unicode database to code page 1394. For more information, see the Data Movement Utilities Guide and Reference section of the Version 7.2 FixPak 4 Release Notes.

For a complete listing of code sets, see Resources.

To determine that your database is using the code page you intended, you can query the database configuration file using the following command:

db2 "get database configuration for <database name>"

Near the top of the output you will see a listing of the language and code page related information:

Database territory                          = JP 
Database code page                          = 943 
Database code set                           = IBM-943 
Database country code                       = 81

Connecting to the database and performing basic SQL tests

Connecting to the database

The following example shows how to connect to a database from the DB2 Command Window locally on the DB2 server:

       db2 "connect to <database name>"

To connect to a remote database, you will need to include the user and using keywords:

db2 "connect to <database name> user <user id> using <password>"

Performing basic SQL tests

To perform SQL tests such as insert and select statements, you will need to be able to input the MBCS data from your keyboard onto the screen. In our example, we use an English language keyboard to perform MBCS text input operations. Again, we are using the Japanese language to show how you can do this.

Windows

For Windows, you will need to open a DB2 Command Window session and type the following:

       chcp <code page>

chcp is a Windows-specific command. It stands for "change code page" and will need to be executed in order for your DB2 Command Window session to work with the input and retrieval of MBCS characters. In our example, we changed the code page to 932, which has proven to work as input for a 943 code page database.

After you have executed the chcp command, you can begin testing your MBCS database. At the right hand bottom of your Windows screen, you should see one of the following icons:

Japanese icon
Japanese icon
English icon
English icon

Either one of these icons will be displayed, allowing you to switch between the English and Japanese input locale. If you are using another input locale, you will see the appropriate icon for that locale, but the English icon will remain the same as shown above. Make sure the non-English icon is displayed. If the English icon shows up, click on the icon and the following list will allow you to choose the non-English settings:

English/Japanese dropdown

You will also see a dialog toolbox that looks similar to the one below if you have an editor that is active and supports the fonts needed to display your language. This toolbox is located at the bottom right hand corner of your Windows screen:

toolbox

You will need to set the toolbox so that you can input the correct character set. For instance, Japanese has several character sets such as Katakana and Hirigana. Click on the icon below from the tool box to select the character set of your choice:

toolbox icon

Now let's perform some operations after connecting to the Japanese database. We will first want to create a table to allow character strings to be inserted:

db2 "create table jptbl (name varchar(20))"

So far, we have not used Japanese characters, so our input is currently English. We will now insert a row into the table:

example inserting Japanese characters

After the above steps are accomplished, any editor, such as Notepad, can be used to type, copy, and paste the characters into the insert statement on your DB2 Command Window session. Make sure you select a font within your editor that allows the input of your language-specific characters.

Here is the result after running the insert from a DB2 Command Window:

example of successful insert Japanese characters

Now, to select from the table:

select of Japanese characters

AIX

On AIX, the easiest way to test the SQL statements on a MBCS database is to connect to the database from a Windows machine. To do this, you must have a DB2 client running on the Windows machine and be able to catalog the MBCS database residing on the server. The preferable way to do this is through the DB2 Client Configuration Assistant; a GUI tool that steps the user through the database cataloging process. Once this process is completed, you will be able to enter the SQL statements from the client Windows machine as if you were locally at the DB2 server.


Conclusion

New challenges, such as working with MBCS databases, are constantly thrown at developers who need quick solutions for their problems. The goal of this article was to help make your life a little easier by providing high quality content that will allow you tackle a specific issue by expanding your skills in setting up and running a MBCS database. As technical support personnel, we are always interested in your feedback and will actively handle any questions, issues or modifications that you supply to us.

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=13502
ArticleTitle=Setting Up a Mixed-Byte Character Set (MBCS) Database on an English OS in DB2 UDB Version 8
publish-date=09172002