© 2002 International Business Machines Corporation. All rights reserved.
This article will explain how a developer can test applications running against a mixed-byte character set (MBCS) database on an English language Windows operating system or AIX operating system. Today, many developers face the need to run database tests in a variety of languages, but do not have or want more hardware resources in order to test each language on a dedicated machine. By enabling an English operating system to perform tests against multiple, heterogeneous databases, minimal hardware is required thereby lowering costs and increasing efficiency.
Here at the Developer Technical Support Center in Dallas, we find that most of the Independent Software Vendors (ISVs) who need help with using multiple language support on DB2 ask how to set up their database and operating system environments for MBCS support. So this article is intended for developers and others who, like the ISVs we work with, need to set up a MBCS database on DB2 Universal DatabaseTM Version 8 in an English operating system environment. The basic guidelines in this article can also be applied to any single-byte character set (SBCS) language environment.
We will give you the details to allow you to create, connect and perform SQL tests against a MBCS database from a DB2 Command Window in both Windows® and AIX® environments. UTF-8 databases are also used for multi-language environments, but we will not be addressing that topic here since it falls outside of the scope of this article. Note that UTF-8 has advantages worth looking in to. At the end of this paper, you will find links to UTF-8 information and other code-page-related topics.
Throughout this paper, we will be using a Japanese example for setting up the MBCS environment, creating the database, and performing SQL tests.
Brief synopsis of character set encoding
European languages are written using alphabetic scripts which are displayed as symbols to denote the pronounced sounds. Far Eastern languages are written using ideographic symbols which refer to the meaning of the word in an abstract way. Ideographic character sets are much larger than alphabetic character sets and as a result, mixed-byte character set (MBCS) support was introduced to overcome the 256 character limitation of single-byte character sets (SBCS).
Two methods are employed to address the character limitation issue: double-byte character sets (DBCS) and mixed-byte character sets (MBCS). These character sets fall into two categories:
- Universal language support - character sets that provide support for many languages (UTF and UCS)
- Single language support - each character set dedicated to a specific language and platform.
This article describes single language support. We will use Japanese language support as an illustration.
The Japanese ideographic characters can be encoded in one of two ways:
- Shift-JIS (SJIS) encoding scheme
SJIS is used for IBM-932 and the newer IBM-943 code sets. These code sets are available on most platforms. The first byte of each character is used to determine the number of bytes for that character. The values from 0x20 to 0x7F and from 0xA0 to 0xDF are used to encode ASCII and Katakana characters, with exceptions. The positions from 0x81 to 0x9f and from 0xe0 to 0xfc are reserved for use as the first byte of a multibyte character. The JISX0208 characters are mapped to the multibyte values starting at 0x8140. The second byte of a multibyte character can have any value.
- Extended UNIX® Code (EUC) encoding scheme The IBM-eucJP code set is available on UNIX platforms. The EUC encoding scheme defines a set of encoding rules that can support one to four character sets. This support for more than one character set may be necessary if you plan on storing certain graphic characters in your database. A code set based on EUC conforms to the EUC encoding rules but also identifies the specific character sets associated with the specific instances. For example, IBM-eucJP for Japanese refers to the encoding of the Japanese Industrial Standard characters according to the EUC encoding rules.
The first set, character set 0 (CS0), always contains an ASCII character set. All of the other sets must have the most significant bit (MSB) set to 1 and can use any number of bytes to encode the characters. In addition, all characters within a set must have:
- The same number of bytes to encode all characters
- The same column display width (number of columns on a fixed-width terminal).
All characters in the third set (CS2) are always preceded with the control character SS2 (single-shift 2, 0x8E). Code sets that conform to EUC do not use the SS2 control character other than to identify the third set. All characters in the fourth set (CS3) are always preceded with the control character SS3 (single-shift 3, 0x8F). Code sets that conform to EUC do not use the SS3 control character other than to identify the fourth set.
The following is the format of the different code sets supported by EUC:
<font size="1"> CS0 - 0xxxxxxx CS1 - 1xxxxxxx or 1xxxxxxx 1xxxxxxx or 1xxxxxxx 1xxxxxxx 1xxxxxxx CS2 - 10001110 1xxxxxxx or 10001110 1xxxxxxx 1xxxxxxx or 10001110 1xxxxxxx 1xxxxxxx 1xxxxxxx CS3 - 10001111 1xxxxxxx or 10001111 1xxxxxxx 1xxxxxxx or 10001111 1xxxxxxx 1xxxxxxx 1xxxxxxx
Setting up the environment
Assuming you have installed a DB2 UDB edition that has server capabilities, we can continue with the environment settings. Setting up your environment will be different depending on your choice of operating systems. We will first explain the setup process on Windows and then provide instructions for AIX.
Note: The instructions below apply to setting up your environment for testing from a DB2 Command Window. If you are only interested in setting up your environment to test from an application rather than from a DB2 Command Window session, the steps below for setting up your environment will not apply. Instead, you will need to perform the following command from a DB2 command window:
db2set DB2CODEPAGE=<codepage of your application>
We strongly recommend that the DB2CODEPAGE environment variable be reset to its original state upon completing the test. Leaving this variable enabled may cause problems. To disable the variable, type the following:
You can skip to Creating an MBCS database if you
plan to use
Configuring on Windows
To configure on Windows, you first must install the DB2 message file sets and then set up the code page conversion file sets.
Installing DB2 messages
In order for DB2 to display error, warning and informational messages after a DB2 command or statement is executed, the DB2 message file sets for your desired language must be installed. Since there are different distributions of DB2 based upon language groupings, you may have to verify that your installation CD-ROM has the desired language. On Windows, you have the option to add a particular set of messages for a language at DB2 installation time. For Version 7, you can install only one language, but on Version 8, you have the option of installing multiple languages. This new feature is particularly nice if you plan to switch between language environments. Installation time is the only opportunity you have to install DB2 messages for a particular language. Note, you must choose Custom Install rather than Typical Install during the installation process.
After you have installed the message file set for DB2, you will be able to see DB2 error, warning and informational messages after executing a DB2 command or SQL statement. Here is an example of an English message:
C:\PROGRA~1\SQLLIB\BIN>db2 "connect to jpdb" SQL0332N There is no available conversion for the source
code page "1252" to the target code page "943". Reason Code "1".
It is not necessary to install the DB2 messages for a particular language in order to create and run a MBCS database. It just provides the convenience of displaying the messages in your preferred language.
Setting up the code page conversion file sets
The next step in setting up your environment will be based upon your choice of language. In our example, we will be setting up a Japanese environment. To set up the Windows environment to handle code page conversions to Japanese, we must change some settings in the Regional Options on the Windows machine.
- Click on Start--> Settings-> Control Panel, then click on
Regional Options, which brings up the pop-up dialog box
shown in Figure 1.
Figure 1. Regional options in the Windows environment
- In the top portion of the dialog box, select Japanese for your locale. The locale affects numbers, currency, time, date and the language you will use to input your text.
- In the bottom of the dialog box, click Set default and then
click OK. (See Figure 2.) This
specification allows applications to display menus and dialogs in
their native language without affecting Windows menus and dialogs.
Figure 2. Locale dialog
- Click OK for the Regional Options dialog box, which
brings up the dialog shown in Figure 3.
Figure 3. Installation dialog
- Most likely, you will already have the files on your machine, so click Yes to install them from your hard drive. If you don't have the files on your drive, you will be prompted to insert your Windows installation CD-ROM in order to acquire them.
- Because registry changes will be made to the operating system after the files are installed, you must reboot your machine before you're ready to create the database.
Configuring on AIX
To configure on AIX, you first must install the DB2 message file sets and then set up the code page conversion file sets.
Installing DB2 messages
On AIX, DB2 messages may be installed either through
db2setup on the installation CD-ROM or through
db2setup is an executable that allows you
to install DB2 components, such as the DB2 messages file sets. DB2
provides these messages on AIX to display error, warning and informational
messages after the execution of a DB2 command or SQL statement, just as it
does on Windows. You can add additional language file sets at any time
after DB2 is installed, allowing you display the DB2 messages correlating
to the current language environment.
Both Version 7 and Version 8 allow you to install multiple language message file sets, providing flexibility when you need to switch from one language environment to another. DB2 switches message file sets automatically when it detects a change in the LANG environment variable. By default, the English messages are installed.
It is not necessary to coordinate the specific language of the database to the environment you wish to have. For instance, the Japanese message file sets for DB2 do not have to be installed to run a Japanese database. DB2 will use the English messages if it cannot find the Japanese message file sets.
Setting up the code page conversion file sets
The next step is to check to see if you have the operating system-specific file sets to enable code page conversions from the source code page to the MBCS code page. You can check to see if those file sets are installed by performing the following command to see the resulting output:
lslpp -l | grep bos.loc.pc.Ja_JP (for SJIS) bos.loc.pc.Ja_JP 18.104.22.168 COMMITTED Base System Locale PC Code Set lslpp -l | grep bos.loc.iso.ja_JP bos.loc.iso.ja_JP 22.214.171.124 COMMITTED Base System Locale ISO Code
As you can see, the Japanese file sets are already installed and so no further file set installation is required. Here is an example command for installing file sets from the AIX 4.3.3 installation CD-ROM set:
installp -acgNqwX -d /dev/cd0 -f File 2>&&;1 File: bos.loc.iso.ja_JP 126.96.36.199 bos.loc.pc.Ja_JP 188.8.131.52
After you have applied the file sets, the LANG environment variable must be set to the language type. Here is the command to use for a specific Japanese language environment:
Make sure your file sets are installed and the LANG environment variable is properly set. Not meeting these requirements will trigger the following error when you attempt to connect to a MBCS database:
$ db2 "connect to jpdb" SQL0332N There is no available conversion for the source code page "819" to the target code page "943". Reason Code "1". SQLSTATE=57017
If you will be constantly switching language environments, you will need to
db2 terminate to stop existing
backend processes that may remember the old LANG environment, and then
connect to the database.
Creating an MBCS database
You have the choice of creating your MBCS database before or after you enable your environment for the specific language. If you want to create the database first, you must provide additional information when executing the CREATE DATABASE command. Otherwise DB2 will take the default code page from the language environment.
Below we will create a Japanese database before enabling the Japanese language environment:
db2 "create database jpdb using codeset IBM-943 territory JP"
The CODESET keyword tells DB2 to create a database with the code page 943, a Japanese code page.
If you create the database after the language environment is set up, or you used db2set to set the DB2CODEPAGE environment variable to the specified code page (see Setting up the environment), all you need to do is enter the following command:
db2 "create database jpdb"
DB2 will know to create a Japanese database with a code set of IBM-943 on Windows since the Japanese environment is set up. On AIX, DB2 refers to the LANG environment variable to determine the code page for the database. If DB2CODEPAGE is set, DB2 will create the database with a code page based on this DB2 environment variable rather than checking the LANG environment variable (on AIX) or the operating system code page (on Windows).
Numerous code sets are provided by DB2 so that you can create your database with the correct code page on your chosen platform. Table 1 shows the supported Japanese code sets.
|Code Page||Group||Code-Set||Territory Identifier||Country/Region Code||Locale||OS||Country/Region Name|
* On AIX 4.3 or later the code page is 943. If you are using AIX 4.2 or
earlier, the code page is 932.
** Code page 1394 can only be used with the LOAD or IMPORT utilities to move data from code page 1394 to a DB2 Unicode database, or to EXPORT from a DB2 Unicode database to code page 1394. For more information, see the Data Movement Utilities Guide and Reference section of the Version 7.2 FixPak 4 Release Notes.
For a complete listing of code sets, see Resources.
To determine that your database is using the code page you intended, you can query the database configuration file using the following command:
db2 "get database configuration for <database name>"
Near the top of the output you will see a listing of the language and code page related information:
Database territory = JP Database code page = 943 Database code set = IBM-943 Database country code = 81
Connecting to the database and performing basic SQL tests
Connecting to the database
The following example shows how to connect to a database from the DB2 Command Window locally on the DB2 server:
db2 "connect to <database name>"
To connect to a remote database, you will need to include the
db2 "connect to <database name> user <user id> using <password>"
Performing basic SQL tests
To perform SQL tests such as insert and select statements, you will need to be able to input the MBCS data from your keyboard onto the screen. In our example, we use an English language keyboard to perform MBCS text input operations. Again, we are using the Japanese language to show how you can do this.
For Windows, you will need to open a DB2 Command Window session and type the following:
chcp <code page>
chcp is a Windows-specific command. It stands
for "change code page" and will need to be executed in order for your DB2
Command Window session to work with the input and retrieval of MBCS
characters. In our example, we changed the code page to 932, which has
proven to work as input for a 943 code page database.
After you have executed the chcp command, you can begin testing your MBCS database. At the right hand bottom of your Windows screen, you should see one of the following icons:
Either one of these icons will be displayed, allowing you to switch between the English and Japanese input locale. If you are using another input locale, you will see the appropriate icon for that locale, but the English icon will remain the same as shown above. Make sure the non-English icon is displayed. If the English icon shows up, click on the icon and the following list will allow you to choose the non-English settings:
You will also see a dialog toolbox that looks similar to the one below if you have an editor that is active and supports the fonts needed to display your language. This toolbox is located at the bottom right hand corner of your Windows screen:
You will need to set the toolbox so that you can input the correct character set. For instance, Japanese has several character sets such as Katakana and Hirigana. Click on the icon below from the tool box to select the character set of your choice:
Now let's perform some operations after connecting to the Japanese database. We will first want to create a table to allow character strings to be inserted:
db2 "create table jptbl (name varchar(20))"
So far, we have not used Japanese characters, so our input is currently English. We will now insert a row into the table:
After the above steps are accomplished, any editor, such as Notepad, can be used to type, copy, and paste the characters into the insert statement on your DB2 Command Window session. Make sure you select a font within your editor that allows the input of your language-specific characters.
Here is the result after running the insert from a DB2 Command Window:
Now, to select from the table:
On AIX, the easiest way to test the SQL statements on a MBCS database is to connect to the database from a Windows machine. To do this, you must have a DB2 client running on the Windows machine and be able to catalog the MBCS database residing on the server. The preferable way to do this is through the DB2 Client Configuration Assistant; a GUI tool that steps the user through the database cataloging process. Once this process is completed, you will be able to enter the SQL statements from the client Windows machine as if you were locally at the DB2 server.
New challenges, such as working with MBCS databases, are constantly thrown at developers who need quick solutions for their problems. The goal of this article was to help make your life a little easier by providing high quality content that will allow you tackle a specific issue by expanding your skills in setting up and running a MBCS database. As technical support personnel, we are always interested in your feedback and will actively handle any questions, issues or modifications that you supply to us.
Dig deeper into Information management on developerWorks
Get samples, articles, product docs, and community resources to help build, deploy, and manage your cloud apps.
Experiment with new directions in software development.
Software development in the cloud. Register today to create a project.
Evaluate IBM software and solutions, and transform challenges into opportunities.