Unicode Transformation Format (UTF) is a family of standard encodings for the Unicode character set that many applications use to support multiple languages. Traditionally, each language corresponds to its own locale, so setting one locale means using one language at a time. UTF provides more flexibility by supporting multiple languages in a single form at the same time.
UTF-8 is a multi-byte encoding scheme that encodes each character using as few as one byte or as many as four bytes. Most characters in Asian scripts, including those used for Hindi, Japanese, and Chinese, require three bytes per character.
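The byte counts can be checked outside Informix. The following Python sketch is only an illustration: the helper name utf8_len and the sample characters are assumptions chosen for this example and are not part of any Informix tool.

```python
def utf8_len(ch: str) -> int:
    """Return the number of bytes that UTF-8 uses to store one character."""
    return len(ch.encode("utf-8"))


# Sample characters chosen for illustration only.
samples = {
    "A": "Latin letter",
    "\u00e9": "Latin letter with accent",
    "द": "Devanagari letter (Hindi)",
    "日": "CJK ideograph (Japanese and Chinese)",
    "\U0001F600": "emoji outside the Basic Multilingual Plane",
}

for ch, description in samples.items():
    print(f"{ch!r}: {utf8_len(ch)} byte(s), {description}")
```

The Latin letter needs one byte, the Devanagari and CJK characters need three bytes each, and only characters outside the Basic Multilingual Plane need four.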
UTF-8 is an efficient encoding method that provides a single encoding for the characters of many languages. UTF-8 is easier to parse and to manipulate than many other multi-byte encoding formats, because the first byte of each character indicates how many bytes the character occupies. Terminals that support UTF characters use transliteration to convert the given multi-byte input into the corresponding logical character.
Transliteration means to convert words from one language to another language with a close approximation in phonetic sound. Translation means to match words in one language with words that mean the same thing in another language. For the examples in this article, you can use any Smart Common Input Method platform (SCIM) software to transliterate the English characters to Hindi.
Understanding IDS, 4GL, and UTF-8
IBM Informix Dynamic Server supports the UTF-8 character set from IDS Version 10 and later. IBM Informix 4GL is a tool to create sophisticated database applications for Informix Database Server. Informix 4GL consists of an integrated rapid development system, an interactive debugger, and a compiler (see Resources for a link to the 4GL reference guide for more details).
Before Informix 4GL Version 7.50.xC1, UTF-8 capability was not extended to 4GL, so using database objects named using UTF-8 characters was not possible. Informix 4GL Version 7.50.xC1 and later supports the UTF-8 character set. For example, you can use UTF-8 characters in SQL operations to represent database objects, such as fields, tables, and columns. Informix 4GL supports editing UTF data with the same key strokes used for English characters.
Preparing to develop a 4GL application
Begin by setting some environment variables on the system on which you are going to develop the 4GL application with UTF-8 characters. Set the following environment variables:
- CLIENT_LOCALE identifies the locale that the client application uses. To set CLIENT_LOCALE:
$ export CLIENT_LOCALE=en_US.utf8
- DB_LOCALE identifies the locale of the data in the database. To set DB_LOCALE:
$ export DB_LOCALE=en_US.utf8
- DELIMIDENT, when set to YES, identifies strings in double quotes as SQL identifiers. When DELIMIDENT is not set to YES, strings in double quotes are treated as string literals. This is especially important when using UTF characters to represent database object names. To set DELIMIDENT:
$ export DELIMIDENT=YES
- Other IBM Informix 4GL environment variables (see Resources).
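Before starting development, it can help to confirm that the three variables hold the expected values. The following Python sketch is a hypothetical pre-flight check, not an Informix utility: the function name check_environment and the REQUIRED table are assumptions, and the expected values are simply the ones used in this article.

```python
import os

# Expected values taken from the settings used in this article.
REQUIRED = {
    "CLIENT_LOCALE": "en_US.utf8",
    "DB_LOCALE": "en_US.utf8",
    "DELIMIDENT": "YES",
}


def check_environment(env):
    """Return a list of problems found in env; an empty list means all is set."""
    problems = []
    for name, expected in REQUIRED.items():
        actual = env.get(name)
        if actual is None:
            problems.append(f"{name} is not set (expected {expected})")
        elif actual != expected:
            problems.append(f"{name} is {actual!r} (expected {expected!r})")
    return problems


if __name__ == "__main__":
    # Report every variable that is missing or has an unexpected value.
    for problem in check_environment(os.environ):
        print(problem)
```

A check like this catches a common shell mistake: a space after the equals sign in an export command leaves the variable empty, which the sketch reports as an unexpected value.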
UTF characters are multi-byte characters. Each character occupies a single display position but multiple bytes of internal storage. A partial character is created when a multi-byte character is truncated or split up in such a way that the original sequence of bytes is lost. Informix 4GL fixes any partial characters that arise, for example, when Hindi text entered through SCIM transliteration is truncated.
For example, consider the following SELECT statement:
SELECT नाम[1,8] FROM customer;
Assume it retrieves two rows containing the data values in the customer table as shown in Table 1.
Table 1. Data values from example SELECT statement
| Row | नाम (original field value in table) | Result after partial character handling |
|---|---|---|
| 1 | AAA³BBB³CCC³ | AAA³BBB³s¹s¹ |
| 2 | abcdefghij | abcdefgh |
Row 1 contains three logical multi-byte characters (A, B, and C) that occupy 3 bytes each; the superscript denotes the number of bytes stored for a single letter. Row 2 contains only English characters: 10 letters (logical characters) of 1 byte each. When the query selects only bytes 1 through 8 with [1,8], Row 1 initially yields AAA³BBB³CC.
Note that the logical character C is incomplete, because the result holds only 2 of its 3 bytes. This is a partial multi-byte character. 4GL removes it and replaces each of its bytes with a single-byte space, denoted by s¹.
Meanwhile, Row 2 returns 8 bytes without losing any characters. English letters occupy a single byte each, so English text cannot contain partial characters.
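The partial character handling described above can be imitated outside Informix. The following Python sketch is an illustration under stated assumptions, not the 4GL implementation: the function name substring_bytes is hypothetical, and the sketch simply takes a byte range of the UTF-8 form of a string and overwrites every byte of a cut-off character with a space.

```python
def substring_bytes(text: str, first: int, last: int) -> str:
    """Return bytes first..last (1-based, inclusive) of the UTF-8 form of text.

    Every byte that belongs to a character cut by the range is replaced
    with a single-byte space, so the result is always valid UTF-8.
    """
    piece = bytearray(text.encode("utf-8")[first - 1:last])

    # Continuation bytes (10xxxxxx) at the start belong to a character
    # whose lead byte lies before the range: blank them out.
    start = 0
    while start < len(piece) and (piece[start] & 0xC0) == 0x80:
        piece[start] = 0x20
        start += 1

    # Find the last lead byte and check that its whole sequence fits.
    pos = len(piece) - 1
    while pos >= start and (piece[pos] & 0xC0) == 0x80:
        pos -= 1
    if pos >= start:
        lead = piece[pos]
        if lead >= 0xF0:
            needed = 4
        elif lead >= 0xE0:
            needed = 3
        elif lead >= 0xC0:
            needed = 2
        else:
            needed = 1
        if len(piece) - pos < needed:
            # The final character is partial: replace its bytes with spaces.
            for i in range(pos, len(piece)):
                piece[i] = 0x20

    return piece.decode("utf-8")


hindi = substring_bytes("दीपक", 1, 8)   # 4 code points of 3 bytes each
english = substring_bytes("abcdefghij", 1, 8)
print(repr(hindi), len(hindi.encode("utf-8")))
print(repr(english))
```

For the Hindi value, the first two code points (6 bytes) survive and the 2 leftover bytes of the third are replaced by two spaces, so the result still occupies 8 bytes. The English value is returned as its first 8 letters unchanged.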
Exploring the 4GL features with UTF-8 character set
This section describes the 4GL features that support the UTF-8 character set. The features include forms, reports, and database objects.
A customer form helps to describe the UTF features. Listing 1 shows the code for the customer form.
Listing 1. Customer.per code
DATABASE utf8
SCREEN SIZE 24 BY 80
{
नाम [f000 ]
}
END
TABLES
Customer = "Informix".customer
ATTRIBUTES
f000 = customer.नाम;
The following form elements can use UTF characters:
- Titles
- Form headers
- Data
- Message
- Search criteria
The customer form enables you to use the Hindi characters (UTF characters) in field titles, as shown in Listing 2.
Listing 2. Field titles in UTF
PERFORM: Query Next Previous View Add Update Remove Table Screen...
Searches the active database table. ** 1: customer table**
नाम {Name} [ ]
The customer form enables you to use UTF characters in the form header, as shown in Listing 3.
Listing 3. Form headers in UTF
PERFORM: Query Next Previous View Add Update Remove Table Screen...
Searches the active database table. ** 1: customer table**
==============================
Customer Information (कस्टमर जानकारी)
==============================
नाम {Name} [ ]
The customer form enables you to insert UTF characters in a field as data, as shown in Listing 4.
Listing 4. Data in UTF
ADD: ESCAPE adds new data. INTERRUPT discards it. ARROW keys move cursor.
Adds new data to the active database table. ** 1: customer table**
नाम {Name} [दीपक ]
The form in Listing 4 shows UTF data, which can be stored in and retrieved from the database.
The customer form enables you to use UTF characters in messages. When you select Query or Add from the menu, the cursor is positioned at the first field of the form, and the message that explains the data field can be in UTF, as shown in Listing 5.
Listing 5. Message in UTF
QUERY: ESCAPE queries. INTERRUPT discards query. ARROW keys move cursor.
Searches the active database table. ** 1: customer table**
नाम {Name} [ ]
कस्टमर का नाम(Name of customer)
Informix 4GL also enables applications to use the UTF characters as search criteria for finding content within rows. This is very useful when searching for a name that was entered in a local language.
Listing 6. Search criteria in UTF
QUERY: ESCAPE queries. INTERRUPT discards query. ARROW keys move cursor.
Searches the active database table. ** 1: customer table**
नाम {Name} [*पक ]
Listing 7 shows a resulting row that matched the search criteria given in Listing 6.
Listing 7. Search results
PERFORM: Query Next Previous View Add Update Remove Table Screen...
Searches the active database table. ** 1: customer table**
नाम {Name} [दीपक ]
1 row(s) found
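The wildcard search works on logical characters, not on bytes. The following Python sketch imitates the idea with the standard fnmatch module as a stand-in; it is not the matcher that Informix uses, and the function name query and the two sample names are assumptions taken from the listings in this article.

```python
from fnmatch import fnmatchcase

# Sample values, as they appear in the customer table in this article.
names = ["शारीनि", "दीपक"]


def query(pattern, values):
    """Return the values that match a shell-style wildcard pattern.

    '*' matches any run of characters and '?' matches exactly one
    Unicode code point (not one byte).
    """
    return [value for value in values if fnmatchcase(value, pattern)]


print(query("*पक", names))   # values that end in the two given characters
print(query("दी?क", names))  # '?' stands for one code point
```

Because the pattern is compared against decoded text, the three bytes of each Hindi character are never matched individually, which mirrors how the search criterion in the form is typed and displayed.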
Reports are another way to demonstrate the UTF features. The following elements in reports can use UTF characters:
- UTF characters in reports
- Screen output
- Unload format with delimiters
- Reports using the 4GL language
Listing 8 shows a sample report program. The program fetches rows from the customer table and sends them to the report target, which is standard output in this example.
Listing 8. Code for report1.4gl
DATABASE utf8
MAIN
    DEFINE p_customer RECORD LIKE customer.*
    -- Cursor over every row of the customer table
    DECLARE q_curs CURSOR FOR
        SELECT * FROM customer
    START REPORT cust_list
    -- Send each fetched row to the report
    FOREACH q_curs INTO p_customer.*
        OUTPUT TO REPORT cust_list (p_customer.*)
    END FOREACH
    FINISH REPORT cust_list
END MAIN
REPORT cust_list (r_customer)
    DEFINE r_customer RECORD LIKE customer.*
    -- EVERY ROW produces the default report layout
    FORMAT EVERY ROW
END REPORT
Listing 9 shows the output of the report1.4gl program.
Listing 9. The output of the report1.4gl program
$ c4gl report1.4gl -o report1
$ ./report1
r_customer.नाम
शारीनि
दीपक
Use the ISQL utility and choose the output option to use the capabilities listed in the following sections.
Listing 10. Selecting Output
PERFORM: Current Master Detail Output Exit
Outputs selected rows in form or report format. ** 1: customer table**
नाम {Name} [ ]
The Output selection enables you to write selected rows to an operating system file using either Screen format or Unload format.
To see the output in Screen format, enter the output file name and select Append or create > Current-list or one-page > Screen-format. Listing 11 shows the content of the screen.out file.
Listing 11. Resulting screen.out file
$ cat screen.out
नाम {Name} [दीपक]
$
To see the output in Unload format, enter the output file name and select Append or create > Current-list or one-page > Unload-format. Listing 12 shows the content of the unload.out file.
Listing 12. Resulting unload.out file
$ cat unload.out
शारीनि|
दीपक|
$
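A file in Unload format is straightforward to read back in another program. The following Python sketch is a minimal reader under stated assumptions: the function name parse_unload is hypothetical, the delimiter is assumed to be the vertical bar shown in Listing 12, and the data is assumed to contain no escaped delimiters or embedded newlines.

```python
def parse_unload(text, delimiter="|"):
    """Split delimited unload text into a list of rows (lists of fields).

    Assumes every field, including the last one on a line, is followed
    by the delimiter and that no field contains an escaped delimiter.
    """
    rows = []
    for line in text.splitlines():
        if not line:
            continue  # skip empty lines
        fields = line.split(delimiter)
        # The trailing delimiter leaves one empty element at the end.
        if fields[-1] == "":
            fields.pop()
        rows.append(fields)
    return rows


sample = "शारीनि|\nदीपक|\n"
for row in parse_unload(sample):
    print(row)
```

When reading the real file, open it with an explicit encoding, for example open("unload.out", encoding="utf-8"), so that the multi-byte characters are decoded as UTF-8 regardless of the platform default.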
Reports using the 4GL language
The example program report2.4gl, shown in Listing 13, demonstrates a report that is formatted using the 4GL language. Listing 14 shows the resulting report.
Listing 13. Example program report2.4gl
DATABASE utf8
MAIN
    DEFINE p_customer RECORD LIKE customer.*
    DECLARE q_curs CURSOR FOR
        SELECT * FROM customer
    START REPORT qty_rep
    -- Pass only the नाम column of each row to the report
    FOREACH q_curs INTO p_customer.*
        OUTPUT TO REPORT qty_rep (p_customer.नाम)
    END FOREACH
    FINISH REPORT qty_rep
END MAIN
REPORT qty_rep (नाम)
    DEFINE नाम LIKE customer.नाम
    FORMAT
        FIRST PAGE HEADER
            -- Report title that contains UTF characters
            PRINT COLUMN 5, "Customer Name (कस्टमर नाम)"
            SKIP 2 LINES
            PRINT "नाम",
                COLUMN 15, नाम
END REPORT
Listing 14. Resulting report
$ c4gl report2.4gl -o report2
$ ./report2
$
Customer Name (कस्टमर नाम)
नाम दीपक
UTF characters as database objects
You can use UTF characters as table names or column names in a database. Listing 15 shows an example table अमर with the columns नाम and पता.
Listing 15. Example table with UTF database objects
SQL: New Run Modify Use-editor Output Choose Save Info Drop Exit
Run the current SQL statements.
----------- utf8@deepak_1150 ---------- Press CTRL-W for Help--------
Create table अमर ( नाम char(20), पता char(20))
Table created.
Listing 16 shows that the field value of column नाम is changed from दीपक to कलपक.
Listing 16. Changing column values
UPDATE: ESCAPE changes data. INTERRUPT discards changes.
Changes this row in the active database table. ** 1: customer table**
नाम {Name} [कलपक]
This article described the Informix 4GL features that support UTF-8 and how UTF-8 support enhances Informix 4GL applications by handling multi-byte languages.
Learn
- Learn more about Informix from the IBM Informix Dynamic Server 11 Information Center.
- Refer to the Informix 4GL V7.x manuals.
- Find the Informix 4GL environment variables.
- Check out Informix Unleashed by Glenn Miller, Jim Prajesh, Jose Fortuny, and John McNally. Using a hands-on approach, this guide serves as a high-level tutorial for users who are new to Informix. It is also a helpful reference for those who know the product but need additional tips, tricks, and workarounds.
- Find out more about transliteration from Google.
- Learn more about Information Management at the developerWorks Information Management zone. Find technical documentation, how-to articles, education, downloads, product information, and more.
- Stay current with developerWorks technical events and webcasts.
Get products and technologies
- Build your next development project with IBM trial software, available for download directly from developerWorks.
Discuss
- Check out the developerWorks blogs and get involved in the developerWorks community.

Srinivasan R Mottupalli is a senior software engineer at IBM, ISL in Bangalore, India. He has been with IBM Informix since 1997, and he has worked on various features of IBM products, including Informix Dynamic Server (IDS), Extended Parallel Server (XPS), DB2, and Informix 4GL (I4GL), as a design and development engineer.