
Use IBM Informix 4GL to develop applications that support UTF-8 character sets

Srinivasan R. Mottupalli (mrsrinivas@in.ibm.com), Senior Software Engineer, IBM
Srinivasan R Mottupalli is a senior software engineer at IBM, ISL in Bangalore, India. He has been with IBM Informix since 1997, and he has worked on various features of IBM products, including Informix Dynamic Server (IDS), Extended Parallel Server (XPS), DB2, and Informix 4GL (I4GL), as a design and development engineer.
Deepak Agrawal (deepakagrawal@in.ibm.com), Software Engineer, IBM
Deepak Agrawal is a software engineer at IBM, ISL, Bangalore, India. He has worked on 4GL (UTF-8 and SOA support) and ESQL/COBOL as a development engineer.

Summary: After an introduction to the UTF-8 character set, this article describes some of the Informix® 4GL capabilities for UTF-8 data, including searching, editing, and displaying UTF characters, and creating reports with UTF-8 characters. After reading this article, you will know how to configure Informix 4GL 7.50.xC1 to support the UTF-8 character set.

Date:  29 Oct 2009
Level:  Introductory

Introduction

Unicode Transformation Format (UTF) is a family of standard encodings for the Unicode character set, and many applications use it to support multiple languages. Traditionally, each locale corresponds to a single language, so setting a locale means using one language at a time. UTF provides more flexibility by supporting multiple languages in a single form at the same time.

Unicode transformation format

UTF-8 is a multi-byte encoding scheme that encodes each character in as few as one byte and as many as four bytes. Most characters used by Asian languages such as Hindi, Japanese, and Chinese require three bytes each.
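
The byte counts above can be checked with a minimal Python sketch. The function name utf8_length and the sample characters are illustrative choices, not part of Informix 4GL.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes that one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


# Sample characters chosen for illustration only.
samples = ["A", "é", "न", "日", "😀"]
for ch in samples:
    print(ch, utf8_length(ch))
```

An English letter takes one byte, the Hindi letter न and the Japanese character 日 take three bytes each, and characters outside the Basic Multilingual Plane take four.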

UTF-8 provides a single encoding for the characters of many languages. Because the first byte of each character indicates how many bytes follow, UTF-8 is easier to parse and to manipulate than many older multi-byte encodings. On terminals that support UTF characters, an input method can use transliteration to convert the typed input into the corresponding logical characters.
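
As a rough illustration of why UTF-8 is straightforward to parse, the following Python sketch counts logical characters (Unicode code points) in a byte string by skipping continuation bytes, which always have the bit pattern 10xxxxxx. The function name is hypothetical, and the sketch assumes that the input is valid UTF-8.

```python
def count_logical_characters(data: bytes) -> int:
    """Count code points in valid UTF-8 data.

    Every byte that is NOT a continuation byte (10xxxxxx) starts a new
    character, so counting those bytes counts the characters.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


name = "दीपक".encode("utf-8")
print(len(name), "bytes,", count_logical_characters(name), "code points")
```

Note that a code point is not always one display position: the vowel sign ी in दीपक combines with the preceding consonant when rendered.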

Transliteration

Transliteration converts words from one script to another with a close approximation of the phonetic sound. Translation, by contrast, matches words in one language with words that mean the same thing in another language. For the examples in this article, you can use any Smart Common Input Method (SCIM) platform software to transliterate English characters to Hindi.


Understanding IDS, 4GL, and UTF-8

IBM Informix Dynamic Server (IDS) supports the UTF-8 character set in Version 10 and later. IBM Informix 4GL is a tool for creating sophisticated database applications for Informix database servers. Informix 4GL consists of an integrated rapid development system, an interactive debugger, and a compiler (see Resources for a link to the 4GL reference guide for more details).

Before Informix 4GL Version 7.50.xC1, UTF-8 capability did not extend to 4GL, so applications could not use database objects whose names contain UTF-8 characters. Informix 4GL Version 7.50.xC1 and later support the UTF-8 character set. For example, you can use UTF-8 characters in SQL operations to represent database objects, such as fields, tables, and columns. Informix 4GL supports editing UTF data with the same keystrokes used for English characters.


Preparing to develop a 4GL application

Begin by setting some environment variables on the system on which you are going to develop the 4GL application with UTF-8 characters. Set the following environment variables:

  • CLIENT_LOCALE identifies the locale that the client application uses. To set CLIENT_LOCALE:
    $ export CLIENT_LOCALE=en_US.utf8

  • DB_LOCALE identifies the locale of the data in the database. To set DB_LOCALE:
    $ export DB_LOCALE=en_US.utf8

  • DELIMIDENT, when set to YES, identifies strings in double-quotes as SQL identifiers. When DELIMIDENT is not YES, strings in double-quotes are treated as string literals. To set DELIMIDENT:
    $ export DELIMIDENT=YES

    This is especially important when using UTF characters to represent database object names.

  • Any other IBM Informix 4GL environment variables that your installation requires (see Resources).
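
Before starting a 4GL session, it can help to confirm that these variables hold the values shown above. The following Python sketch is one hypothetical way to do that. The helper name check_utf8_environment and the REQUIRED table are assumptions made for illustration and are not part of any Informix tool.

```python
import os

# Expected values, taken from the settings listed in this article.
REQUIRED = {
    "CLIENT_LOCALE": "en_US.utf8",
    "DB_LOCALE": "en_US.utf8",
    "DELIMIDENT": "YES",
}


def check_utf8_environment(env) -> list:
    """Return a list of human-readable problems; empty means all is well."""
    problems = []
    for name, expected in REQUIRED.items():
        actual = env.get(name)
        if actual is None:
            problems.append(f"{name} is not set")
        elif actual != expected:
            problems.append(f"{name} is {actual!r}, expected {expected!r}")
    return problems


for problem in check_utf8_environment(os.environ):
    print(problem)
```

A check like this catches an empty value, which is what a stray space after the = sign in an export command produces.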

Partial characters

UTF characters are multi-byte characters. Each one occupies a single position on the display but multiple bytes of internal storage. A partial character is created when a multi-byte character is truncated or split in such a way that the original sequence of bytes is lost. Informix 4GL repairs any partial characters, including those in Hindi text that SCIM produces by transliterating English.

For example, consider the following SELECT statement:

SELECT नाम[1,8] from customer;

Assume it retrieves two rows containing the data values in the customer table as shown in Table 1.


Table 1. Data values from example SELECT statement

Row   नाम (original field value in table)   Partial character handling
1     AAA³BBB³CCC³                          AAA³BBB³s¹s¹
2     abcdefghij                            abcdefgh

Row 1 holds three logical characters (A, B, and C), each stored in 3 bytes. The superscript denotes the number of bytes that a single logical character occupies. Row 2 holds 10 English letters (logical characters) of 1 byte each. Because the query selects only bytes [1,8], Row 1 initially yields AAA³BBB³CC. The logical character C is incomplete, because the result contains only 2 of its 3 bytes. This is a partial multi-byte character. 4GL removes the partial bytes and replaces each one with a single-byte space, denoted by s¹.

Meanwhile, Row 2 returns 8 bytes without losing any characters. English letters are single-byte characters, so they cannot produce partial characters.
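
The behavior described for Table 1 can be imitated outside 4GL. The following Python sketch is an approximation written for illustration, not the actual Informix implementation: it takes a byte-based substring and replaces the bytes of any character that was cut at either end with single-byte spaces. The function name byte_substring and the sample string कखग (three 3-byte characters, standing in for A, B, and C) are assumptions.

```python
def byte_substring(value: str, first: int, last: int) -> str:
    """Emulate a byte-based substring [first,last] (1-indexed, inclusive),
    blanking out any partial multi-byte character at either end."""
    chunk = bytearray(value.encode("utf-8")[first - 1:last])

    # Continuation bytes (10xxxxxx) at the start belong to a character
    # whose lead byte was cut off on the left: replace them with spaces.
    start = 0
    while start < len(chunk) and (chunk[start] & 0xC0) == 0x80:
        chunk[start] = 0x20
        start += 1

    # Locate the lead byte of the last character in the chunk.
    lead = len(chunk) - 1
    while lead >= start and (chunk[lead] & 0xC0) == 0x80:
        lead -= 1

    if lead >= start:
        b = chunk[lead]
        # The lead byte tells how many bytes the character needs.
        needed = 1 if b < 0x80 else 2 if b < 0xE0 else 3 if b < 0xF0 else 4
        if lead + needed > len(chunk):
            # The last character is incomplete: replace it with spaces.
            for i in range(lead, len(chunk)):
                chunk[i] = 0x20

    return chunk.decode("utf-8")


print(repr(byte_substring("कखग", 1, 8)))         # two characters, two spaces
print(repr(byte_substring("abcdefghij", 1, 8)))  # eight English letters
```

With the three-character sample, bytes 1 through 8 cover two complete characters plus two bytes of the third, so the result ends in two spaces, matching Row 1 of Table 1.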


Exploring the 4GL features with UTF-8 character set

This section describes the 4GL features that support the UTF-8 character set: forms, reports, and database objects.

UTF-8 support in forms

A customer form illustrates the UTF features. Listing 1 shows the code for the customer form.


Listing 1. Customer.per code
DATABASE utf8

SCREEN SIZE 24 BY 80
{
     नाम     [f000    ]
}
END

TABLES
     customer = "Informix".customer

ATTRIBUTES
     f000 = customer.नाम;

The following elements of a form can use UTF characters:

  • Titles
  • Form headers
  • Data
  • Message
  • Search criteria

Titles

The customer form enables you to use the Hindi characters (UTF characters) in field titles, as shown in Listing 2.


Listing 2. Field titles in UTF
PERFORM:   Query Next Previous View Add Update Remove Table Screen...
Searches the active database table.             ** 1: customer table**

नाम {Name}                  [                    ]             

Form headers

The customer form enables you to write the form header in UTF characters, as shown in Listing 3.


Listing 3. Form headers in UTF
PERFORM: Query Next Previous View Add Update Remove Table Screen...
Searches the active database table.            ** 1: customer table**
                            ==============================
                            Customer Information (कस्टमर जानकारी) 
                            ==============================
नाम  {Name}                 [                    ]

Data

The customer form enables you to insert UTF characters in a field as data, as shown in Listing 4.


Listing 4. Data in UTF
ADD: ESCAPE adds new data. INTERRUPT discards it. ARROW keys move cursor.
Adds new data to the active database table.        ** 1: customer table**

नाम {Name}                  [दीपक   ]

The form in Listing 4 shows UTF data, which can be stored in and retrieved from the database.
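
When you size columns for UTF data, the number of bytes matters more than the number of visible characters. The following Python sketch assumes that a CHAR(n) column holds n bytes. The helper name fits_in_char_column is hypothetical, and the capacities are example values.

```python
def fits_in_char_column(value: str, capacity_bytes: int) -> bool:
    """Check whether the UTF-8 form of value fits in capacity_bytes bytes.

    Assumption: the column capacity is counted in bytes, not characters.
    """
    return len(value.encode("utf-8")) <= capacity_bytes


# दीपक is 4 code points of 3 bytes each, so it needs 12 bytes.
print(fits_in_char_column("दीपक", 20))
print(fits_in_char_column("दीपक", 8))
```

Under this assumption, the name दीपक fits in a 20-byte column but not in an 8-byte one, even though it shows far fewer than 8 characters on the screen.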

Message

The customer form enables you to use UTF characters in messages. When you select Query or Add from the menu, the cursor moves to the first field of the form, and the message that explains the data field can be in UTF, as shown in Listing 5.


Listing 5. Message in UTF
QUERY: ESCAPE queries. INTERRUPT discards query. ARROW keys move cursor.
Searches the active database table.      ** 1: customer table**

नाम  {Name}                 [                    ]






कस्टमर का नाम(Name of customer)

Search criteria

Informix 4GL also enables applications to use the UTF characters as search criteria for finding content within rows. This is very useful when searching for a name that was entered in a local language.


Listing 6. Search criteria in UTF
QUERY: ESCAPE queries. INTERRUPT discards query. ARROW keys move cursor.
Searches the active database table.      ** 1: customer table**

नाम  {Name}                 [*पक                    ]

Listing 7 shows a resulting row that matched the search criteria given in Listing 6.


Listing 7. Search results
PERFORM: Query Next Previous View Add Update Remove Table Screen...
Searches the active database table.        ** 1: customer table**

नाम {Name}                  [दीपक  ]



1 row(s) found
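
The wildcard search in Listings 6 and 7 can be approximated with a short Python sketch. It uses the standard fnmatch module as a stand-in for the query-by-example matching, which is an assumption made for illustration. The function name query_by_example is hypothetical, and only the * wildcard from Listing 6 is exercised.

```python
from fnmatch import fnmatchcase


def query_by_example(values, pattern):
    """Return the values that match a wildcard pattern such as *पक.

    Trailing spaces are stripped first, because fixed-width character
    fields are typically padded with blanks.
    """
    return [v for v in values if fnmatchcase(v.rstrip(), pattern)]


names = ["शारीनि", "दीपक  "]
print(query_by_example(names, "*पक"))
```

The match works on logical characters rather than bytes, so the pattern *पक finds दीपक regardless of how many bytes each character occupies.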

UTF support in reports

Reports are another way to illustrate the UTF features. The following report features can use UTF characters:

  • UTF characters in reports
  • Screen output
  • Unload format with delimiters
  • Reports using the 4GL language

UTF characters in reports

Listing 8 shows a sample report program. The program fetches rows from the customer table and spools them to the report target, which is standard output in this example.


Listing 8. Code for report1.4gl
DATABASE utf8

MAIN
     DEFINE p_customer RECORD LIKE customer.*

     # Cursor over every row of the customer table
     DECLARE q_curs CURSOR FOR
          SELECT * FROM customer

     START REPORT cust_list

     # Send each fetched row to the report
     FOREACH q_curs INTO p_customer.*
          OUTPUT TO REPORT cust_list (p_customer.*)
     END FOREACH

     FINISH REPORT cust_list
END MAIN

REPORT cust_list (r_customer)
     DEFINE r_customer RECORD LIKE customer.*

     # Default report layout: one entry per row
     FORMAT EVERY ROW
END REPORT

Listing 9 shows the output of the report1.4gl program.


Listing 9. The output of the report1.4gl program
$ c4gl report1.4gl -o report1
$ ./report1

r_customer.नाम


          शारीनि

     दीपक

Use the ISQL utility and choose the output option to use the capabilities listed in the following sections.


Listing 10. Selecting Output
PERFORM: Current Master Detail Output Exit
Outputs selected rows in form or report format.    ** 1: customer table**

नाम  {Name}                 [                    ]

The Output selection enables you to write selected rows to an operating system file using either Screen format or Unload format.

Screen output

To see the output in screen format, enter the output file name and select Append or Create > Current-list or One-page > Screen-format. Listing 11 shows the content of the screen.out file.


Listing 11. Resulting screen.out file
$ cat screen.out

नाम  {Name}                 [दीपक]

$

Unload format with delimiters

To see the output in Unload format, enter the output file name and select Append or Create > Current-list or One-page > Unload-format. Listing 12 shows the content of the unload.out file.


Listing 12. Resulting unload.out file
$ cat unload.out


शारीनि|

दीपक|


$
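
The unload format in Listing 12 ends every field with a pipe delimiter. The following Python sketch writes and reads lines of that shape. It is a simplified illustration: the function names are hypothetical, and the sketch assumes that field values contain no delimiter, backslash, or newline characters, so it does no escaping.

```python
def to_unload_line(fields) -> str:
    """Join field values, terminating each one with the | delimiter."""
    return "|".join(fields) + "|"


def from_unload_line(line: str) -> list:
    """Split an unload line back into field values.

    The delimiter after the last field produces one empty trailing
    element, which is dropped.
    """
    return line.rstrip("\n").split("|")[:-1]


for name in ["शारीनि", "दीपक"]:
    print(to_unload_line([name]))
```

Because the delimiter is a single-byte ASCII character, it can never appear inside the bytes of a multi-byte UTF-8 character, which keeps the format safe to split on.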

Reports using the 4GL language

The example program report2.4gl, shown in Listing 13, formats a report with the 4GL report statements.


Listing 13. Example program report2.4gl
DATABASE utf8

MAIN
     DEFINE p_customer RECORD LIKE customer.*

     DECLARE q_curs CURSOR FOR
          SELECT * FROM customer

     START REPORT qty_rep
     FOREACH q_curs INTO p_customer.*
          OUTPUT TO REPORT qty_rep (p_customer.नाम)
     END FOREACH
     FINISH REPORT qty_rep
END MAIN

REPORT qty_rep (नाम)
     DEFINE नाम LIKE customer.नाम

     FORMAT
          FIRST PAGE HEADER
               # Bilingual title, then the column label and its value
               PRINT COLUMN 5, "Customer Name (कस्टमर नाम)"
               SKIP 2 LINES
               PRINT "नाम",
                     COLUMN 15, नाम
END REPORT


Listing 14. Resulting report
$ c4gl report2.4gl -o report2
$ ./report2
$


               Customer Name (कस्टमर  नाम)
               नाम     दीपक

UTF characters as database objects

You can use UTF characters as table names or column names in a database. Listing 15 shows an example table अमर with the columns नाम and पता.


Listing 15. Example table with UTF database objects
SQL: New Run Modify Use-editor Output Choose Save Info Drop Exit
Run the current SQL statements.
----------- utf8@deepak_1150 ---------- Press CTRL-W for Help--------

Create table अमर
(
नाम char(20),

पता char(20))


Table created.

Listing 16 shows that the field value of column नाम is changed from दीपक to कलपक.


Listing 16. Changing column values
UPDATE: ESCAPE changes data. INTERRUPT discards changes.
Changes this row in the active database table.             ** 1: customer table**

नाम  {Name}                 [कलपक]
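
To experiment with UTF object names without an Informix server, the following Python sketch uses SQLite through Python's built-in sqlite3 module as a stand-in. This is a substitute database, not Informix, and it is shown only to illustrate the idea of quoted identifiers. The table and column names come from Listing 15, the address value बंगलौर is made-up sample data, and the update is applied to the अमर table rather than to the customer table.

```python
import sqlite3


def demo_utf_identifiers():
    """Create a table with UTF names, insert a row, update it, read it back."""
    conn = sqlite3.connect(":memory:")
    # Double quotes mark the names as SQL identifiers, as DELIMIDENT does.
    conn.execute('CREATE TABLE "अमर" ("नाम" CHAR(20), "पता" CHAR(20))')
    conn.execute(
        'INSERT INTO "अमर" ("नाम", "पता") VALUES (?, ?)',
        ("दीपक", "बंगलौर"),
    )
    # Change the name from दीपक to कलपक, as in Listing 16.
    conn.execute(
        'UPDATE "अमर" SET "नाम" = ? WHERE "नाम" = ?',
        ("कलपक", "दीपक"),
    )
    rows = conn.execute('SELECT "नाम", "पता" FROM "अमर"').fetchall()
    conn.close()
    return rows


print(demo_utf_identifiers())
```

Passing the data values as parameters keeps them separate from the identifiers, so only the object names need quoting.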


Conclusion

This article described the Informix 4GL features that support UTF-8 and showed how that support enhances Informix 4GL applications by handling multi-byte languages.


Resources

Learn

Get products and technologies

  • Build your next development project with IBM trial software, available for download directly from developerWorks.
