Overview of bidirectional script support in IBM WebSphere Process Server

Arabic, Hebrew, Urdu, and Farsi (Persian) are written from right to left, while numbers and segments of Latin (or Cyrillic or Greek) text are embedded in this text from left to right. The dual directionality of such bidirectional text poses challenges for the way it is processed and presented in WebSphere Process Server. This article provides an overview of the level of support for languages with bidirectional scripts in WebSphere Process Server. It outlines the design considerations and limitations with respect to bidirectional text handling. In addition, it describes tools that can help to correctly handle bidirectional text in WebSphere Process Server.

Tomer Mahlin (tomerm@il.ibm.com), Technical team lead, IBM

Tomer Mahlin is a technical team lead who for the past six years has been deeply involved in projects concerned with bidirectional enablement in IBM and non-IBM products. Tomer has been working with WebSphere Business Integration products for the last three years and was one of the chief architects who designed bidirectional enablement in WebSphere Business Integration Adapters and in IBM WebSphere Adapters.



28 September 2005

IBM® WebSphere® Process Server supports bidirectional languages such as Hebrew and Arabic through bidirectional enablement. Bidirectional enablement is a mechanism for accurately displaying and processing bidirectional script data inside components that are either bundled with WebSphere Process Server (for example, Web-based tools such as CBE Event Browser or Business Rules Manager) or supported by it (for example, service components). This document provides an overview of bidirectional support in WebSphere Process Server and the requirements and processes used in enabling bidirectional support.

WebSphere Process Server is the central component in the WebSphere Business Integration architecture and is responsible for orchestrating data exchange between the various systems participating in a business process. A business process is the set of business-related activities, rules, and conditions that are invoked in a defined sequence to achieve a business goal. Figure 1 depicts the conceptual picture of a business process:

Figure 1. The WebSphere Process Server environment
Conceptual illustration of the relationship between WebSphere Process Server internal components, edge components, and the external world

As shown in the diagram, there is a clear separation between external systems and the WebSphere Process Server environment. Systems external to WebSphere Process Server are the main "players" in the business process. They actively participate in it by providing and exchanging business data. The WebSphere Process Server environment provides orchestration and management services for the business process; its purpose is to interpret the process templates, manage the life cycle of business processes, navigate through the associated process model, and integrate the appropriate business functions. In this discussion we divide the components that make up the WebSphere Process Server environment into WebSphere Process Server edge components and WebSphere Process Server internal components. Among the interactions they might handle, edge components mediate between external systems and other WebSphere Process Server components. Part of their responsibility is to pass data back and forth between WebSphere Process Server and an external source of data, such as an Enterprise Information System (EIS). Unlike edge components, internal components don't communicate with the external world and don't exchange any data with external systems.

Enabling for bidirectional scripts in WebSphere Process Server is realized on three levels:

  • Through correctly displaying and typing bidirectional script characters using Web-based runtime tools such as CBE Event Browser or Business Rules Manager
  • Through the correct storage of bidirectional characters, using code page translation when necessary to convert bidirectional characters from one format to another (between the Unicode code set and a single-byte code set)
  • Through using bidirectional text transformations to translate between Windows® bidirectional format (used in the WebSphere Process Server environment) and different bidirectional formats used in external applications

Following is more detail on each of these levels and how bidirectional script enablement is realized on them.

Correct handling of bidirectional data in WebSphere Process Server GUI tools

Correct bidirectional data handling is provided automatically in WebSphere Process Server GUI tools:

  • The format used for representation of bidirectional data in the WebSphere Process Server environment is identical to the Windows standard bidirectional format (logical, left-to-right).
  • Basic technologies (for example, Internet Explorer and WebSphere Application Server) used for storage and display of bidirectional data are capable of correctly handling bidirectional data represented in the Windows standard bidirectional format.

Locale support in WebSphere Process Server

The locale is the part of a user’s environment that brings together information about how to handle data specific to the particular country, language, or territory. The locale is typically installed as part of the operating system. The locale provides information for the user environment about cultural conventions and character encoding:

Cultural conventions according to the language and country (or territory) include the following:

  • Data formats:
    • Dates define full and abbreviated names for weekdays and months, as well as the structure of the date (including date separator).
    • Numbers define symbols for the thousands separator and decimal point, as well as where these symbols are placed within the number.
    • Times define indicators for 12-hour time (such as AM and PM indicators) as well as the structure of the time.
    • Monetary values define numeric and currency symbols, as well as where these symbols are placed within the monetary value.
  • Collation order determines how to sort data for the particular character code set and language.
  • String handling includes tasks such as letter case (uppercase and lowercase), comparison, substrings, and concatenations.

Character encoding is the mapping from a character (a letter of the alphabet) to a numeric value in a character code set. For example, the ASCII character code set encodes the letter "A" as 65, while the EBCDIC character set encodes this letter as 193. The character code set contains encoding for all characters in one or more language alphabets. The locale name has the following format:

ll_TT.codeset

where ll is a two-character language code (usually in lowercase), TT is a two-letter country and territory code (usually in uppercase), and codeset is the name of the associated character code set. The codeset portion of the name is optional.
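As an illustration of the naming format above, the following minimal Java sketch splits a locale name into its language, country/territory, and optional codeset parts. The class name LocaleName and its parse method are hypothetical helpers introduced for this example only; they are not part of WebSphere Process Server. The sketch assumes well-formed input and relies only on the standard java.util.Locale.Builder class.

```java
import java.util.Locale;

/** Hypothetical helper: splits a locale name such as "ar_EG.UTF-8" into its parts. */
final class LocaleName {
    final Locale locale;   // language plus optional country/territory
    final String codeset;  // null when the optional codeset part is absent

    private LocaleName(Locale locale, String codeset) {
        this.locale = locale;
        this.codeset = codeset;
    }

    static LocaleName parse(String name) {
        // Everything after the first '.' is the optional codeset.
        String base = name;
        String codeset = null;
        int dot = name.indexOf('.');
        if (dot >= 0) {
            base = name.substring(0, dot);
            codeset = name.substring(dot + 1);
        }
        // The part before the first '_' is the language, the rest is the country/territory.
        String language = base;
        String country = "";
        int underscore = base.indexOf('_');
        if (underscore >= 0) {
            language = base.substring(0, underscore);
            country = base.substring(underscore + 1);
        }
        // An empty region is accepted by the builder and simply leaves the region unset.
        Locale locale = new Locale.Builder()
                .setLanguage(language)
                .setRegion(country)
                .build();
        return new LocaleName(locale, codeset);
    }

    public static void main(String[] args) {
        LocaleName parsed = parse("ar_EG.UTF-8");
        System.out.println(parsed.locale.toLanguageTag() + " / " + parsed.codeset);
    }
}
```

A name without the codeset part, such as ar_EG, yields a null codeset in this sketch, which mirrors the optional nature of that portion of the name.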

Establishing a locale

WebSphere Process Server operates in a multilingual environment. Consequently, setting a default user locale is no longer a mandatory prerequisite for processing bidirectional data. In other words, the display of bidirectional script characters in the various tools that have a GUI (for example, Web-based tools such as CBE Event Browser or Business Rules Manager) and in the WebSphere Application Server administrative console does not depend on setting the default user locale to Hebrew or Arabic. However, to process natural language data in general and bidirectional data in particular, WebSphere Process Server needs a carefully defined character code set, which should be set to UTF-8. More information on how this can be accomplished can be found in the section "Processing locale-dependent data" below.

Setting the default locale does, however, affect the way locale-dependent data (such as date formats used in the context of log/trace files) is presented in the system. For example, if a locale-dependent form of dates is used in the log/trace files, setting the default locale to a bidirectional locale is mandatory. This necessity is not unique to languages with bidirectional scripts but is common to all languages.

Processing locale-dependent data

The Java™ runtime environment within the Java Virtual Machine (JVM) represents data in the Unicode character code set. Unicode contains encoding for characters in most known character code sets (both single-byte and multibyte). Most components in WebSphere Process Server are written in Java. Therefore, when data is transferred between most WebSphere Process Server components, there is no need for character conversion.

If data is transferred between an external application and the WebSphere Process Server environment, it can be in a single-byte character code set rather than Unicode, which Java represents internally in two-byte (UTF-16) code units; therefore, steps need to be taken to convert the characters correctly from one format to the other. To address this, components residing on the edge between WebSphere Process Server and the external world (such as IBM WebSphere Adapters or WebSphere Business Integration Adapters) have been internationalized so that they can support both double-byte and single-byte character sets to deliver message text in the specified language. When such a component transfers data to WebSphere Process Server from a location that uses a single-byte character code set, it transforms the data from that single-byte code set to Unicode, because WebSphere Process Server uses Unicode.
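The conversion described above can be sketched with the standard java.nio.charset classes. This is only an illustration of the idea, not the code used by the adapters; the class name CodeSetConversion and its methods are hypothetical. The sketch assumes the external application uses the windows-1255 (Hebrew) single-byte code set, which not every Java runtime is guaranteed to provide, so the example checks for it first.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper illustrating single-byte code set to Unicode conversion. */
final class CodeSetConversion {

    /** Decodes bytes received from an external system into a Java (Unicode) string. */
    static String toUnicode(byte[] data, String sourceCodeSet) {
        return new String(data, Charset.forName(sourceCodeSet));
    }

    /** Encodes a Java string as UTF-8, for example before writing it to a log file. */
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String codeSet = "windows-1255"; // assumed external code set
        if (Charset.isSupported(codeSet)) {
            // 0xE0 and 0xE1 are the letters alef and bet in windows-1255.
            byte[] external = { (byte) 0xE0, (byte) 0xE1 };
            String text = toUnicode(external, codeSet);
            System.out.println(text.length() + " characters, "
                    + toUtf8(text).length + " UTF-8 bytes");
        } else {
            System.out.println(codeSet + " is not available in this Java runtime");
        }
    }
}
```

Note that this step changes only the character encoding. It does not change the order of the characters, so the bidirectional format of the data (visual or logical) still has to be handled separately.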

To enable the WebSphere Business Integration system to process multilingual data (including bidirectional data) the encoding should be set to UTF-8. The purpose of this configuration is twofold. First, it is required for the correct display of multilingual data in the WebSphere Application Server administrative console at configuration time. Second, it allows both correct processing of multilingual data at runtime and their correct storage in the log/trace files. The configuration can be done as follows:

  • For runtime data processing and display in the WebSphere Application Server administrative console, configure WebSphere Process Server by providing the generic JVM argument -Dclient.encoding.override=UTF-8 as described at http://publib.boulder.ibm.com/infocenter/wasinfo/v5r1/topic/com.ibm.websphere.nd.doc/info/ae/ae/trun_svr_utf.html.
  • For storage of data in the log/trace files, configure WebSphere Process Server by providing another generic JVM argument, -Dfile.encoding=UTF-8.
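Because generic JVM arguments of the form -Dname=value become Java system properties, the two settings above can be inspected from code running in the same JVM. The following is a minimal diagnostic sketch under that assumption; the class name EncodingCheck and the method isUtf8 are hypothetical and are not part of any WebSphere API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical diagnostic: reports whether the encoding properties resolve to UTF-8. */
final class EncodingCheck {

    /** Returns true if the given encoding name is UTF-8 or one of its aliases. */
    static boolean isUtf8(String encodingName) {
        if (encodingName == null || encodingName.isEmpty()) {
            return false; // property not set
        }
        try {
            return Charset.forName(encodingName).equals(StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            return false; // illegal or unsupported charset name
        }
    }

    public static void main(String[] args) {
        String[] properties = { "file.encoding", "client.encoding.override" };
        for (String property : properties) {
            String value = System.getProperty(property);
            System.out.println(property + " = " + value
                    + (isUtf8(value) ? "" : "  (not UTF-8)"));
        }
    }
}
```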

Enabling WebSphere Process Server service components for bidirectional scripts

Service components are the building blocks of the business solution. They can be wired together to form service modules that can be deployed to WebSphere Process Server. Service components are the high-level components of the WebSphere Process Server integration platform and can be wired together to create powerful solutions based on the Service Oriented Architecture core technology.

Service components communicating directly with the world outside of WebSphere Process Server (for example via the invocation of Web services or via retrieving data from an EIS or external data storage) are residing on the edge between WebSphere Process Server and the outside world and are therefore considered WebSphere Process Server edge components, as shown in Figure 1. There are other service components that don't interact with the world outside of WebSphere Process Server (that is, they interact with other service components only). Those service components don't reside on the WebSphere Process Server edge and consequently are not considered WebSphere Process Server edge components.

While WebSphere Process Server can accept and send data to and from an external EIS or media in any bidirectional format, the data in the WebSphere Process Server domain is set to one uniform bidirectional format, which is the Windows standard bidirectional format (logical, left-to-right). Consequently, WebSphere Process Server edge service components are responsible for enforcing bidirectional format on bidirectional data passed through them. Data introduced into WebSphere Process Server must be represented in Windows bidirectional format (logical, left-to-right). Data sent from WebSphere Process Server to an EIS must be represented in bidirectional format used by this specific EIS. Enforcement of bidirectional data format is required only in the WebSphere Process Server edge components. This is because only systems external to WebSphere Process Server might use a standard bidirectional format for data different from Windows. In the WebSphere Process Server environment only one Windows standard bidirectional format is used.

WebSphere product support for bidirectional scripts includes service components residing on the edge between WebSphere Process Server and the external world. Such service components receive data coming from either the WebSphere Process Server environment (adapters, other service components) or from external sources such as Web services. Enforcement of bidirectional data format in those service components is required only for communication links going to or from WebSphere Process Server to or from the outside world. For data flowing through such communication links, the bidirectional data format is enforced either implicitly (if the component is enabled for automatic enforcement of bidirectional format) or explicitly, by bidirectional support or API.

Currently the only components supporting implicit enforcement of bidirectional format are export/import components working with adapters. For more information on bidirectional script enablement in the JCA 1.5-compliant IBM WebSphere Adapters, see the "Resources" section at the end of this article. (Information on bidirectional script enablement in the distributed, JMS-transport-based WebSphere Business Integration Adapters can be found in Technical Introduction to IBM WebSphere InterChange Server, version 4.3.0, available at www.ibm.com/websphere/integration/wicserver/infocenter.)

If data comes from a component that does not enforce bidirectional support, such as a Web service or a connector that is not enabled for processing bidirectional data, format inconsistencies can occur and cause the business logic in WebSphere Process Server to fail or to produce incorrect results. You can avoid these types of errors by:

  • Accepting input from sources that are enabled for bidirectional languages or in the same bidirectional format as WebSphere Business Integration products.
  • Invoking bidirectional support in adapters that are enabled for bidirectional scripts.
  • Using sample bidi APIs (based on the IBM JDK) to enforce bidirectional format of data used or introduced into the WebSphere Process Server domain from an external data source (see the "Sample bidi APIs" section below).

The next section provides more information on how to explicitly handle bidirectional script data in WebSphere Process Server.


Handling bidirectional text

This section describes the tools a customer can use for handling bidirectional text and the specifics of bidirectional text handling, which should be considered for receiving correct results. For a general overview of bidirectional script support, including layout transformations and attributes, see the article, "Bidirectional script support: A primer" (http://www.ibm.com/developerworks/websphere/library/techarticles/bidi/bidigen.html).

Sample bidi APIs

Bidi APIs are the tools that can be used for handling the transformation of bidirectional data from one bidirectional format into another. WebSphere Process Server does not provide automatic enforcement of bidirectional format for data on its edges. It also does not include the bidi APIs used to enforce a bidirectional format in different edge components. However, using the sample code below, a user can create bidi APIs, which can be used in service components to enforce the bidirectional format of data imported from an external source. In addition, they can be used for transforming business data sent from WebSphere Process Server to an external EIS into the bidirectional format used by this specific EIS.

Sample bidi APIs allow you to apply bidi transformation on objects of two types: string objects and data objects. They are based on the bidi engine implemented and bundled as part of the IBM JDK. Those two bidi APIs are described in the article, "Bidirectional language support" (http://publib.boulder.ibm.com/infocenter/dmndhelp/v6rxmx/index.jsp?topic=/com.ibm.wsps.ovw.doc/doc/rovw_global_bidiformatsupport.html).

Note that those sample bidi APIs will not handle the general case and should be viewed as samples only. They should be enhanced in order to address the general case. In the current release the user is responsible for enhancing the sample bidi APIs and tailoring them to the user's needs.
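To give a feel for what a bidirectional layout transformation involves, here is a minimal sketch of a logical-to-visual conversion with a left-to-right base direction. It is not the sample bidi API referenced above and it does not use the bidi engine of the IBM JDK; it relies only on the standard java.text.Bidi class, and the names BidiSketch and logicalToVisualLtr are hypothetical. The sketch is deliberately simplified: it ignores symmetric swapping of characters such as parentheses, Arabic shaping, and combining marks.

```java
import java.text.Bidi;

/** Hypothetical, simplified logical-to-visual transformation (left-to-right base). */
final class BidiSketch {

    static String logicalToVisualLtr(String logical) {
        if (logical.isEmpty()) {
            return logical;
        }
        // Analyze the text with the Unicode bidirectional algorithm.
        Bidi bidi = new Bidi(logical, Bidi.DIRECTION_LEFT_TO_RIGHT);
        int runCount = bidi.getRunCount();
        byte[] levels = new byte[runCount];
        String[] runs = new String[runCount];
        for (int i = 0; i < runCount; i++) {
            levels[i] = (byte) bidi.getRunLevel(i);
            String run = logical.substring(bidi.getRunStart(i), bidi.getRunLimit(i));
            // Runs with an odd level are right-to-left: reverse their characters.
            boolean rightToLeft = (levels[i] & 1) != 0;
            runs[i] = rightToLeft ? new StringBuilder(run).reverse().toString() : run;
        }
        // Put the runs themselves into visual (display) order.
        Bidi.reorderVisually(levels, 0, runs, 0, runCount);
        return String.join("", runs);
    }

    public static void main(String[] args) {
        // Logical order: Latin text, three Hebrew letters (alef, bet, gimel), Latin text.
        String logical = "abc \u05D0\u05D1\u05D2 def";
        System.out.println(logicalToVisualLtr(logical));
    }
}
```

In this sketch the Latin segments keep their order while the embedded Hebrew segment is reversed, which is the kind of difference between logical and visual storage that an edge component has to account for. A production transformation would also need to cover the reverse direction (visual to logical) and the layout attributes of the external system.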

Cases subject to special treatment

There are certain situations where special attention needs to be taken when handling bidirectional text. One situation is migrating repository data from previous versions of WebSphere Business Integration that are not enabled for bidirectional languages. If special actions are not taken, then data with two different bidirectional formats can reside in the same business solution and can interfere with proper processing and functioning of business logic. Another situation involves transforming bidirectional text with exceptional patterns, such as an FTP URL and e-mail addresses.

Migrating data. Precautions should be taken when migrating data from previous versions of WebSphere Business Integration products. During the migration process the bidirectional data stored by an earlier version can persist and be used along with new bidirectional data introduced through the components that are enabled for bidirectional languages. If this situation occurs, the Windows format for the bidirectional data being manipulated on WebSphere Process Server is not guaranteed. Consequently, bidirectional data processed inside WebSphere Process Server might be irreversibly corrupted. Before migrating to the current WebSphere product version, you should convert all bidirectional data in the repository into the Windows bidirectional format by using the sample bidi APIs (based on the IBM JDK) described in the section above.

Special bidirectional strings. FTP URLs and e-mail addresses are cases where explicit application of a bidirectional transformation to the whole string can result in the data being inaccurately interpreted. To ensure accurate interpretation, such strings are analyzed before the transformation begins, and problematic subcomponents within the string values are identified. In cases where problematic subcomponents are identified, the string is split and the bidirectional transformation is applied to each of the subcomponents. After the transformation process has completed, the subcomponents are reassembled into a single string that represents the accurate transformed value. This value is then stored for later use. For WebSphere Process Server components, the user has to provide custom code to handle special strings.
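The split, transform, and reassemble approach described above can be sketched as follows. This is only an illustration of the pattern: the class name SpecialStrings, the method transformSegments, and the choice of delimiter characters are assumptions made for this example, and the transformation itself is passed in as a function so that any bidi transformation can be plugged in.

```java
import java.util.function.UnaryOperator;

/** Hypothetical helper: applies a transformation to each segment between delimiters. */
final class SpecialStrings {

    static String transformSegments(String value, String delimiters,
                                    UnaryOperator<String> transform) {
        StringBuilder result = new StringBuilder(value.length());
        StringBuilder segment = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (delimiters.indexOf(c) >= 0) {
                // Close the current segment, transform it, and keep the delimiter in place.
                appendTransformed(result, segment, transform);
                result.append(c);
            } else {
                segment.append(c);
            }
        }
        appendTransformed(result, segment, transform);
        return result.toString();
    }

    private static void appendTransformed(StringBuilder result, StringBuilder segment,
                                          UnaryOperator<String> transform) {
        if (segment.length() > 0) {
            result.append(transform.apply(segment.toString()));
            segment.setLength(0);
        }
    }

    public static void main(String[] args) {
        // A plain character reversal stands in for a real bidi transformation here.
        UnaryOperator<String> reverse = s -> new StringBuilder(s).reverse().toString();
        System.out.println(transformSegments("user@example.com", "@.", reverse));
    }
}
```

Keeping the delimiters at their original positions is the point of the design: the structural characters of an address or URL stay where the consuming system expects them, and only the text between them is transformed.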


Design limitations

The current design provides a limited solution for bidirectional support. Listed here are some of the limitations:

  • Because not all components in WebSphere Business Integration products are enabled for bidirectional languages, when such components are used alongside ones that are enabled for bidirectional languages, no uniform bidirectional format of data residing on WebSphere Process Server can be guaranteed. Consequently, this situation might result in inconsistent representation of bidirectional data on WebSphere Process Server and incorrect processing based on it. No mechanism for enforcing a uniform bidirectional format of data in WebSphere Process Server is provided, so enforcing this format is the responsibility of the user. For enforcing a bidirectional format on data received from components that are not enabled for bidirectional languages, the user is provided with sample bidi APIs.
  • The fact that the WebSphere Business Integration products’ internal bidirectional format is identical to the Windows bidirectional format is closely related to the fact that IBM WebSphere Integration Developer is supported only on Windows and Linux® platforms.
  • The selection of the Windows bidirectional format as the internal format for WebSphere Process Server and JCA Adapters limits communication with applications that use different bidirectional formats. For example, two applications that use a visual bidirectional format and communicate via WebSphere Process Server, which by design has an internal logical bidirectional format, have to convert bidirectional data from visual to logical and then back to visual bidirectional format. Because bidirectional transformations are not transitive, this might lead to ambiguity in the interpretation of the bidirectional data and consequently result in incorrect execution of a business process.

Summary

This document introduced bidirectional text and how it is enabled in WebSphere Process Server. The key points are:

  • WebSphere Process Server is enabled for bidirectional languages, if appropriate system configuration is performed.
  • The internal bidirectional format in WebSphere Business Integration products is Windows bidirectional format (logical, left-to-right).
  • WebSphere Process Server components residing on the edge between WebSphere Process Server and the external world must enforce bidirectional format on data passed through them in order to avert incorrect execution of a business solution in WebSphere Process Server.

Copyright(c) IBM Corporation, 2005

Trademarks

IBM, Aptiva, DB2, and WebSphere are trademarks of International Business Machines Corporation in the United States, other countries, or both.

Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or both.

Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product or service names may be trademarks or service marks of others.

Resources

Learn

Get products and technologies
