
Enabling XML documents for globalization

A simple approach to organizing your translatable XML resources

Erich Magee (magee@us.ibm.com), Advisory programmer, IBM, Software Group
Erich Magee is a Java and XML developer working on the Java Suites Development Toolkit in Research Triangle Park, NC. He can be reached at (magee@us.ibm.com).

Summary:  Facilitate the process of globalizing an application by organizing translatable resources in XML documents. You can use this technique to externalize the translatable content of your documents into language-specific subdocuments, as well as to structure XML documents, along with user-specified XML elements and attributes, to naturally process your translated documents. XML and Java sample code examples show how the technologies work together to handle globalization.

Date:  01 Aug 2001
Level:  Introductory

Thanks to the Internet, your applications are accessible to users from the furthest reaches of Earth. What language do they speak in the furthest reaches? It's hard to say. Fortunately, it's not difficult to design your application and structure your XML documents to allow different languages to be supported without causing problems with your document or, worse, your application. This article demonstrates a way to globalize your documents by carrying out three simple tasks:

  • Separating your XML document into one main document with translation tags and language-specific subdocuments
  • Setting up your application to recognize the elements that are translated and the languages that they are translated into
  • Processing your translated documents separately

Globalization issues

The fact that translated resources are an inherent part of the model definition makes globalization difficult, especially with XML documents. For example, your document could be defining anything from a data object to a display panel. This definition likely contains object structure that is constant and therefore not affected by globalization. But it also typically contains definitions of displayable text or other data that must vary by language. The presence of these definitions means that the document must be globalized.


Traditional XML source translation

To illustrate how this mingling of translatable resources with object structure can lead to problems, consider the following example XML document. It defines a fictitious panel containing an element that must be translated.


Listing 1: Traditional XML source document


<traditional>

  <!-- ********************************************************** -->
  <!-- Specify data using in line character coding.               -->
  <!-- ********************************************************** -->
  <panel>
    <widgetLabel>Label Text</widgetLabel>
    <widget type="text" width="20"/>
  </panel>

</traditional>

In the standard approach, globalization means that the document is written for a specific language, usually English, and then handed to a translation team. The translators reproduce the document in other languages by copying the original and replacing each translatable element with the appropriate translation. This process poses some problems, though. For example, how does the translator know which elements to translate? One solution is to comment the source heavily, marking the specific lines that are translatable. But this approach is certainly not foolproof and often leads to errors. Moreover, even after a painstaking, perfect translation, the document has been duplicated and now presents a maintenance problem.

To demonstrate the process I've just described, I have copied the above file to a separate directory and translated it:


Listing 2: Traditional translated XML source document


<traditional>

  <!-- ********************************************************** -->
  <!-- Specify data using in line character coding.               -->
  <!-- ********************************************************** -->
  <panel>
    <widgetLabel>Texte etiquitte</widgetLabel>
    <widget type="text" width="20"/>
  </panel>

</traditional>

Remember, the translators have to determine exactly which elements require translation and change only those elements, without disturbing the surrounding structure. Even trickier, once the translated files are created, any change to the panel layout must be applied to the entire set of files. For example, suppose you decide to change the widget width from 20 characters to 30. Even a change this simple ripples through each of the translated files.


Separating your XML source document

Now consider the following alternative structure in which translatable resources are removed. Here is the main XML document without the translated resources:


Listing 3: Main XML source document


<separated>

  <!-- ********************************************************** -->
  <!-- Specify the translation languages that are supported.      -->
  <!-- ********************************************************** -->
  <translationLanguages>
    <language default="true">english</language>
    <language>french</language>
  </translationLanguages>

  <!-- ********************************************************** -->
  <!-- Specify data using translation keys.                       -->
  <!-- ********************************************************** -->
  <panel>
    <widgetLabel translationKey="myWidgetLabel"/>
    <widget type="text" width="20"/>
  </panel>

</separated>

Here are the separate XML documents containing the specific translated information:


Listing 4: Translated XML subdocuments


<separated_english>

  <!-- ********************************************************** -->
  <!-- Specify the translated values for each element. Note that  -->
  <!-- element tags match translation keys in the main document.  -->
  <!-- ********************************************************** -->
  <myWidgetLabel>Label text</myWidgetLabel>

</separated_english>

<separated_french>

  <!-- ********************************************************** -->
  <!-- Specify the translated values for each element. Note that  -->
  <!-- element tags match translation keys in the main document.  -->
  <!-- ********************************************************** -->
  <myWidgetLabel>Texte etiquitte</myWidgetLabel>

</separated_french>

Reorganizing the document clearly separates the translatable resources from the object structure, because the resources have been moved into separate files, or subdocuments. From a practical standpoint, the subdocument is handled exactly as described above: it is written in English and then turned over to a translation team that produces translated replicas. The important difference is that there is no longer any confusion over which elements should be translated. All elements in a subdocument are translated. This technique not only creates a clean document, it will also thrill your translation team!

The other obvious change in the main source document is the addition of some translation-related elements and attributes. Specifically, some elements declare a set of supported translation languages, while other elements include an attribute declaring a translation key. If we bring these changes together, we get the following organization:

  • The main document declares a set of supported translation languages.
  • For each declared language, there is a separate document containing translated elements.
  • Each translated element in the main document contains the special translationKey attribute.
  • The value of the translationKey attribute is a link into the translated subdocument for the specific element.

These special translation tags not only lend a logical organization to the XML structure, they also allow an application built around an XML parser to understand how to process the globalized data. To see how this works, let's examine some Java code fragments.


Setting up your application to handle separate globalized documents

The following examples assume that you have access to a SAX parser and are familiar with the basic mechanisms of a SAX application. If you aren't, presume that the application has created an instance of the SAX parser and an instance of a class that implements the org.xml.sax.ContentHandler interface, and that it has registered that handler on the parser. Once the handler is registered, the parser notifies it of every parse event through callbacks as it processes the document. You can use these callbacks to act on the translation elements and attributes introduced in the globalized documents.

The design approach in this example is to use the event-processing methods of the content handler to recognize certain well-known translation tags within the XML document and initiate special processing for those elements, as shown in Listing 5.

This fragment demonstrates how to hook your application's translation processing into the basic parser function. The first issue is to handle a translation tag, in the form of a well-known attribute, that indicates that an element has been translated. Every element in your XML document causes the parser to signal a start-element callback to your content handler. The startElement method in the content handler receives control and is passed the set of attributes specified on the element. At this point, your application can examine the attributes, checking for a translation tag. In the example, we chose the attribute name translationKey as the tag. Of course, you can tailor this to any name you choose. If the translation attribute is found, the element is registered in a table as a key-value pair. This table keeps track of all the elements declared as translatable and is used after the main document is fully parsed.
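The recognition step described above can be sketched as follows. This is a minimal illustration, not the article's original listing: the class name TranslationKeyHandler and the helper collectKeys are hypothetical, and the attribute name translationKey is taken from Listing 3.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records every element that carries a translation key.
class TranslationKeyHandler extends DefaultHandler {

    // Attribute name used in Listing 3; change it if your documents use another tag.
    static final String KEY_ATTRIBUTE = "translationKey";

    // Maps translation key -> name of the element that declared it.
    final Map<String, String> translatedKeys = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        String key = attributes.getValue(KEY_ATTRIBUTE);
        if (key != null) {
            translatedKeys.put(key, qName);
        }
    }

    // Convenience helper (an assumption of this sketch): parse XML text and
    // return the table of translation keys found in it.
    static Map<String, String> collectKeys(String xml) throws Exception {
        TranslationKeyHandler handler = new TranslationKeyHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.translatedKeys;
    }
}
```

Run against the main document in Listing 3, the table would hold a single entry that maps myWidgetLabel to widgetLabel.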

The other issue your content handler should address is the set of languages for which you have translations. In the example XML, each language element inside translationLanguages declares a supported language. In the code fragment, all elements are passed through the processEvent method, which uses a mapping of element names to method names to launch the specific method that processes each element. The method that processes a language declaration simply registers the language for later use. Use the fragment in Listing 6 as an example:
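One way to sketch this dispatch is shown below. It is an illustration under stated assumptions rather than the original fragment: instead of looking up method names, it maps element names to method references, and the names LanguageHandler, processLanguage, and collectLanguages are hypothetical. The element name language comes from Listing 3.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: dispatches well-known elements to processing methods.
class LanguageHandler extends DefaultHandler {

    // Declared languages, in document order.
    final List<String> languages = new ArrayList<>();

    // Element name -> method that processes the element's text content.
    private final Map<String, Consumer<String>> processors = new HashMap<>();

    // Text collected for the element currently being parsed.
    private final StringBuilder text = new StringBuilder();

    LanguageHandler() {
        processors.put("language", this::processLanguage);
    }

    private void processLanguage(String value) {
        if (!value.isEmpty()) {
            languages.add(value);
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        text.setLength(0); // start collecting text for the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        Consumer<String> processor = processors.get(qName);
        if (processor != null) {
            processor.accept(text.toString().trim());
        }
    }

    // Convenience helper (an assumption of this sketch).
    static List<String> collectLanguages(String xml) throws Exception {
        LanguageHandler handler = new LanguageHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.languages;
    }
}
```

Supporting another well-known element then only requires one more entry in the processors map.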

This is a good time to pause and reflect. What have we accomplished? So far, I have demonstrated two of the three tasks at hand:

  • Separating your XML document into one main document with translation tags and language-specific subdocuments.
  • Setting up your application to recognize the elements that are translated and the languages that they are translated into.

What remains is to take the information stored during the parse phase of the main document and use that to drive a separate parse phase to process each language-specific subdocument.

Processing language-specific subdocuments

The final task is to use the stored information in your application to find and process all applicable subdocuments. What your application actually does with the translated data is, of course, specific to its own purpose. In this example, the application retrieves the translated values from the subdocuments and writes them as Java properties files. But the focus of the example is the use of stored information to access subdocuments using our good friend the SAX parser.

The setup here is very similar to the parsing of the main document. An instance of the parser is obtained, and an instance of ContentHandler is registered on it. The difference is that the process is wrapped by a loop that iterates across the saved vector of declared languages. The language is appended to the main XML document name to obtain the translated subdocument name. This file name is given to the parser for processing. See Listing 7 for the last sample code fragment.
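The naming step in that loop might look like the sketch below. The exact file-naming convention is an assumption: here the language is appended to the base name with an underscore, before the extension, so that separated.xml yields separated_french.xml, mirroring the root element names in Listing 4. The class SubdocumentLocator and its methods are hypothetical.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: derives subdocument files from the main document name.
class SubdocumentLocator {

    // Assumed convention: "<base>_<language><extension>".
    static String subdocumentName(String mainName, String language) {
        int dot = mainName.lastIndexOf('.');
        if (dot < 0) {
            return mainName + "_" + language; // no extension to preserve
        }
        return mainName.substring(0, dot) + "_" + language + mainName.substring(dot);
    }

    // Returns language -> subdocument file, in the order the languages were declared.
    static Map<String, File> locate(File mainDocument, List<String> languages) {
        Map<String, File> subdocuments = new LinkedHashMap<>();
        for (String language : languages) {
            String name = subdocumentName(mainDocument.getName(), language);
            // A null parent is allowed; the file is then relative to the working directory.
            subdocuments.put(language, new File(mainDocument.getParentFile(), name));
        }
        return subdocuments;
    }
}
```

Each file in the returned map would then be handed to the parser in turn, with a fresh content handler registered for that language.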

This implementation of ContentHandler simply catches translated data from the subdocument and stores it in a key-based table. In the startElement method, an element declared as translated is recognized by a quick look in the translated keys table built during the main parse process. If the element is contained in the keys table, a boolean is set to catch the element data in the characters method. Once the characters event triggers the characters method, the actual translated data is caught and added to the translated strings table. The table can be used in whatever way your application would like, and as such, the implementation of writePropertiesFileFromTable isn't really of interest. However, writing the data to a set of properties files to create Java resource bundles is a natural way to globalize your objects in a Java application.
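A minimal sketch of such a handler follows. It is not the original listing: the class SubdocumentHandler and the helpers readTranslations and toPropertiesText are hypothetical, and toPropertiesText merely stands in for writePropertiesFileFromTable by producing the properties text in memory. Because a parser may deliver an element's text in several characters calls, the sketch accumulates text in a buffer and stores it when the element ends.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: catches translated text for known translation keys.
class SubdocumentHandler extends DefaultHandler {

    private final Set<String> translatedKeys;   // keys found in the main document
    final Map<String, String> translatedStrings = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();
    private boolean capturing;

    SubdocumentHandler(Set<String> translatedKeys) {
        this.translatedKeys = translatedKeys;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        // In a subdocument, the element tag is the translation key.
        capturing = translatedKeys.contains(qName);
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (capturing) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (capturing && translatedKeys.contains(qName)) {
            translatedStrings.put(qName, text.toString().trim());
        }
        capturing = false;
    }

    // Convenience helper (an assumption of this sketch).
    static Map<String, String> readTranslations(String xml, Set<String> keys)
            throws Exception {
        SubdocumentHandler handler = new SubdocumentHandler(keys);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.translatedStrings;
    }

    // Renders the table in java.util.Properties format.
    static String toPropertiesText(Map<String, String> translations) throws IOException {
        Properties properties = new Properties();
        properties.putAll(translations);
        StringWriter out = new StringWriter();
        properties.store(out, null);
        return out.toString();
    }
}
```

Writing the returned text to a file per language would give a set of properties files suitable for use as resource bundles.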


Summary

The art of globalization is a subtle one. No single approach will satisfy every developer or application. The approach demonstrated here is a mixture of programming techniques and common sense (yes, they can be combined) that, not coincidentally, structures documents similarly to Java resource bundles. Whether or not your application is written in Java is not important. Separating object structure from user data is the benefit that, I hope you'll agree, can greatly improve both your XML library and your applications.

