Tip: Localization within a document format

Tailor your documents to fit a wide range of languages and cultural conventions

Uche Ogbuji (uche@ogbuji.net), Principal Consultant, Fourthought, Inc.
Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management. Fourthought develops 4Suite, an open source platform for XML, RDF, and knowledge-management applications. Mr. Ogbuji is a computer engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can contact Mr. Ogbuji at uche@ogbuji.net.

Summary:  Internationalization support is one of XML's key strengths. Unfortunately, too few XML formats provide mechanisms for localizing content. This tip shows you how to develop localized XML formats.


Date:  01 Sep 2002
Level:  Introductory


One of the key strengths of XML is its support for internationalization. Its core character set, Unicode, covers the repertoires of more regionally popular character sets, such as the ISO-8859 variants in Europe, Shift-JIS in Japan, and Big5 in Taiwan and Hong Kong. This matters: fortunes are spent refitting applications for international deployment after they were originally developed from a parochial point of view. Yet there is more to internationalization than support for international character repertoires. It is also important to be able to represent information in a way that can be tailored to a particular set of language and cultural conventions. This is what's known as localization.

General localization

In the data format itself (which is where XML comes in), some aspects of localization, such as date formats and the order of names, can be addressed with basic XML facilities. One approach is to use international standard forms. Dates are a good example: it is best to use the ISO 8601 standard (see Resources). Listing 1 shows an example:


Listing 1. A regional (US) date and its ISO 8601 equivalent
                

<?xml version="1.0" encoding="utf-8"?>
<products>
  <!-- US-specific date -->
  <product release-date="8/18/2002"/>
  <!-- ISO-8601 date -->
  <product release-date="2002-08-18"/>
</products>

One advantage of ISO 8601 dates is that, unlike most local date formats, they can generally be compared as simple strings in most programming languages. For example, the string "8/19/2001" sorts after "8/18/2002" in most programming systems, even though it represents an earlier date. The equivalent ISO 8601 strings, "2001-08-19" and "2002-08-18", compare in the same order as the dates themselves, because their fields run from most to least significant and have fixed widths. Localized software can then start with the ISO 8601 date and display it for human consumption in the appropriate local form. Most programming languages, as well as XSLT through the popular EXSLT extension library, readily support this conversion.
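To make the comparison and the conversion concrete, here is a minimal Python sketch (Python 3.7 or later, for `date.fromisoformat`) rather than an EXSLT one. The `DISPLAY_PATTERNS` table and the `is_earlier` and `localize_date` helpers are hypothetical names for illustration; a real application would take its display patterns from a locale database such as the Unicode CLDR instead of a hand-written table.

```python
from datetime import date

# Hypothetical display patterns keyed by language tag (illustration only;
# real software would consult a locale database such as the Unicode CLDR).
DISPLAY_PATTERNS = {
    "en-US": "{month}/{day}/{year}",
    "en-GB": "{day:02d}/{month:02d}/{year}",
    "de-DE": "{day:02d}.{month:02d}.{year}",
}

def is_earlier(iso_a: str, iso_b: str) -> bool:
    """Compare two ISO 8601 calendar dates (YYYY-MM-DD) as plain strings."""
    # Fixed-width fields, most significant first, so string order is date order.
    return iso_a < iso_b

def localize_date(iso_value: str, lang: str) -> str:
    """Render a stored ISO 8601 date for display under a language tag."""
    d = date.fromisoformat(iso_value)
    return DISPLAY_PATTERNS[lang].format(year=d.year, month=d.month, day=d.day)

# Lexical comparison of US-style strings gets the order wrong...
print("8/19/2001" > "8/18/2002")                  # True, though 2001 is earlier
# ...while ISO 8601 strings compare in chronological order.
print(is_earlier("2001-08-19", "2002-08-18"))     # True
print(localize_date("2002-08-18", "en-US"))       # 8/18/2002
print(localize_date("2002-08-18", "de-DE"))       # 18.08.2002
```

Plain string comparison only behaves this way when every value uses the same ISO 8601 form (here, a four-digit year with zero-padded month and day and no time or time zone part), so it pays to normalize dates on the way into the document.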

Another localization approach is to structure data finely, so that it can be reconstructed as appropriate locally. Names are a good example of this: In some cultures (such as Chinese) the family name precedes the given name in common usage. Listing 2 shows an example of data structured to better support such local conventions.


Listing 2. Example of structured name format for localization
                

<?xml version="1.0" encoding="utf-8"?>
<signatories>
  <!-- The direct approach. -->
  <name>Mr. Uche Ogbuji</name>
  <!-- Structure to support local conventions -->
  <name>
    <honorific>Mr.</honorific>
    <given>Uche</given>
    <family>Ogbuji</family>
  </name>
</signatories>

With the direct approach, a reader might try to infer the parts of the name from convention, but this is often risky. What if part of the name, such as the honorific, is omitted? Can you then tell which name is which? With the second approach, you can reformat names for human consumption according to local conventions. In fact, if each entry carries some indication of its likely preference (such as nationality), the name order could be tailored name by name. The second approach clearly adds some complexity and overhead, but there is always a trade-off between practicality and flexibility when choosing how much markup structure to use in supporting multiple conventions.
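The following Python sketch, using the standard `xml.etree.ElementTree` module, shows how the structured form might be rendered under two naming conventions. The sample data is modeled on Listing 2, and the `CONVENTIONS` table and `format_name` function are hypothetical. Dropping the honorific in family-first order is a deliberate simplification, since where and how honorifics appear also varies by culture.

```python
import xml.etree.ElementTree as ET

# Sample data modeled on Listing 2; the second entry omits the honorific.
SIGNATORIES = """<signatories>
  <name>
    <honorific>Mr.</honorific>
    <given>Uche</given>
    <family>Ogbuji</family>
  </name>
  <name>
    <given>Uche</given>
    <family>Ogbuji</family>
  </name>
</signatories>"""

# Hypothetical conventions: each lists the name parts to emit, in order.
CONVENTIONS = {
    "given-first": ("honorific", "given", "family"),
    "family-first": ("family", "given"),
}

def format_name(name_elem, convention):
    """Build a display name from structured parts, skipping absent ones."""
    # findtext() returns None when a child element is missing.
    parts = (name_elem.findtext(field) for field in CONVENTIONS[convention])
    return " ".join(p.strip() for p in parts if p and p.strip())

for name in ET.fromstring(SIGNATORIES).findall("name"):
    print(format_name(name, "given-first"), "|", format_name(name, "family-first"))
```

Because each part is marked up separately, a missing honorific simply drops out instead of throwing off the guess about which word is the family name.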


In-line translations

Another common localization issue is presenting translations of labels, messages, descriptions, and the like. XML 1.0 provides the xml:lang attribute for specifying the language of element content and attribute values, and you can set it on an element-by-element basis. Listing 3 is an example of an XML document with parallel English and Spanish language elements.


Listing 3. An XML document with elements in localized language forms.
                

<?xml version="1.0" encoding="utf-8"?>
<menu>
  <item id="A" xml:lang="en">Orange juice</item>
  <item id="A" xml:lang="es">Jugo de naranja</item>
  <item id="B" xml:lang="en">Toast</item>
  <item id="B" xml:lang="es">Pan tostado</item>
</menu>

The xml:lang attribute can take any value allowed by RFC 1766. This means you can use values that designate a primary language (en for English, es for Spanish, and so forth). You can be more specific by adding the region where the language variant is prevalent (for example, en-US for American English, en-GB for British English, or es-MX for Mexican Spanish). Notice that you do not need to declare a namespace here: the xml prefix is implicitly bound in every document. Also note that the language designation applies to the element's attributes and to all of its content, including descendant elements, unless a descendant overrides it with its own xml:lang. And even though the xml:lang attribute is given special mention in the XML specification, a valid document must still declare it like any other attribute, so you must provide for it in your schema. The DTD snippet in Listing 4 illustrates this:


Listing 4. A DTD with support for xml:lang
                

<!ATTLIST item xml:lang NMTOKEN "en">

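This declaration adds support for the attribute and sets a default value of en for when the attribute is omitted. Notice that I did not add the declaration for the id attribute, which would normally be required. If you do declare it, don't give it type ID: ID values must be unique within a document, and in Listing 3 each id value appears once per language.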
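For readers who want to see how an application might consume these values, here is a minimal Python sketch using `xml.etree.ElementTree`. It computes each element's effective language by inheriting xml:lang from ancestors, then picks a translation by walking a list of preferred languages, treating content tagged es-MX as satisfying a request for es, much as XPath's lang() function does. The sample document (which puts a default xml:lang on menu) and the helper names `effective_langs`, `lang_matches`, and `lookup` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the predefined XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical variant of Listing 3: English is inherited from <menu>.
MENU = """<menu xml:lang="en">
  <item id="A">Orange juice</item>
  <item id="A" xml:lang="es">Jugo de naranja</item>
  <item id="B">Toast</item>
  <item id="B" xml:lang="es">Pan tostado</item>
</menu>"""

def effective_langs(elem, inherited=""):
    """Yield (element, language) pairs, applying xml:lang inheritance."""
    lang = elem.get(XML_LANG, inherited)
    yield elem, lang
    for child in elem:
        yield from effective_langs(child, lang)

def lang_matches(tag, wanted):
    """True if tag equals wanted or is a subtag of it, ignoring case."""
    tag, wanted = tag.lower(), wanted.lower()
    return tag == wanted or tag.startswith(wanted + "-")

def lookup(root, item_id, preferences):
    """Return the text of item item_id in the first preferred language found."""
    candidates = [(el, lang) for el, lang in effective_langs(root)
                  if el.tag == "item" and el.get("id") == item_id]
    for wanted in preferences:
        for el, lang in candidates:
            if lang_matches(lang, wanted):
                return el.text
    return None

root = ET.fromstring(MENU)
print(lookup(root, "A", ["es-MX", "es", "en"]))  # Jugo de naranja
print(lookup(root, "B", ["fr", "en"]))           # Toast
```

Falling back from the most specific tag to a broader one keeps the document itself simple: translators can supply a regional variant only where it actually differs.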


Summary

There is much more to localization than can be presented in this space. For the developer, it is often more a general state of mind than a set of hard-and-fast rules. You have to keep asking yourself, "Could some of my code and data be locked into conventions I take for granted but that actually vary by region?" Learning the conventions people use for information, and building that knowledge into code, is a crucial skill for the developer. XML provides important basic tools for making this possible, if you become accustomed to using them.


Resources
