Java internationalization basics

Joe Sam Shirah (joesam@conceptgo.com), Principal and developer, conceptGO
Joe Sam Shirah is a principal and developer at conceptGO, which provides remote consulting and software development services, as well as products, with specialties in JDBC, I18N, the AS/400, RPG, finance, inventory, and logistics. Joe Sam was presented with the Java Community Award at JavaOne, 1998, and is the author of the JDBC 2.0 Fundamentals short course at the Java Developer Connection. He is the moderator of the developerWorks "Java filter" discussion forum and manager for jGuru's JDBC, I18N, and Java400 FAQs. Joe Sam has a B.B.A. in Economics and a Master's degree in International Management.

Summary:  This tutorial introduces you to the Java programming language's support for multilingual and multicountry environments. The tutorial begins with a general discussion of internationalization principles and concepts, and then moves on to an overview of the specific areas of Java internationalization support. The last few sections provide a more hands-on discussion of the areas basic to any internationalized Java application: Unicode and Java characters; locales and resource bundles; and formatting dates, numbers, and currencies.

Date:  23 Apr 2002
Level:  Introductory

Overview of the Java platform support for I18N

Internationalization and the Java programming language

Unlike programmers in most other languages, Java programmers are the beneficiaries of a significant amount of standard code built into the JDK for I18N support. A large portion of the code originally came from IBM's Taligent subsidiary (since merged into IBM) and represents many person-years of work, far more than would be feasible for most companies to independently provide in their products.

The code and vision have not always been perfect; take a look at the many deprecated methods in the java.util.Date class, for example. And many of us can remember when Pacific Standard Time was also apparently Java World Time. However, even in the "bad old days," few, if any, other languages had (or have) anything to compare to this built-in capability. This section briefly discusses the general I18N areas supported by the Java platform.

Unicode support

The Java language character set is Unicode, and the primitive char datatype is, accordingly, two bytes (16 bits) in length to accommodate Unicode values. Because the familiar String is composed of chars, a String is also Unicode based. Unicode itself is defined so that the values 0 through 127 match standard ASCII and 0 through 255 match the ISO 8859-1 (Latin-1) standard. Due to this conformity in the beginning values, programmers who don't use I18N facilities or face I18N issues can write Java programs without understanding or knowing about Unicode. However, given the ubiquity of Windows, programmers for that platform should be aware that there are differences between standard ISO 8859-1 and Windows Latin-1 (cp1252).

The 16-bit char length allows values between 0 and 65535. Unicode escapes are provided to allow input when the actual character is not supported by the native platform. These are in the form of "\u" followed by four hexadecimal digits from 0000 to FFFF. The following two lines of code, for example, are equivalent:

char c1 = 'a';       // the character written literally
char c2 = '\u0061';  // the same character written as a Unicode escape

The 1.3 version of the JDK/JRE supports Unicode 2.1; the 1.4 version supports Unicode 3.0. For more information about Unicode and a Unicode display program called UniBook, see the link to the Unicode Consortium in Resources.


Character-set conversions and stream input/output

The previous section mentions that the Java character set is Unicode, but not all platforms support Unicode. So how is this magic accomplished? The answer is that all input and output streams that support characters -- that is, the java.io.Reader and java.io.Writer hierarchies -- automatically invoke a hidden layer of code that converts from the platform's native encoding to Unicode and back. Notice that the native encoding is assumed. If the data is not in the default encoding, you will have to convert the data yourself. Fortunately, the java.io.InputStreamReader, java.io.OutputStreamWriter, and java.lang.String classes have methods that allow conversion specification with supported encodings. You can find these under Supported Encodings in the Internationalization section of the JDK documentation (accessible from Resources). Note that JDK 1.4 now provides support for Thai and Hindi encodings.
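As a minimal sketch of the explicit conversion described above, the following example passes an encoding name to OutputStreamWriter and InputStreamReader instead of relying on the platform default. The class name EncodingDemo, its helper methods, and the sample text are hypothetical and exist only for illustration; in-memory byte arrays stand in for real files or sockets.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

// Hypothetical demo class; the name and helpers are illustrative only.
class EncodingDemo {

    // Writes Unicode text through a Writer that converts to the named encoding.
    static byte[] encode(String text, String encoding) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            Writer writer = new OutputStreamWriter(bytes, encoding);
            writer.write(text);
            writer.close(); // flushes the converted bytes
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads bytes through a Reader that converts from the named encoding to Unicode.
    static String decode(byte[] data, String encoding) {
        try {
            Reader reader = new InputStreamReader(new ByteArrayInputStream(data), encoding);
            StringBuilder text = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
            reader.close();
            return text.toString();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // four characters, the last one accented
        byte[] latin1 = encode(text, "ISO-8859-1");
        byte[] utf8 = encode(text, "UTF-8");
        System.out.println(latin1.length + " bytes in ISO-8859-1");
        System.out.println(utf8.length + " bytes in UTF-8");
        // Decoding with the matching encoding restores the original text.
        System.out.println(decode(utf8, "UTF-8").equals(text));
        // Decoding with the wrong encoding yields different characters.
        System.out.println(decode(utf8, "ISO-8859-1").equals(text));
    }
}
```

The same encoding names can be passed to the String constructor and to String.getBytes() when the data is already in memory.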

As a point of interest, the Java guarantee of big-endian format for numerics is not upheld for the char datatype. The default format is platform dependent. On NT 4.0 for example, the system property "sun.io.unicode.encoding" is set to "UnicodeLittle". If, for some reason, you want to specify the format yourself, you have a documented choice of UnicodeBig, UnicodeBigUnmarked, UnicodeLittle, UnicodeLittleUnmarked, UTF8, or UTF-16.
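To make the byte-order point concrete, here is a small sketch that dumps the bytes produced for a single character under several UTF-16 variants. It assumes a JDK new enough to provide java.nio.charset.StandardCharsets (Java 7 or later), and it uses the standard charset names rather than the historical names listed above; the class ByteOrderDemo and its hex() helper are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; assumes Java 7+ for StandardCharsets.
class ByteOrderDemo {

    // Renders bytes as space-separated, two-digit uppercase hex values.
    static String hex(byte[] bytes) {
        final String digits = "0123456789ABCDEF";
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(digits.charAt((b >> 4) & 0xF));
            sb.append(digits.charAt(b & 0xF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "A"; // code point 0x0041
        // Big-endian, no byte-order mark
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_16BE)));
        // Little-endian, no byte-order mark
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_16LE)));
        // Plain UTF-16: Java writes a big-endian byte-order mark first
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_16)));
    }
}
```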


Character classification and the Character class

In addition to defining characters for many languages in a standard manner, Unicode also defines several properties for each character. These properties identify such things as the general category, bidirectionality, uppercase, lowercase, whether the character is a digit or control character, and so on. These properties are defined in the UnicodeData file available at the Unicode Consortium Web site.

The Java Character class provides methods to obtain these properties. While a specific instance is immutable, many of the methods are static, allowing access to a character's properties on the fly.

An example of the usefulness of this class comes from a typical ASCII programming algorithm: many programmers take advantage of the fact that if a character's value is in the range 0x41 through 0x5A, it is a capital letter (A-Z). By adding 0x20, you get the corresponding lowercase letter (a-z). Unfortunately, the algorithm fails when dealing with languages that contain characters beyond the ASCII range. The solution is to use Character.isUpperCase() and Character.toLowerCase(), which work in any circumstance. Another example is Character.isDigit(), which also works for characters that represent digits outside the ASCII '0' through '9' range.
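The difference between the ASCII trick and the Character methods can be sketched as follows. The class CharacterDemo and the asciiToLower() helper are hypothetical; the two sample characters (an accented capital E and an Arabic-Indic digit) were chosen only because they fall outside the ASCII range.

```java
// Hypothetical demo class contrasting the ASCII-only trick with Character methods.
class CharacterDemo {

    // The ASCII-only algorithm: correct for A-Z, leaves everything else unchanged.
    static char asciiToLower(char c) {
        return (c >= 0x41 && c <= 0x5A) ? (char) (c + 0x20) : c;
    }

    public static void main(String[] args) {
        char asciiA = 'A';
        char eAcute = '\u00c9';      // LATIN CAPITAL LETTER E WITH ACUTE
        char arabicThree = '\u0663'; // ARABIC-INDIC DIGIT THREE

        // Both approaches agree inside the ASCII range.
        System.out.println(asciiToLower(asciiA) == Character.toLowerCase(asciiA));

        // The ASCII trick does not recognize the accented capital letter...
        System.out.println(asciiToLower(eAcute) == eAcute);
        // ...but the Character class does, and maps it to its lowercase form.
        System.out.println(Character.isUpperCase(eAcute));
        System.out.println(Integer.toHexString(Character.toLowerCase(eAcute)));

        // Character.isDigit() and Character.digit() handle non-ASCII digits too.
        System.out.println(Character.isDigit(arabicThree));
        System.out.println(Character.digit(arabicThree, 10));
    }
}
```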


Locales

In the Java language, a locale is just an identifier, not a set of localized attributes. An instance of the java.util.Locale class represents a specific geopolitical area and is created with arguments for a language and region or country. Each locale-sensitive class maintains its own set of localized attributes and determines how to respond to a method request that contains a Locale argument.

Given the preceding statements, it should be clear that there are no constraints regarding how a programmer may respond to a method request that contains a Locale argument. However, in Sun's reference Java 2 platform and other conforming implementations, there is a consistent set of supported localizations. See Supported Locales in the Internationalization section of the JDK documentation (accessible from Resources) for more information. You should note that the documentation lists a number of locales as "also provided, but not tested." I have personally seen this "not tested" issue arise with the Finnish (fi_FI) locale in JDK 1.3.1; caveat emptor.
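A short sketch of a Locale used purely as an identifier follows. The class name LocaleDemo and its helper are hypothetical. The two-argument constructor is the form available in the JDK versions this tutorial covers; recent JDKs mark it as deprecated, so treat its use here as an assumption tied to the tutorial's era.

```java
import java.util.Locale;

// Hypothetical demo class; shows that a Locale is only an identifier.
class LocaleDemo {

    // Builds a locale from a language code and a country code.
    static Locale canadianFrench() {
        return new Locale("fr", "CA");
    }

    public static void main(String[] args) {
        Locale frCA = canadianFrench();
        System.out.println(frCA);               // language and country joined by an underscore
        System.out.println(frCA.getLanguage());
        System.out.println(frCA.getCountry());

        // The same identifier is predefined as a constant.
        System.out.println(frCA.equals(Locale.CANADA_FRENCH));

        // Display names are themselves localized: ask for the name in English and in French.
        System.out.println(frCA.getDisplayName(Locale.ENGLISH));
        System.out.println(frCA.getDisplayName(Locale.FRENCH));

        // The default locale comes from the host environment.
        System.out.println(Locale.getDefault());
    }
}
```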


AWT/Swing Name and Locale attributes

The java.awt.Component class includes getters and setters for Name and Locale attributes. While the documentation also discusses constructors for Component and its subclasses that take the Name argument, I apparently need glasses more than I thought, because I have never been able to find them. Component is in the hierarchy for most Swing classes and they automatically support these attributes as well.

The Name attribute is a non-localized String that you can assign programmatically. It may sound odd that this assists in internationalization, but with most data changing according to locale, Name provides a set anchor to identify the component. Within a given class, of course, testing object references for object equality can serve the same purpose. While there are good reasons for either technique, I customarily use object equality testing in actionPerformed() methods, as you can see in the code examples. The documentation states that a default Name is assigned if not programmatically set, but no value or pattern is given. In the code I've written, Component.getName() returns null if invoked prior to Component.setName("aName"). As undocumented behavior, of course, results may not be consistent and could change in the future. Therefore, when the Name attribute is to be used, good programming practice would call for setting the Name attribute for all components to a standard value that means "unset", then setting the desired components as appropriate.

The Locale attribute allows a component to track its own locale even when the rest of an application is using a different locale. This technique can be very useful in certain situations, although for Components with text values, the text can be localized before being sent to the Component without the need for setting a specific Component Locale.


Localized resources

java.util.ResourceBundle is an abstract class that provides mechanisms for storing and locating resources used by an application. The resources are usually localized Strings, but may be any Java object. ResourceBundles are set up in a sort of hierarchy, beginning with a general ResourceBundle with a base name, then getting more specific by adding language and country identifiers (as defined in Supported Locales in the JDK documentation Internationalization section, which is accessible from Resources) to the base name of additional ResourceBundles. The three great advantages of ResourceBundles are:

  • The class loader mechanism is used to locate a ResourceBundle, so no additional I/O code is needed.

  • ResourceBundle "knows" how to search the hierarchy for a locale-appropriate instance, from specific to general, using the static getBundle(String baseName) or getBundle(String baseName, Locale locale) methods.

  • If a resource is not found in a specific instance, the resource from a more general instance will be used.

The good news/bad news is that, once loaded, ResourceBundle instances are cached under the covers as a performance optimization; this cache is never refreshed and there is no official way to manipulate the cache.

ResourceBundle has two subclasses:

  • ListResourceBundle, which is another abstract class, so you must provide your own implementation. Primarily, you must override getContents(), which returns a two-dimensional Object array (Object[][]). This kind of ResourceBundle can return any type of Object.

  • PropertyResourceBundle, a concrete class that is backed by a java.util.Properties file and can return only Strings.

You can provide your own custom subclasses as well. In that case, you must override and provide implementations for handleGetObject(String key) and getKeys().

ResourceBundles use key/value pairs and provide getString(String key) and getObject(String key) methods. You can also use getKeys() to obtain an Enumeration of available keys.
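The key lookup and the fallback from a specific bundle to a more general one can be sketched with two small ListResourceBundle subclasses. Everything named here (BundleDemo, Messages, MessagesFr, and the keys) is hypothetical. To keep the example self-contained, the bundles are instantiated directly and the parent is wired up by hand with setParent(); in a real application, ResourceBundle.getBundle() would locate the classes by name and build that chain for you.

```java
import java.util.Enumeration;
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

// Hypothetical demo; real code would normally call ResourceBundle.getBundle().
class BundleDemo {

    // General (base) bundle: holds every key, including a non-String value.
    static class Messages extends ListResourceBundle {
        protected Object[][] getContents() {
            return new Object[][] {
                {"greeting", "Hello"},
                {"maxItems", Integer.valueOf(10)}
            };
        }
    }

    // More specific bundle: overrides only the keys that need localizing.
    static class MessagesFr extends ListResourceBundle {
        MessagesFr(ResourceBundle parent) {
            setParent(parent); // manual stand-in for the chain getBundle() builds
        }

        protected Object[][] getContents() {
            return new Object[][] {
                {"greeting", "Bonjour"}
            };
        }
    }

    static ResourceBundle french() {
        return new MessagesFr(new Messages());
    }

    public static void main(String[] args) {
        ResourceBundle bundle = french();
        // Found in the specific bundle.
        System.out.println(bundle.getString("greeting"));
        // Not in the specific bundle, so the parent supplies it.
        System.out.println(bundle.getObject("maxItems"));
        // getKeys() enumerates the available keys.
        for (Enumeration<String> keys = bundle.getKeys(); keys.hasMoreElements();) {
            System.out.println(keys.nextElement());
        }
    }
}
```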


Calendar and time zone support

java.util.Date was originally intended to handle date and time operations, but inherent flaws have reduced it to representing a specific moment in time. The abstract class java.util.Calendar and its concrete subclass java.util.GregorianCalendar were introduced in JDK 1.1 to handle java.util.Date's deficiencies. The Calendar classes have methods to obtain all date and time fields and to perform date and time arithmetic.
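As a brief illustration of field access and date arithmetic, the sketch below adds 30 days to 31 January 2002 and reads back the resulting fields. The class CalendarDemo and the addDays() helper are hypothetical names used only for this example.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

// Hypothetical demo class for Calendar field access and arithmetic.
class CalendarDemo {

    // Returns a calendar set to the given date plus the given number of days.
    // Note that Calendar months are zero-based; use the Calendar.JANUARY style constants.
    static Calendar addDays(int year, int month, int day, int days) {
        Calendar cal = new GregorianCalendar(year, month, day);
        cal.add(Calendar.DAY_OF_MONTH, days); // rolls over month and year as needed
        return cal;
    }

    public static void main(String[] args) {
        Calendar cal = addDays(2002, Calendar.JANUARY, 31, 30);
        System.out.println(cal.get(Calendar.YEAR));
        System.out.println(cal.get(Calendar.MONTH) == Calendar.MARCH);
        System.out.println(cal.get(Calendar.DAY_OF_MONTH));
        // getTime() converts back to a java.util.Date, that is, a moment in time.
        System.out.println(cal.getTime());
    }
}
```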

The abstract java.util.TimeZone class and its concrete subclass java.util.SimpleTimeZone maintain standard and daylight saving time offsets from Coordinated Universal Time (abbreviated UTC rather than CUT, as you might expect; the abbreviation is a compromise between the English and French word orders). In addition, TimeZone contains methods to obtain both native and localized time zone display names.
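The offsets and display names mentioned above can be queried as in the following sketch. The class TimeZoneDemo and its helper are hypothetical, and the zone ID "America/New_York" is just a sample that is assumed to be present in the JDK's time zone data.

```java
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class for TimeZone offsets and display names.
class TimeZoneDemo {

    // Standard (non-daylight) offset from UTC, in whole hours.
    static int rawOffsetHours(String zoneId) {
        return TimeZone.getTimeZone(zoneId).getRawOffset() / (60 * 60 * 1000);
    }

    public static void main(String[] args) {
        TimeZone tz = TimeZone.getTimeZone("America/New_York");
        System.out.println(rawOffsetHours("America/New_York"));
        System.out.println(tz.useDaylightTime());

        // Localized display names: long form in English and in French.
        System.out.println(tz.getDisplayName(Locale.ENGLISH));
        System.out.println(tz.getDisplayName(Locale.FRENCH));
        // Short form of the standard-time (non-daylight) name.
        System.out.println(tz.getDisplayName(false, TimeZone.SHORT, Locale.US));
    }
}
```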


Formatting and parsing

Numbers, currencies, dates, times, and program messages are all affected by cultural and regional differences, and require significant formatting and parsing effort for localization. The abstract class java.text.Format and its subclasses were created to cope with this I18N area. All of the subclasses have format() and parse() methods that manipulate values in a locale-sensitive manner. The parse() methods will throw ParseException on invalid values. The concrete subclasses java.text.SimpleDateFormat and java.text.DecimalFormat allow patterns and access to the appropriate symbols for the instance. In general, the abstract parent classes have getInstance() and getXXXInstance() static factory methods that return appropriately localized objects.

Following is a list of the direct subclasses of java.text.Format:

  • The abstract java.text.DateFormat class and its concrete subclass java.text.SimpleDateFormat, backed by the java.text.DateFormatSymbols class, are used to deal with date and time values.

  • The abstract java.text.NumberFormat class and its concrete subclasses java.text.ChoiceFormat and java.text.DecimalFormat, backed by the java.text.DecimalFormatSymbols class, are used to deal with numbers, currencies and percentages.

  • java.text.MessageFormat allows "soft coded" location and formatting of values to be inserted into localized messages.

For JDK/JRE 1.4, java.util.Currency has been added so that currencies can be used independently from locale. java.text.NumberFormat has new methods to deal with currencies and integers.
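The classes listed above can be exercised with a sketch like the following. FormatDemo, its helper methods, the patterns, and the sample values are all hypothetical. The helpers pin an explicit Locale so their results do not depend on the host's default; the factory-method calls in main() print whatever the running JDK's locale data produces, so no particular output is claimed for them.

```java
import java.text.DateFormat;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.MessageFormat;
import java.text.NumberFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical demo class touching each direct subclass of java.text.Format.
class FormatDemo {

    // DecimalFormat: an explicit pattern plus the symbols of the given locale.
    static String decimal(double value, Locale locale) {
        return new DecimalFormat("#,##0.00", new DecimalFormatSymbols(locale)).format(value);
    }

    // SimpleDateFormat: an explicit pattern (Calendar months are zero-based).
    static String isoDate(int year, int month, int day) {
        Date date = new GregorianCalendar(year, month, day).getTime();
        return new SimpleDateFormat("yyyy-MM-dd", Locale.US).format(date);
    }

    // MessageFormat: placeholders are filled in and formatted per locale.
    static String message(String name, int count) {
        MessageFormat mf = new MessageFormat("{0} contains {1,number,integer} files", Locale.US);
        return mf.format(new Object[] {name, Integer.valueOf(count)});
    }

    // NumberFormat.parse() throws the checked ParseException on invalid input.
    static double parseUs(String text) {
        try {
            return NumberFormat.getNumberInstance(Locale.US).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decimal(1234.5, Locale.US));
        System.out.println(decimal(1234.5, Locale.GERMANY));
        System.out.println(isoDate(2002, Calendar.APRIL, 23));
        System.out.println(message("Disk C", 1234));
        System.out.println(parseUs("1,234.5"));

        // Factory methods return instances localized for the requested locale.
        Date date = new GregorianCalendar(2002, Calendar.APRIL, 23).getTime();
        System.out.println(NumberFormat.getCurrencyInstance(Locale.JAPAN).format(1234.5));
        System.out.println(NumberFormat.getPercentInstance(Locale.US).format(0.25));
        System.out.println(DateFormat.getDateInstance(DateFormat.LONG, Locale.FRANCE).format(date));
    }
}
```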


Locale-sensitive String operations

As developers, we often need to manipulate, search, and sort Strings. This work can be incredibly difficult when multiple languages are involved. The Java platform provides the following classes to assist:

  • The abstract java.text.Collator class and its concrete subclass java.text.RuleBasedCollator allow for locale-sensitive String comparisons.

  • The java.text.CollationElementIterator class iterates through each character of a String and returns its ordering priority in a given collation.

  • The java.text.CollationKey class represents a String as governed by a specific Collator and allows relatively fast ordering comparisons.

  • The java.text.BreakIterator class implements conventions on locating breaks in lines, sentences, words, and characters in a locale-sensitive manner.

  • The java.text.StringCharacterIterator class provides for bidirectional iteration over Unicode characters and is used to search for characters within a String.
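
Two of the classes in the preceding list, Collator and BreakIterator, can be sketched briefly as follows. StringOpsDemo, its helper methods, and the sample strings are hypothetical; the word-counting loop treats a segment as a word when it starts with a letter or digit, which is a common simplification rather than a complete rule.

```java
import java.text.BreakIterator;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo class for locale-sensitive sorting and word breaking.
class StringOpsDemo {

    // Sorts a copy of the array using the collation rules of the given locale.
    static String[] sortForLocale(String[] words, Locale locale) {
        String[] copy = words.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    // Counts the segments between word boundaries that begin with a letter or digit.
    static int countWords(String text, Locale locale) {
        BreakIterator boundaries = BreakIterator.getWordInstance(locale);
        boundaries.setText(text);
        int count = 0;
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE;
                start = end, end = boundaries.next()) {
            if (Character.isLetterOrDigit(text.charAt(start))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] words = {"Zebra", "apple", "Banana"};

        // Plain String ordering compares char values, so capitals sort first.
        String[] byCharValue = words.clone();
        Arrays.sort(byCharValue);
        System.out.println(Arrays.toString(byCharValue));

        // A Collator orders the words the way a reader of that locale expects.
        System.out.println(Arrays.toString(sortForLocale(words, Locale.US)));

        System.out.println(countWords("Hello, world! How are you?", Locale.US));
    }
}
```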

Input methods

Virtually all of the preceding discussion has involved manipulating or displaying data. However, the data must be input by some means. For an end user, that means is most often the keyboard. But what do you do when the keyboard doesn't support the characters needed for language input?

Input method is a technical term for software components that allow data input. The Java platform allows for the use of host OS input methods as well as Java-language-based input methods. If you need to implement input methods, you can use the Input Method Framework. You can find the specification, reference, and tutorials for the Input Method Client API and the Input Method Engine SPI under Input Method Framework in the Internationalization section of the JDK documentation (accessible from Resources).
