
Java internationalization basics

Joe Sam Shirah (joesam@conceptgo.com), Principal and developer, conceptGO
Joe Sam Shirah is a principal and developer at conceptGO, which provides remote consulting and software development services, as well as products, with specialties in JDBC, I18N, the AS/400, RPG, finance, inventory, and logistics. Joe Sam was presented with the Java Community Award at JavaOne, 1998, and is the author of the JDBC 2.0 Fundamentals short course at the Java Developer Connection. He is the moderator of the developerWorks "Java filter" discussion forum and manager for jGuru's JDBC, I18N, and Java400 FAQs. Joe Sam has a B.B.A. in Economics and a Master's degree in International Management.

Summary:  This tutorial introduces you to the Java programming language's support for multilingual and multicountry environments. The tutorial begins with a general discussion of internationalization principles and concepts, and then moves on to an overview of the specific areas of Java internationalization support. The last few sections provide a more hands-on discussion of the areas basic to any internationalized Java application: Unicode and Java characters; locales and resource bundles; and formatting dates, numbers, and currencies.

Date:  23 Apr 2002
Level:  Introductory


Unicode and Java characters

Java characters and the char datatype

One of the best-known complaints from Java programmers is "I only see question marks (or blocks) for my program output. How did my data get corrupted?" Every Java developer should understand what is actually going on and why, but this knowledge is especially important when dealing with internationalization issues.

The Java Language Specification defines char as a primitive, numeric, integral type. In addition, char is the only unsigned numeric type, which allows for some interesting (or nasty, depending on your view) tricks. Values of type char are special in another way as well: when sent to output devices such as displays or printers, they are mapped to glyphs from a character map or a font. At its base, however, char is a numeric type and supports all integer operations. The earlier section, Unicode support, noted that a char can be set using a letter or a Unicode escape. Because char is numeric, you can also assign it using octal, decimal, or hex notation, or even by flipping bits.
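A minimal sketch of what "unsigned, numeric, integral" means in practice. The class and method names here (CharNumeric, minusOneAsChar, nextChar, toggleAsciiCase) are hypothetical, chosen only for illustration; the case-toggling trick is valid for ASCII letters only.

```java
// Sketch: char is an unsigned 16-bit integral type that supports integer operations.
public class CharNumeric {

    // Casting -1 to char wraps to the largest char value, 65535 (U+FFFF),
    // because char has no sign bit.
    static int minusOneAsChar() {
        char c = (char) -1;
        return c; // widened to int
    }

    // Compound assignment narrows back to char automatically; plain '+' yields an int.
    static char nextChar(char c) {
        c += 1;
        return c;
    }

    // Flipping bit 5 (0x20) toggles case for ASCII letters: 'A' (0x41) <-> 'a' (0x61).
    static char toggleAsciiCase(char c) {
        return (char) (c ^ 0x20);
    }

    public static void main(String[] args) {
        System.out.println(minusOneAsChar());          // 65535
        System.out.println((int) Character.MAX_VALUE); // 65535
        System.out.println(nextChar('A'));             // B
        System.out.println(toggleAsciiCase('A'));      // a
        System.out.println('A' + 1);                   // 66, because char + int is int arithmetic
    }
}
```

Note the last line: adding an int to a char produces an int, which is why the charExample program below prints a number when it adds two chars.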

Given that background, and assuming no program bugs, the answer to the question above is that the character map or font simply doesn't support the character, so a question mark or block is substituted for display. The value of the char itself is still valid. In that case, however, you can't verify the data visually; you have to check its numeric value. The following example demonstrates this behavior.
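Because the glyph can't be trusted, one practical way to check data is to print each char's numeric value instead. The following is a small sketch of that idea; the CharDump class and its toEscapes method are hypothetical helpers, not part of any library. Printable ASCII passes through unchanged and every other char is written as a Unicode escape sequence, so the output survives any code page.

```java
// Sketch: show each char's numeric value so data can be checked even when the
// display substitutes '?' or a block for an unsupported character.
public class CharDump {

    // Printable ASCII is kept as-is; everything else becomes a Unicode escape
    // sequence with four uppercase hex digits.
    static String toEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c < 0x7F) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "Go = \u4E94";
        // Prints the ideograph as an escape rather than as '?'
        System.out.println(toEscapes(s));
    }
}
```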

Figure: Japanese ideograph Go = 5

The Japanese ideograph for "Go", or 5, is represented in Unicode as '\u4E94' (五). This character causes the question mark and block displays in the charExample program below:

import javax.swing.*;

public class charExample
{
  public static void main( String[] args )
  {
    boolean bFirst = true;    
    char aChar[] = {
                     'A',     // character
                      65,     // decimal
                      0x41,   // hex
                      0101,   // octal
                     '\u0041' // Unicode escape
                   };

    char myChar = 256;
    
    for( int i = 0; i < aChar.length; i++ )
    {
      System.out.print( aChar[i]++ + " " );
      if( i == (aChar.length - 1) )
      {
        System.out.println( "\n---------" );
        if( bFirst )
        {
          i = -1;
          bFirst = !bFirst;          
        }
      }
    } // end for
    // the result of adding two chars is an int
    System.out.println( "aChar[0] + aChar[1] equals: " + 
                        (aChar[0] + aChar[1]) );
    System.out.println( "myChar at 256: " + myChar );
    System.out.println( "myChar at 20116 or \\u4E94: " + 
                       ( myChar = 20116 ) );
    // show integer value of the char
    System.out.println( "myChar numeric value: " + 
                    (int)myChar );

    // display the same values in a Swing dialog
    JFrame jf = new JFrame();
    JOptionPane.showMessageDialog( jf,
      "myChar at 20116 or \\u4E94: " +
      ( myChar = 20116 ) +
      "\nmyChar numeric value: " +
      (int)myChar,
      "charExample", JOptionPane.ERROR_MESSAGE );

    jf.dispose();
    System.exit(0);

  } // end main
  
}  // End class charExample



First, the program initializes a char array with the letter 'A' written in five different representations, and sets the char variable myChar to 256 ('\u0100'). A loop prints the array's values twice; each element is incremented after it is printed (a char is numeric, remember?), so the second pass prints 'B'. Next, the first two elements, by now both 'C' (67), are added together, and the result, an int, is printed. Then myChar is printed, first with its initial value and then with a value of 20116, or '\u4E94', the Japanese ideograph "Go" for 5. On Windows NT using code page cp1252, both values print as question marks, as expected; depending on your system's code page, the display may be slightly different. To check the actual value, the variable is then printed as an int. Last, a JOptionPane displays the value, showing a block for the unsupported char '\u4E94'.
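The question marks come from encoding, not from corrupted chars: when text is written through a charset that cannot represent a character, the encoder substitutes its replacement byte, typically '?'. The sketch below shows this with ISO-8859-1, chosen here (instead of the cp1252 mentioned above) only because every JVM is required to support it; neither charset can represent U+0100 or U+4E94. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: encoding chars that a charset cannot represent substitutes the
// charset's replacement byte, while the chars themselves stay intact.
public class ReplacementDemo {

    static byte[] encodeLatin1(String s) {
        // String.getBytes(Charset) replaces unmappable chars with the charset's
        // default replacement, which is '?' (63) for ISO-8859-1.
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String s = "A\u0100\u4E94"; // 'A', U+0100, U+4E94
        System.out.println(Arrays.toString(encodeLatin1(s))); // [65, 63, 63]
        // The String still holds the original value.
        System.out.println((int) s.charAt(2));                 // 20116
    }
}
```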

This is the output from charExample:

A A A A A
---------
B B B B B
---------
aChar[0] + aChar[1] equals: 134
myChar at 256: ?
myChar at 20116 or \u4E94: ?
myChar numeric value: 20116



Figure: The JOptionPane display, showing a block in place of the unsupported character '\u4E94'.


Fonts, font properties, and the Lucida font

The Java platform recognizes both logical and physical fonts.

Logical fonts are those that are automatically mapped to host system fonts. These are the familiar Serif, SansSerif, Monospaced, Dialog, and DialogInput fonts. There are also four font styles: plain, bold, italic, and bolditalic. The mapping from logical fonts to host fonts is done with a font.properties file, located in the JRE/lib directory.

While specifics vary from system to system, the default font.properties file is usually set up for English speakers, although a localized Japanese version of the JDK is available. Additional font.properties files are shipped as well; JDK 1.3.1 for Windows includes files for Arabic, Hebrew, Japanese, Korean, Russian, Thai, and several variants of Chinese. The search for an appropriate font.properties file is similar (but not identical) to the method used for ResourceBundles, as is the naming convention. If a language-specific font.properties file matches your system's locale and the expected fonts (normally shipped with that version of the OS) are installed, automatic mapping is done for that language. Otherwise, the default file mapping, usually English, is used.
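A minimal sketch of requesting the logical fonts in each style. The class and method names are hypothetical; the font names and the Font.PLAIN, Font.BOLD, and Font.ITALIC constants are standard java.awt.Font API. Bold italic has no constant of its own: it is the bitwise OR of BOLD and ITALIC.

```java
import java.awt.Font;

// Sketch: the five logical font names combined with the four styles.
public class LogicalFonts {

    static final String[] LOGICAL_NAMES = {
        "Serif", "SansSerif", "Monospaced", "Dialog", "DialogInput"
    };

    // Style constants are bit flags (PLAIN = 0, BOLD = 1, ITALIC = 2),
    // so bolditalic is BOLD | ITALIC.
    static final int[] STYLES = {
        Font.PLAIN, Font.BOLD, Font.ITALIC, Font.BOLD | Font.ITALIC
    };

    static Font logical(String name, int style, int size) {
        // The JRE maps the logical name to host fonts when the font is used.
        return new Font(name, style, size);
    }

    public static void main(String[] args) {
        for (String name : LOGICAL_NAMES) {
            for (int style : STYLES) {
                Font f = logical(name, style, 12);
                System.out.println(f.getName() + " bold=" + f.isBold()
                        + " italic=" + f.isItalic());
            }
        }
    }
}
```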

Automatic mapping also occurs if you install the appropriate font and pass the corresponding language and country codes when invoking a Java application. This behavior is very useful during development, provided the desired font.properties file exists. You can also effectively make that language and font the default by renaming the original default font.properties file to something else and then renaming the language-specific file to "font.properties". While that's easy enough for developers, it's obviously not something end users should have to do.
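To confirm which language and country the JVM actually picked up, you can print the relevant system properties and the default Locale. This is a sketch under an assumption: on current JDKs the command-line properties are user.language and user.country (older JDKs may use different property names), so a launch such as `java -Duser.language=ja -Duser.country=JP LocaleCheck` should report ja_JP. The LocaleCheck class and describe method are hypothetical.

```java
import java.util.Locale;

// Sketch: report the locale the JVM derived at startup.
// Assumed launch (current JDKs): java -Duser.language=ja -Duser.country=JP LocaleCheck
public class LocaleCheck {

    // Formats a locale as language_COUNTRY, for example ja_JP.
    static String describe(Locale loc) {
        return loc.getLanguage() + "_" + loc.getCountry();
    }

    public static void main(String[] args) {
        System.out.println("user.language  = " + System.getProperty("user.language"));
        System.out.println("user.country   = " + System.getProperty("user.country"));
        System.out.println("default locale = " + describe(Locale.getDefault()));
    }
}
```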

Matters are quite different, and more difficult, if you must customize or create a font.properties file yourself. Instructions for working with font.properties files are available in Font Properties in the Internationalization section of the JDK documentation.

Physical fonts are the normal fonts we use all the time. Fonts based on ASCII and ISO 8859-1 are not a problem. Outside that range, however, the host platform must support the fonts, and they must be Unicode-encoded to work in your Java programs. Such fonts are not as hard to find as they once were. The Windows MS Mincho TrueType font (mostly Japanese), for example, is Unicode-encoded and may be used immediately in the standard manner. When an appropriate physical font is installed on the system, you can let users select the font they want and save their preferences, or set the font as a standard for an entire package, without getting into font.properties files.
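One way to use a physical font such as MS Mincho when it is present, and fall back gracefully when it isn't, is to check the installed family names first. This is a sketch; the PhysicalFontPicker class, its pickFamily helper, and the choice of "Dialog" as the fallback are illustrative assumptions. The selection logic is kept separate from the GraphicsEnvironment query so it can be tested without a display.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.util.Arrays;
import java.util.List;

// Sketch: prefer an installed physical font, otherwise use a logical font.
public class PhysicalFontPicker {

    // Returns the first preferred family found in the installed list, else the fallback.
    static String pickFamily(List<String> preferred, List<String> installed, String fallback) {
        for (String family : preferred) {
            if (installed.contains(family)) {
                return family;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Queries the host system's installed font families.
        List<String> installed = Arrays.asList(
                GraphicsEnvironment.getLocalGraphicsEnvironment()
                                   .getAvailableFontFamilyNames());
        String family = pickFamily(Arrays.asList("MS Mincho"), installed, "Dialog");
        Font font = new Font(family, Font.PLAIN, 14);
        System.out.println("Using " + font.getName()
                + "; can display \\u4E94: " + font.canDisplay('\u4E94'));
    }
}
```

The chosen family name is also what you would store if you let users pick a font and save their preference.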

The Java 2 SDK also provides three physical font families: Lucida Sans, Lucida Bright, and Lucida Sans Typewriter. Each family contains four fonts, for the plain, italic, bold, and bolditalic styles, for a total of 12 fonts. While information on the exact capabilities of these fonts is scarce, Lucida Sans handles most European and Middle Eastern languages; Asian languages are not included. Because this font comes with the JDK, all the graphical application examples in the tutorial use Lucida Sans. For more information, see Physical Fonts in the Internationalization section of the JDK documentation (accessible from Resources).
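The sketch below builds the 12 Lucida fonts (three families times four styles) and uses Font.canDisplay to probe coverage, here for Hebrew alef (U+05D0) and the ideograph U+4E94. It assumes the Lucida families are installed, as they were with the Java 2 SDK; if they are not, these Font objects will not actually render in Lucida and the canDisplay results will reflect whatever font is used instead. The class and method names are hypothetical.

```java
import java.awt.Font;
import java.util.ArrayList;
import java.util.List;

// Sketch: the three Lucida families in all four styles, plus a coverage probe.
public class LucidaFonts {

    static final String[] FAMILIES = {
        "Lucida Sans", "Lucida Bright", "Lucida Sans Typewriter"
    };
    static final int[] STYLES = {
        Font.PLAIN, Font.ITALIC, Font.BOLD, Font.BOLD | Font.ITALIC
    };

    static List<Font> allLucida(int size) {
        List<Font> fonts = new ArrayList<>();
        for (String family : FAMILIES) {
            for (int style : STYLES) {
                fonts.add(new Font(family, style, size));
            }
        }
        return fonts;
    }

    public static void main(String[] args) {
        System.out.println("Fonts: " + allLucida(12).size());
        // canDisplay reports whether the resolved font has a glyph for the char.
        Font sans = new Font("Lucida Sans", Font.PLAIN, 12);
        System.out.println("Hebrew alef:  " + sans.canDisplay('\u05D0'));
        System.out.println("Ideograph Go: " + sans.canDisplay('\u4E94'));
    }
}
```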

