Some simplifying rules

A multilingual application program can end up needlessly slow if the programmer is unaware of certain constraints on the design of multibyte character sets. These constraints allow many programs to run efficiently in a multibyte locale with little use of internationalization functions.

For example:

  • In all code sets supported by IBM, the character codes 0x00 through 0x3F are unique and encode the ASCII standard characters. Being unique means that these bit combinations never appear as one of the bytes of a multibyte character. Because the null character is part of this set, the strlen(), strcpy(), and strcat() functions work on multibyte strings as well as single-byte strings. Remember, however, that the value returned by strlen() is the number of bytes in the string, not the number of characters.
  • Similarly, the standard string function strchr(foostr, '/') works correctly in all locales, because the / (slash) is part of the unique code-point range. In fact, most of the standard delimiters are in the 0x00 to 0x3F range, so most parsing can be accomplished without recourse to internationalization functions or translation to wchar_t form.
  • Comparisons between strings fall into two classes: tests for equality and tests for inequality (ordering). To test for equality, use the standard strcmp() function. When you write
    if (strcmp(foostr,"a rose") == 0)

    you are not looking for "a rose" by any other name; you are looking for that exact sequence of bytes only. If foostr contains "a rosE", no match is found.

  • Unequal (ordering) comparisons occur when you arrange strings in the locale-defined collation sequence. In that case, use
    if (strcoll(foostr,barstr) > 0)

    and pay the performance cost of obtaining the collation information about each character.

  • When a program is executed, it always starts in the C locale. If it will use one or more internationalization functions, including accessing message catalogs, it must execute:
    setlocale(LC_ALL, "");

    to switch to the locale specified by its environment (normally inherited from its parent process) before calling any internationalization function.
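
The rules above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the helper names init_locale, count_chars, first_component_len, is_exact_match, and sorts_after are hypothetical and do not come from the text above. Only standard C library calls are used (setlocale, mblen, strlen, strchr, strcmp, strcoll).

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: adopt the locale named by the environment.
   Returns 1 on success, 0 if the requested locale is unavailable. */
int init_locale(void)
{
    return setlocale(LC_ALL, "") != NULL;
}

/* Hypothetical helper: count characters, not bytes, in a multibyte
   string, using the encoding of the current locale. strlen() would
   return the byte count instead. Returns (size_t)-1 if the string
   contains an invalid or incomplete multibyte sequence. */
size_t count_chars(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    (void)mblen(NULL, 0);               /* reset any shift state */
    while (*s != '\0') {
        int len = mblen(s, MB_CUR_MAX); /* bytes in the next character */
        if (len <= 0)
            return (size_t)-1;
        s += len;
        n++;
    }
    return n;
}

/* Hypothetical helper: byte length of the text before the first '/'.
   Plain strchr() suffices because '/' lies in the unique range and
   so never appears inside a multibyte character. */
size_t first_component_len(const char *path)
{
    const char *slash = strchr(path, '/');

    return slash != NULL ? (size_t)(slash - path) : strlen(path);
}

/* Equality test: byte-for-byte comparison, no locale lookup needed. */
int is_exact_match(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Ordering test: uses the collation sequence of the current locale,
   at the cost of looking up collation data for each character. */
int sorts_after(const char *a, const char *b)
{
    return strcoll(a, b) > 0;
}
```

A caller would typically invoke init_locale() once at startup, before count_chars() or sorts_after(), because both depend on the current locale. first_component_len() and is_exact_match() work on raw bytes and give the same result in any locale.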