Considerations

Because the default environment for IBM® i is primarily an EBCDIC environment, you must be aware of the situations described in this topic when you use UTF support in an application.

If a program or service program has some modules compiled with the UTF support and some modules compiled without the UTF support, care must be taken to ensure that unexpected mismatches do not occur. The wide characters and wide character strings are two bytes in size for a non-UTF module and four bytes in size for a UTF module, so sharing wide characters between the modules may not work correctly. The narrow (non-wide) characters and character strings are in job CCSID for a non-UTF module and in CCSID 1208 for a UTF module, so sharing narrow characters between the modules may not work correctly either.
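
The size mismatch can be bridged explicitly when wide data has to cross from a non-UTF module to a UTF module. The following is a minimal sketch, not a runtime facility. It assumes that the two-byte wide characters are UCS-2 code units and that the four-byte wide characters are UTF-32 values, which this topic does not state. The function name widen_ucs2_to_utf32() is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: widen two-byte wide characters (assumed to be UCS-2
 * code units) into four-byte values (assumed to be UTF-32).
 * Copies up to and including the terminating zero.
 * Returns the number of characters written, excluding the terminator,
 * or -1 if dst is too small or a surrogate code unit is found.
 */
long widen_ucs2_to_utf32(const uint16_t *src, uint32_t *dst, size_t dst_len)
{
    size_t i = 0;

    for (;;) {
        if (i >= dst_len) {
            return -1;                 /* no room left, even for the terminator */
        }
        if (src[i] >= 0xD800u && src[i] <= 0xDFFFu) {
            return -1;                 /* surrogates are not valid UCS-2 */
        }
        dst[i] = (uint32_t)src[i];     /* same code point, wider storage */
        if (src[i] == 0) {
            return (long)i;            /* terminator copied, done */
        }
        i++;
    }
}
```

Passing fixed-width integer types such as uint16_t and uint32_t across the module boundary, rather than wchar_t, keeps the element size the same regardless of how each module is compiled.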

Whenever setlocale() is called to switch to a locale with a different CCSID, the standard output files should be flushed first to avoid buffering problems with character data in multiple CCSIDs. Because stdout is line buffered by default, the problem does not occur if each output line ends in a newline character. If a line does not end in a newline character and no flush is done, the output might not be shown as intended. The following example illustrates the problem.

#include <stdio.h>
#include <locale.h>

int main() {
   /* This string is in CCSID 1208 */
   printf("Hello World");

   /* Change locale to a CCSID 37 locale */
   setlocale(LC_ALL, "/QSYS.LIB/EN_US.LOCALE");
   #pragma convert(37)

   /* This string is in CCSID 37 */
   printf("Hello World\n");

   return 0;
}

In this case, the first printf() copies the CCSID 1208 string Hello World to the stdout buffer. stdout should be flushed before setlocale() is called so that the string is copied to the screen, but this example does not do so. The second printf() copies the CCSID 37 string Hello World\n to the stdout buffer. Because of the trailing newline character, the buffer is flushed at that point and the whole buffer is copied to the screen. Because the CCSID of the current locale is 37 and the screen can handle CCSID 37 without problems, the whole buffer is copied without conversion, and the CCSID 1208 characters are displayed as unreadable characters. If stdout had been flushed before the setlocale() call, the CCSID 1208 characters would have been converted to CCSID 37 and displayed correctly.
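
One way to avoid the problem is to pair every locale change with a flush. The following sketch wraps the two calls in a single helper. The name set_locale_flushed() is hypothetical and is not part of the runtime. It uses only the standard fflush() and setlocale() functions.

```c
#include <locale.h>
#include <stdio.h>

/*
 * Hypothetical helper: flush the standard output files, then switch locale.
 * Flushing first writes any buffered characters while the locale that
 * matches their CCSID is still in effect.
 * Returns the result of setlocale(), which is NULL if the request fails.
 */
char *set_locale_flushed(int category, const char *locale_name)
{
    fflush(stdout);
    fflush(stderr);
    return setlocale(category, locale_name);
}
```

In the example above, replacing the setlocale() call with set_locale_flushed(LC_ALL, "/QSYS.LIB/EN_US.LOCALE") would flush the first Hello World before the locale changes.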

Nearly all of the runtime functions have been modified to support UTF, but a handful have not. Functions and structures that deal with exception handling, such as the _GetExcData() function, the _EXCP_MSGID variable, and the exception handler structure _INTRPT_Hndlr_Parms_T, are provided by the operating system, not the runtime. They are strictly EBCDIC. The getenv() and putenv() functions handle only EBCDIC. The QXXCHGDA() and QXXRTVDA() functions handle only EBCDIC. The argv and envp parameters are also EBCDIC only.

Some of the record I/O functions (that is, functions beginning with _R) do not completely support UTF. The functions that do not support UTF are _Rformat(), _Rcommit(), _Racquire(), _Rrelease(), _Rpgmdev(), _Rindara(), and _Rdevatr(). They are available when compiling with the UTF option, but they accept and generate only EBCDIC. In addition, any character data within the structures returned by the _R functions will be in EBCDIC rather than UTF.

Other operating system functions have not been modified to support UTF. For example, the integrated file system functions, such as open(), still accept the job CCSID. Other operating system APIs still accept the job CCSID. For UTF applications, the characters and character strings provided to these functions need to be converted to the job CCSID using QTQCVRT, iconv(), #pragma convert, or some other method.
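
As an illustration of the iconv() route, the following sketch converts a byte string from one encoding to another before it is handed to such a function. It is written against the POSIX form of the iconv interface and is an assumption-laden outline, not a definitive implementation. The helper name convert_bytes() is hypothetical, the code names are passed in by the caller because the strings that iconv_open() accepts are platform-specific, and the declaration of iconv_t and the check for a failed iconv_open() may differ on IBM i, so verify both against the platform documentation.

```c
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper: convert in_len bytes of `in` from the encoding named
 * by from_code to the encoding named by to_code.
 * Writes at most out_size - 1 bytes to `out` and appends one zero byte.
 * Returns the number of bytes written, excluding the terminator,
 * or -1 on any failure (unknown code name, invalid input, buffer too small).
 */
long convert_bytes(const char *to_code, const char *from_code,
                   const char *in, size_t in_len,
                   char *out, size_t out_size)
{
    iconv_t cd;
    char *in_ptr = (char *)in;    /* iconv() takes char ** but does not modify the input */
    char *out_ptr = out;
    size_t in_left = in_len;
    size_t out_left;
    size_t rc;

    if (out_size == 0) {
        return -1;
    }
    out_left = out_size - 1;      /* reserve room for the terminator */

    cd = iconv_open(to_code, from_code);
    if (cd == (iconv_t)-1) {      /* POSIX failure check; an assumption on other platforms */
        return -1;
    }

    rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    if (rc != (size_t)-1) {
        /* Emit any pending shift sequence for stateful encodings. */
        rc = iconv(cd, NULL, NULL, &out_ptr, &out_left);
    }
    iconv_close(cd);

    if (rc == (size_t)-1 || in_left != 0) {
        return -1;
    }
    *out_ptr = '\0';
    return (long)(out_ptr - out);
}
```

A UTF application would call the helper with the code name for CCSID 1208 as from_code and the code name for the job CCSID as to_code, then pass the converted buffer to a function such as open().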