Strings in ILE native methods

Many Java™ Native Interface (JNI) functions accept C language-style strings as parameters. For example, the FindClass() JNI function accepts a string parameter that specifies the fully-qualified name of a class file. If the class file is found, it is loaded by FindClass(), and a reference to it is returned to the caller of FindClass().

All JNI functions expect their string parameters to be encoded in UTF-8. For details on UTF-8, refer to the JNI Specification, but in most cases it is enough to observe that 7-bit American Standard Code for Information Interchange (ASCII) characters are identical to their UTF-8 representation. A 7-bit ASCII character is stored in an 8-bit byte whose high-order bit is always 0, so a C string that contains only 7-bit ASCII characters is already valid UTF-8.
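
The high-order-bit rule can be checked mechanically. The following is a minimal sketch, not part of the JNI: the function name is_seven_bit_ascii is hypothetical, and the check assumes the bytes are already in an ASCII-based encoding (it says nothing useful about EBCDIC data).

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 when every byte has its high-order bit
 * clear, meaning the buffer holds only 7-bit ASCII and is therefore
 * byte-for-byte identical to its UTF-8 encoding. Returns 0 otherwise.
 * Assumes the bytes are in an ASCII-based encoding, not EBCDIC.
 */
int is_seven_bit_ascii(const unsigned char *bytes, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (bytes[i] & 0x80) {
            return 0;   /* high-order bit set: not 7-bit ASCII */
        }
    }
    return 1;
}
```

A buffer that fails this check may still be convertible to UTF-8, but it needs an explicit conversion step rather than being passed through unchanged.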

The Integrated Language Environment (ILE) C compiler on the server operates in extended binary-coded decimal interchange code (EBCDIC) by default, so strings passed to JNI functions need to be converted to UTF-8. How you do this depends on whether the string is a literal string or a dynamic string. Literal strings are strings whose value is known when the source code is compiled. Dynamic strings are strings whose value is not known at compile time, but is computed at run time.

Literal strings

If the string can be represented in ASCII, as most can, then the string can be bracketed by pragma statements that change the current code page of the compiler. The compiler then stores the string internally in the UTF-8 form that the JNI requires. If the string cannot be represented in ASCII, it is easier to treat the original EBCDIC string as a dynamic string, and process it using iconv() before passing it to the JNI.

For example, to find the class named java/lang/String, the code looks like this:

    #pragma convert(819)
    myClass = (*env)->FindClass(env,"java/lang/String");
    #pragma convert(0)

The first pragma, with the number 819, tells the compiler to store all subsequent double-quoted strings (literal strings) in ASCII. The second pragma, with the number 0, tells the compiler to revert to its default code page for double-quoted strings, which is usually the EBCDIC code page 37. Because the class name contains only 7-bit ASCII characters, bracketing this call with these pragmas satisfies the JNI requirement that string parameters are encoded in UTF-8.

Caution: Be careful with text substitutions. For example, if your code looks like this:

    #pragma convert(819)
    #define MyString "java/lang/String"
    #pragma convert(0)
    myClass = (*env)->FindClass(env,MyString);

Then, the resulting string is EBCDIC, because the value of MyString is substituted into the FindClass() call during compilation. At the point of this substitution, the second pragma has already restored the default code page, so the pragma with the number 819 is no longer in effect and the literal string is not stored in ASCII.
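
One possible way to avoid this pitfall is to keep the literal in a constant array that is defined between the pragmas, so that the literal is compiled at the place where it is written instead of at the place where a macro is expanded. The following is a sketch under that assumption; the names kStringClassName and string_class_name are hypothetical, and compilers other than ILE C typically ignore the convert pragma.

```c
/*
 * On ILE C, string literals between the two pragmas are stored in
 * code page 819. The array is initialized here, inside the bracketed
 * region, so no later text substitution is involved.
 */
#pragma convert(819)
static const char kStringClassName[] = "java/lang/String";
#pragma convert(0)

/* Hypothetical accessor that hands the converted name to callers. */
const char *string_class_name(void)
{
    return kStringClassName;
}
```

A caller could then pass string_class_name() as the second argument of FindClass() from code that lies outside the bracketed region.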

Converting dynamic strings to and from EBCDIC, Unicode, and UTF-8

To manipulate string variables that are computed at run time, it may be necessary to convert strings to and from EBCDIC, Unicode, and UTF-8. Conversions can be done using the iconv() API. In Example 3 of the examples for using the Java Native Interface for native methods, the routine creates, uses, and then destroys the iconv() conversion descriptor. This scheme avoids the problems with multithreaded use of an iconv_t descriptor, but for performance-sensitive code it is better to create a conversion descriptor in static storage, and serialize access to it using a mutual exclusion (mutex) or other synchronization facility.
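
The shared-descriptor approach might look like the following sketch. It is not the code from Example 3. The shared_converter type and its functions are hypothetical names, the mutex is a POSIX threads mutex, and the code assumes the POSIX form of iconv(), in which iconv_open() returns (iconv_t)-1 on failure. The code-set names accepted by iconv_open() and the way it reports failure may differ on the server, so check the platform's API documentation before adapting this.

```c
#include <iconv.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical wrapper: one conversion descriptor guarded by one mutex. */
typedef struct {
    iconv_t         cd;
    pthread_mutex_t lock;
} shared_converter;

/* Open the descriptor once. Returns 0 on success, -1 on failure. */
int shared_converter_open(shared_converter *sc,
                          const char *to_code, const char *from_code)
{
    sc->cd = iconv_open(to_code, from_code);
    if (sc->cd == (iconv_t)-1) {
        return -1;
    }
    if (pthread_mutex_init(&sc->lock, NULL) != 0) {
        iconv_close(sc->cd);
        return -1;
    }
    return 0;
}

/*
 * Convert in_len bytes of in into out, which holds out_cap bytes.
 * The result is NUL-terminated. Returns the number of bytes written,
 * not counting the terminator, or -1 on error.
 */
long shared_converter_run(shared_converter *sc,
                          const char *in, size_t in_len,
                          char *out, size_t out_cap)
{
    char  *src = (char *)in;   /* iconv() takes char **, input is not modified */
    char  *dst = out;
    size_t src_left = in_len;
    size_t dst_left;
    long   result = -1;

    if (out_cap == 0) {
        return -1;
    }
    dst_left = out_cap - 1;    /* reserve room for the terminator */

    pthread_mutex_lock(&sc->lock);           /* one thread at a time */
    iconv(sc->cd, NULL, NULL, NULL, NULL);   /* reset conversion state */
    if (iconv(sc->cd, &src, &src_left, &dst, &dst_left) != (size_t)-1) {
        *dst = '\0';
        result = (long)(dst - out);
    }
    pthread_mutex_unlock(&sc->lock);
    return result;
}

/* Release the descriptor and the mutex when no thread uses them anymore. */
void shared_converter_close(shared_converter *sc)
{
    iconv_close(sc->cd);
    pthread_mutex_destroy(&sc->lock);
}
```

A native method library could keep one shared_converter object in static storage for each conversion direction that it needs, open it once during initialization, and call shared_converter_run() from any thread. The trade-off is that concurrent conversions wait on the mutex, whereas the create-use-destroy scheme pays the cost of opening a descriptor on every call.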