Strings in ILE native methods
Many Java™ Native Interface (JNI) functions
accept C language-style strings as parameters. For example, the FindClass() JNI
function accepts a string parameter that specifies the fully-qualified
name of a class file. If the class file is found, it is loaded by FindClass(),
and a reference to it is returned to the caller of FindClass().
All JNI functions expect their string parameters to be encoded in UTF-8. For details on UTF-8, you can refer to the JNI Specification, but in most cases it is enough to observe that 7-bit American Standard Code for Information Interchange (ASCII) characters are identical to their UTF-8 representation. A 7-bit ASCII character is stored in an 8-bit byte whose high-order bit is always 0, so most ASCII C strings are already valid UTF-8.
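The observation about the high-order bit can be checked directly on the bytes of a string. The following is a minimal sketch, not part of the JNI; the function name is_seven_bit_ascii() is hypothetical. Note that it inspects raw byte values, so a string that the compiler stored in EBCDIC would typically fail the check even if it looks like plain text in the source.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if every byte of the NUL-terminated
 * string s has its high-order bit clear (7-bit ASCII, and therefore
 * byte-for-byte identical to its UTF-8 encoding), otherwise 0. */
int is_seven_bit_ascii(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    for (; *p != '\0'; ++p) {
        if (*p & 0x80) {
            return 0;   /* high-order bit set: not 7-bit ASCII */
        }
    }
    return 1;
}
```

A string that passes this check can be handed to a JNI function without further conversion; one that fails needs to be converted first.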
The Integrated Language Environment (ILE) C compiler on the server operates in extended binary-coded decimal interchange code (EBCDIC) by default, so strings passed to JNI functions need to be converted to UTF-8. How you do this depends on whether the string is a literal string or a dynamic string. Literal strings are strings whose value is known when the source code is compiled. Dynamic strings are strings whose value is not known at compile time, but is computed at run time.
Literal strings
If the
string can be represented in ASCII, as most can, then the string can
be bracketed by #pragma convert statements that change the
current code page of the compiler. The compiler then stores the string
internally in the UTF-8 form that is required by the JNI. If the string
cannot be represented in ASCII, it is easier to treat the original
EBCDIC string as
a dynamic string, and process it using iconv() before
passing it to the JNI.
For example, to find the class named java/lang/String,
the code looks like this:
#pragma convert(819)
myClass = (*env)->FindClass(env,"java/lang/String");
#pragma convert(0)
The first pragma, with the number 819, informs the compiler to store all subsequent double-quoted strings (literal strings) in ASCII. The second pragma, with the number 0, tells the compiler to revert to the default code page of the compiler for double-quoted strings, which is usually the EBCDIC code page 37. So, by bracketing this call with these pragmas, we satisfy the JNI requirement that string parameters are encoded in UTF-8.
Caution: Be careful with text substitutions. For example, if your code looks like this:
#pragma convert(819)
#define MyString "java/lang/String"
#pragma convert(0)
myClass = (*env)->FindClass(env,MyString);
Then, the resulting string is EBCDIC, because the value of MyString is
substituted into the FindClass() call during compilation.
The substitution happens at the point of the call, which comes after
#pragma convert(0), so the pragma with the number 819 is no longer in
effect. Thus, the literal string is not stored in ASCII.
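One way to keep a reusable name while avoiding this pitfall is to define a constant array, rather than a macro, inside the bracketed region, so that the literal itself appears where the pragma is in effect. The following is a minimal sketch under that assumption; the names kStringClassName and string_class_name() are hypothetical, and it assumes the compiler applies #pragma convert to a literal that initializes an array. Compilers other than ILE C normally ignore the pragma.

```c
#include <stddef.h>

#pragma convert(819)
/* The literal is written inside the bracketed region, so it is stored
 * in code page 819 where the array is defined, not where it is used. */
static const char kStringClassName[] = "java/lang/String";
#pragma convert(0)

/* Hypothetical accessor: returns the class name for use with JNI calls. */
const char *string_class_name(void)
{
    return kStringClassName;
}
```

The call then becomes myClass = (*env)->FindClass(env, string_class_name()); and no text substitution takes place at the call site.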
Converting dynamic strings to and from EBCDIC, Unicode, and UTF-8
To manipulate string
variables that are computed at run time, it may be necessary to convert
strings to and from EBCDIC, Unicode, and UTF-8. Conversions can be
done using the iconv() API. In Example 3 of the examples for using
the Java Native Interface for native
methods, the routine creates, uses, and then destroys
the iconv() conversion descriptor on each call. This scheme avoids
the problems with multithreaded use of an iconv_t descriptor,
but for performance-sensitive code it is better to create a conversion
descriptor in static storage, and serialize access to it using
a mutual exclusion (mutex) or other synchronization facility.
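The static-descriptor approach described above can be sketched as follows. This is an illustration only, written against the POSIX form of iconv_open(), iconv(), and iconv_close() with a pthread mutex; the function names shared_conv_open(), shared_conv(), and shared_conv_close() are hypothetical. The code page names accepted by iconv_open(), and the way a failed open is detected, may differ on the server, so treat those details as assumptions and adapt them from the iconv_open() API documentation.

```c
#include <iconv.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical module state: one descriptor shared by all threads. */
static iconv_t g_cd;
static int g_cd_open = 0;
static pthread_mutex_t g_cd_lock = PTHREAD_MUTEX_INITIALIZER;

/* Open the shared descriptor once. Returns 0 on success, -1 on failure.
 * Assumption: POSIX-style failure value (iconv_t)-1. */
int shared_conv_open(const char *to_code, const char *from_code)
{
    int rc = 0;

    pthread_mutex_lock(&g_cd_lock);
    if (!g_cd_open) {
        g_cd = iconv_open(to_code, from_code);
        if (g_cd == (iconv_t)-1) {
            rc = -1;
        } else {
            g_cd_open = 1;
        }
    }
    pthread_mutex_unlock(&g_cd_lock);
    return rc;
}

/* Convert in_len bytes of in into out and NUL-terminate the result.
 * Returns the number of bytes written (excluding the NUL), or -1. */
long shared_conv(const char *in, size_t in_len, char *out, size_t out_cap)
{
    char *src = (char *)in;      /* iconv() takes a non-const pointer */
    char *dst = out;
    size_t in_left = in_len;
    size_t out_left;
    long written = -1;

    if (out_cap == 0) {
        return -1;
    }
    out_left = out_cap - 1;      /* reserve one byte for the NUL */

    pthread_mutex_lock(&g_cd_lock);   /* one thread at a time */
    if (g_cd_open) {
        iconv(g_cd, NULL, NULL, NULL, NULL);   /* reset shift state */
        if (iconv(g_cd, &src, &in_left, &dst, &out_left) != (size_t)-1) {
            *dst = '\0';
            written = (long)(dst - out);
        }
    }
    pthread_mutex_unlock(&g_cd_lock);
    return written;
}

/* Release the shared descriptor, for example at library unload. */
void shared_conv_close(void)
{
    pthread_mutex_lock(&g_cd_lock);
    if (g_cd_open) {
        iconv_close(g_cd);
        g_cd_open = 0;
    }
    pthread_mutex_unlock(&g_cd_lock);
}
```

A native method would call shared_conv_open() once during initialization, call shared_conv() on each string before passing the result to the JNI, and call shared_conv_close() during cleanup. Holding the mutex for the whole conversion keeps the sketch simple; heavily threaded code might prefer one descriptor per thread instead.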