z/OS XL C support for the double-byte character set
The number of characters in some languages such as Japanese or
Korean is larger than 256, the number of distinct values that can
be encoded in a single byte. The characters in such languages are
represented in computers by a sequence of bytes, and are called multibyte
characters. This topic explains
how the z/OS® XL C compiler supports multibyte characters.
Note: The z/OS XL C++ compiler does not have native support
for multibyte characters. The support described here is what z/OS XL
C provides; for C++, you can take advantage of this support by using
interlanguage calls to C code. Please refer to Using linkage specifications in C or C++ for more information.
The z/OS XL C compiler supports the IBM EBCDIC
encoding of multibyte characters, in which each natural language character
is uniquely represented by one to four bytes. The number of bytes
that encode a single character depends on the global shift state
information. If a stream is in initial shift state, one multibyte
character is represented by a byte or sequence of bytes that has the
following characteristics:
- It starts with the byte containing the shift-out (0x0e) character.
- The shift-out character is followed by 2 bytes that encode the value of the character.
- These bytes may be followed by a byte containing the shift-in (0x0f) character.
If the sequence of bytes ends with the shift-in character, the
state remains initial, making this sequence represent a 4-byte multibyte
character. The z/OS XL C library functions can normalize multibyte
characters of various lengths and encode them in units of a single
fixed length. Such normalized characters are called wide characters;
in z/OS XL C they are represented by two bytes.
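The byte layout described above can be sketched as follows. This is only an illustration of the sequence shape (shift-out, two value bytes, optional shift-in). The helper name build_dbcs_sequence and the macro names are hypothetical and are not part of the z/OS XL C library, and the two value bytes passed in are arbitrary placeholders.

```c
#include <assert.h>
#include <stddef.h>

#define SHIFT_OUT 0x0e  /* shift-out byte */
#define SHIFT_IN  0x0f  /* shift-in byte */

/*
 * Hypothetical helper: writes one double-byte character into out[]
 * using the layout described in the text. hi and lo are the two bytes
 * that encode the character value. If close_with_shift_in is nonzero,
 * the shift-in byte is appended. out must have room for 4 bytes.
 * Returns the number of bytes written (3 or 4).
 */
size_t build_dbcs_sequence(unsigned char *out, unsigned char hi,
                           unsigned char lo, int close_with_shift_in)
{
    size_t n = 0;

    assert(out != NULL);

    out[n++] = SHIFT_OUT;       /* sequence starts with shift-out */
    out[n++] = hi;              /* first byte of the character value */
    out[n++] = lo;              /* second byte of the character value */
    if (close_with_shift_in)
        out[n++] = SHIFT_IN;    /* optional trailing shift-in */
    return n;
}
```

When the trailing shift-in is written, the helper produces the 4-byte form; without it, only the shift-out and the two value bytes are emitted.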
Conversions between multibyte format and wide character format can
be performed by string conversion functions such as wcstombs(),
mbstowcs(), wcsrtombs(), and mbsrtowcs(), as well as by the family of
wide character I/O functions.
MB_CUR_MAX is defined in the stdlib.h header file. Depending on
its value, either of the following happens:
- When MB_CUR_MAX is 1, all bytes are considered single-byte characters; shift-out and shift-in characters are treated as data as well.
- When MB_CUR_MAX is 4:
  - On input, the wide character I/O functions read multibyte characters from the streams and convert them to wide characters.
  - On output, they convert wide characters to multibyte characters and write them to the output streams.
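As a minimal sketch of the string conversion functions, the following converts a multibyte string to wide characters with mbstowcs() and back with wcstombs(). The helper name roundtrip_ok and the MAX_WIDE capacity are assumptions made for this example. The result depends on the current locale, and for an encoding with shift states the converted string may not be byte-for-byte identical to the input even when both conversions succeed.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

#define MAX_WIDE 64  /* hypothetical capacity chosen for this sketch */

/*
 * Hypothetical helper: converts mbs to wide characters and back.
 * Returns 1 if both conversions succeed and the result matches the
 * input, 0 otherwise (including when the input does not fit).
 */
int roundtrip_ok(const char *mbs)
{
    wchar_t wide[MAX_WIDE];
    char back[MAX_WIDE * MB_LEN_MAX];
    size_t n;

    assert(mbs != NULL);

    /* Multibyte to wide; (size_t)-1 signals an invalid sequence. */
    n = mbstowcs(wide, mbs, MAX_WIDE);
    if (n == (size_t)-1 || n >= MAX_WIDE)
        return 0;   /* error, or no room for the terminating null */

    /* Wide back to multibyte. */
    n = wcstombs(back, wide, sizeof back);
    if (n == (size_t)-1 || n >= sizeof back)
        return 0;

    return strcmp(mbs, back) == 0;
}
```

The buffer for the multibyte result is sized with MB_LEN_MAX rather than MB_CUR_MAX, because MB_CUR_MAX is not required to be a compile-time constant.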
Both binary and text streams have orientation. Streams opened with
type=record or type=blocked do not. There are three possible
orientations of a stream:
- Non-oriented
- A stream that has been associated with an open file before any I/O operation is performed. The first I/O operation on a non-oriented stream will set the orientation of the stream. The fwide() function may be used to set the orientation of a stream before any I/O operation is performed. You can use the setbuf() and setvbuf() functions only when I/O has not yet been performed on a stream. When you use these functions, the orientation of the stream is not affected. When you perform one of the wide character input/output operations on a non-oriented stream, the stream becomes wide-oriented. When you perform one of the byte input/output operations on a non-oriented stream, the stream becomes byte-oriented.
- Wide-oriented
- A stream on which any wide character input/output functions are guaranteed to operate correctly. Conceptually, wide-oriented streams are sequences of wide characters. The external file associated with a wide-oriented stream is a sequence of multibyte characters. Using byte I/O functions on a wide-oriented stream results in undefined behavior. A stream opened for record I/O or blocked I/O cannot be wide-oriented.
- Byte-oriented
- A stream on which any byte input/output functions are guaranteed to operate correctly. Using wide character I/O functions on a byte-oriented stream results in undefined behavior. Byte-oriented streams have minimal support for multibyte characters.
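The way the first I/O operation fixes a stream's orientation can be observed with fwide(), which only queries the orientation when its second argument is 0. The sketch below uses a temporary file from tmpfile(). The helper name orientation_after_first_io is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/*
 * Hypothetical helper: opens a temporary stream, checks that it starts
 * out non-oriented, performs one I/O operation, and reports the
 * resulting orientation.
 * Returns  1 if the stream became wide-oriented,
 *         -1 if it became byte-oriented,
 *          0 if it is still non-oriented,
 *         -2 if the stream could not be opened or was already oriented.
 */
int orientation_after_first_io(int use_wide_io)
{
    FILE *fp = tmpfile();
    int mode;

    if (fp == NULL)
        return -2;

    /* fwide(fp, 0) queries the orientation without changing it. */
    if (fwide(fp, 0) != 0) {
        fclose(fp);
        return -2;
    }

    if (use_wide_io)
        fputwc(L'A', fp);   /* wide character output */
    else
        fputc('A', fp);     /* byte output */

    mode = fwide(fp, 0);
    fclose(fp);
    return (mode > 0) - (mode < 0);   /* normalize to 1, 0, or -1 */
}
```

To choose the orientation explicitly instead of relying on the first operation, fwide() can be called with a positive (wide) or negative (byte) second argument before any I/O is performed.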
Calls to the clearerr(), feof(), ferror(), fflush(), fgetpos(),
or ftell() functions do not change the orientation.
Other functions that do not change the orientation are ftello(),
fsetpos(), fseek(), fseeko(), rewind(), fldata(), and fileno().
Also, the perror() function does not affect the orientation of the
stderr stream.
Once you have established a stream's orientation, the only way
to change it is to make a successful call to the freopen() function,
which removes a stream's orientation.
The wchar.h header file declares the WEOF macro and the functions
that support wide character input and output. The macro expands to
a constant expression of type wint_t. Certain functions return WEOF
when the end-of-file is reached on the stream.
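A typical use of WEOF is as the loop terminator when reading with fgetwc(). The sketch below counts the wide characters remaining on a stream. The helper name count_wide_chars is hypothetical. Because fgetwc() also returns WEOF on a read or conversion error, the sketch checks ferror() afterward.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/*
 * Hypothetical helper: reads wide characters from fp until WEOF.
 * Returns the number of wide characters read, or -1 if the loop
 * ended because of an error rather than end-of-file.
 */
long count_wide_chars(FILE *fp)
{
    long count = 0;
    wint_t wc;  /* wint_t, not wchar_t, so that WEOF can be represented */

    assert(fp != NULL);

    while ((wc = fgetwc(fp)) != WEOF)
        count++;

    return ferror(fp) ? -1 : count;
}
```

The stream passed in should be non-oriented or wide-oriented, since using wide character I/O functions on a byte-oriented stream is undefined.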
Note: The behavior of the wide character I/O functions is affected
by the LC_CTYPE category of the current locale, and the setting of
MB_CUR_MAX. Wide-character input and output should be performed under
the same LC_CTYPE setting.
If you change the setting between when you read from a file and when
you write to it, or vice versa, you may get undefined behavior. If
you change it back to the original setting, however, you will get
the behavior that is documented. See the introduction of this topic for
a discussion of the effects of MB_CUR_MAX.