There are two EBCDIC encoding considerations to deal with when parsing an
XML document on z/OS®. The first
involves the character set differences between EBCDIC and Unicode.
Because only a small number of Unicode characters can be represented
in EBCDIC, when an EBCDIC encoded XML document is parsed, any Unicode
character entity in the parsed document that does not have an EBCDIC
value is converted into a dash.
Note: The default substitution for a non-representable
character is a dash. This can be overridden with a control call to
XEC_CTL_ENTS_AND_REFS.
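
The substitution described above can be illustrated outside the parser. The following is a minimal sketch, not the z/OS XML parser interface: it assumes Python's built-in cp037 codec as a stand-in for the document's EBCDIC code page, and the handler name ebcdic_dash and the function to_ebcdic are hypothetical names chosen for this illustration.

```python
import codecs


def _dash_on_error(err):
    """Replace each character that has no EBCDIC value with a dash."""
    # Only encoding failures are handled here; anything else is re-raised.
    if not isinstance(err, UnicodeEncodeError):
        raise err
    # The codec may report a run of several unencodable characters at once,
    # so emit one dash per character and resume after the run.
    return ("-" * (err.end - err.start), err.end)


# "ebcdic_dash" is a hypothetical handler name used only in this sketch.
codecs.register_error("ebcdic_dash", _dash_on_error)


def to_ebcdic(text, codepage="cp037"):
    """Encode text to an EBCDIC code page (cp037 is an assumption),
    substituting a dash for anything the code page cannot represent."""
    return text.encode(codepage, errors="ebcdic_dash")


# The euro sign (U+20AC) has no value in code page 037.
encoded = to_ebcdic("price: 5 \u20ac")
print(encoded.decode("cp037"))
```

Registering a different handler changes the substitution character, which is loosely analogous to overriding the parser's default.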
Secondly, if the EBCDIC XML document has been created or modified
on a z/OS system, then the
line-ending character is typically an NL (x'15') character, which is
commonly associated with the Unicode NEL character (x'85'). For EBCDIC
code page documents, the z/OS XML
parser accepts XML 1.0 documents that use NL as a line-termination
character, and normalizes all line endings to EBCDIC NL (NEL).
However, because XML 1.0 does not define NL as a line-ending character,
these documents are non-compliant and may not be
accepted by parsers on other platforms. In general, EBCDIC is not
a portable encoding, so IBM® does
not recommend using EBCDIC for XML documents that are exchanged between
platforms or sent over the Internet.
Note: For XML 1.1 documents, NL is a legitimate
line-ending character, and the z/OS XML parser
processes it as such, in compliance with the specification.
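
When an EBCDIC document does have to leave the platform, one option is to convert it to a portable encoding first. The following is a minimal sketch under stated assumptions, not an IBM-supplied procedure: it assumes the document is in code page 037 (Python's cp037 codec, which maps x'15' to U+0085), and the function name ebcdic_to_utf8 is hypothetical. It does not rewrite an encoding pseudo-attribute in the XML declaration, which would also need updating if present.

```python
NEL = "\u0085"  # Unicode NEL, the character that EBCDIC NL (x'15') maps to


def ebcdic_to_utf8(raw, codepage="cp037"):
    """Decode an EBCDIC document (cp037 is an assumption) and return UTF-8
    bytes with NEL line endings replaced by LF, which XML 1.0 accepts."""
    text = raw.decode(codepage)
    return text.replace(NEL, "\n").encode("utf-8")


# Build a small EBCDIC sample whose lines end in NL (x'15').
sample = "<a>\u0085<b/>\u0085</a>".encode("cp037")
print(b"\x15" in sample)
print(ebcdic_to_utf8(sample))
```

Replacing NEL with LF after decoding keeps the conversion independent of how the receiving parser treats U+0085.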