EBCDIC encoding considerations

There are two EBCDIC encoding considerations when parsing an XML file on z/OS®. The first involves the character set differences between EBCDIC and Unicode. Only a small number of Unicode characters can be represented in EBCDIC, so when an EBCDIC-encoded XML document is parsed, any Unicode character entity in the document that does not have an EBCDIC value is converted into a dash.

Note: The default substitute for a non-representable character is a dash. This can be overridden with a control call to XEC_CTL_ENTS_AND_REFS.
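The substitution described above can be illustrated with a small sketch. This is not the z/OS XML parser and does not use its API. It only mimics the idea in Python, assuming code page IBM-037 (Python's `cp037` codec) and a hypothetical helper name, `resolve_char_refs`. Numeric character references that have no value in the chosen EBCDIC code page are replaced with a dash.

```python
import re

# Matches XML numeric character references such as &#65; or &#x20AC;
_CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")


def resolve_char_refs(text, codec="cp037", substitute="-"):
    """Resolve numeric character references for an EBCDIC target.

    Hypothetical helper for illustration only. A reference whose
    character cannot be encoded in `codec` is replaced by `substitute`
    (a dash by default, mirroring the parser's documented default).
    """
    def replace(match):
        body = match.group(1)
        code_point = int(body[1:], 16) if body.startswith("x") else int(body)
        char = chr(code_point)
        try:
            char.encode(codec)  # representable in the EBCDIC code page?
        except UnicodeEncodeError:
            return substitute
        return char

    return _CHAR_REF.sub(replace, text)


if __name__ == "__main__":
    # U+20AC (euro sign) has no value in code page 037; U+0041 ('A') does.
    print(resolve_char_refs("price: &#x20AC;5 &#65;"))
```

The code page is an assumption. A document in a different EBCDIC code page may accept or reject a different set of characters, so the same reference can resolve differently.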

Second, if the EBCDIC XML document has been created or modified on a z/OS system, then the line-ending character is typically an NL (x'15') character. This is commonly associated with the Unicode NEL character (x'85'). For EBCDIC code page documents, the z/OS XML parser accepts XML 1.0 documents that use NL as the line-termination character, and normalizes all line endings to EBCDIC NL (NEL). However, because these documents are not compliant with XML 1.0, they may not be accepted by parsers on other platforms. In general, EBCDIC is not a portable encoding, so IBM® does not recommend using EBCDIC for XML documents that move between platforms or across the Internet.
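One way to prepare such a document for another platform is to transcode it and map NL to a line feed. The following is a minimal sketch, not an IBM-provided procedure. It assumes the document is in code page IBM-037 (Python's `cp037` codec, which decodes x'15' to U+0085), and the function name `ebcdic_to_portable_utf8` is hypothetical.

```python
def ebcdic_to_portable_utf8(data, codec="cp037"):
    """Transcode EBCDIC bytes to UTF-8 with LF line endings.

    Hypothetical helper for illustration only. The code page is an
    assumption and must match the one actually used by the document.
    """
    text = data.decode(codec)
    # EBCDIC NL (x'15') decodes to Unicode NEL (U+0085); map it to LF.
    text = text.replace("\r\n", "\n").replace("\u0085", "\n")
    # Any encoding declaration inside the XML document is left untouched
    # here and would have to be updated separately.
    return text.encode("utf-8")


if __name__ == "__main__":
    sample = "<a>".encode("cp037") + b"\x15" + "</a>".encode("cp037")
    print(ebcdic_to_portable_utf8(sample))
```

Mapping NL to LF is a design choice for XML 1.0 consumers, which do not treat NEL as a line ending. It also changes any NEL that was meant as character data, so it is only appropriate when NL is used purely as a line terminator.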

Note: For XML 1.1 documents, NL is legitimate and the z/OS XML parser is compliant in processing it as such.