XML input document encoding
To parse an XML document using the XML PARSE statement, the document must be encoded in a supported encoding.
The supported encodings for a given parse operation depend on:
- The category of the data item that contains the XML document
- The setting of the XMLPARSE compiler option
- The optional phrases that are specified in the XML PARSE statement
For XML documents that are contained in a national data item, the supported encoding is Unicode UTF-16 in big-endian format, CCSID 1200.
For XML documents that are contained in an alphanumeric data item, the supported encodings if the XMLPARSE(XMLSS) compiler option is in effect are as follows:
- If the RETURNING NATIONAL phrase is specified in the XML PARSE statement: UTF-8 or any EBCDIC or ASCII encoding that is supported by the z/OS® Unicode Services for conversion to UTF-16
- If the RETURNING NATIONAL phrase is not specified: UTF-8 or any of the single-byte EBCDIC CCSIDs listed in the related reference about the encoding of XML documents
For XML documents that are contained in an alphanumeric data item, the supported CCSIDs if XMLPARSE(COMPAT) is in effect are those specified in the related reference about the encoding of XML documents.
To parse an XML document that is encoded in an unsupported code page, first convert the document to national character data (UTF-16) by using the NATIONAL-OF intrinsic function. You can convert the individual pieces of document text that are passed to the processing procedure in special register XML-NTEXT back to the original code page by using the DISPLAY-OF intrinsic function.
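The conversion steps described above might look like the following minimal sketch. It is an illustration under stated assumptions, not a definitive implementation: CCSID 819 stands in for whatever code page your document actually uses, the 16-byte sample document is built in the program only so that the sketch is self-contained, and the names XMLCONV, SRC-DOC, NAT-DOC, TEXT-OUT, TEXT-LEN, and HANDLE-EVENT are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLCONV.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Document bytes in the original code page. CCSID 819 is
      * only an illustrative stand-in for that code page.
       01  SRC-DOC   PIC X(16).
      * The same document as national (UTF-16) data.
       01  NAT-DOC   PIC N(16).
      * One piece of document text, converted back to CCSID 819.
       01  TEXT-OUT  PIC X(16).
       01  TEXT-LEN  PIC 9(4) COMP.
       PROCEDURE DIVISION.
       MAINLINE.
      * Build a 16-byte sample document in CCSID 819 so that the
      * sketch is self-contained.
           MOVE FUNCTION DISPLAY-OF(N'<msg>hello</msg>', 819)
             TO SRC-DOC
      * Step 1: convert the document to national data.
           MOVE FUNCTION NATIONAL-OF(SRC-DOC, 819) TO NAT-DOC
      * Step 2: parse the national copy of the document.
           XML PARSE NAT-DOC
             PROCESSING PROCEDURE HANDLE-EVENT
             ON EXCEPTION
               DISPLAY 'XML exception, XML-CODE = ' XML-CODE
           END-XML
           GOBACK.
       HANDLE-EVENT.
      * Step 3: convert character content back to CCSID 819.
           IF XML-EVENT = 'CONTENT-CHARACTERS'
             MOVE FUNCTION LENGTH(XML-NTEXT) TO TEXT-LEN
             MOVE FUNCTION DISPLAY-OF(XML-NTEXT, 819)
               TO TEXT-OUT(1:TEXT-LEN)
             DISPLAY 'Converted ' TEXT-LEN ' character(s)'
           END-IF.
```

The sketch sizes TEXT-OUT from the number of national characters in XML-NTEXT, which is only adequate when the target is a single-byte code page; a multibyte target would need a larger receiving item.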
XML declaration and white space:
- If an XML document begins with an XML declaration, the first angle bracket (<) in the document must be the first character in the document.
- If an XML document does not begin with an XML declaration, the first angle bracket in the document can be preceded only by white space.
White-space characters have the hexadecimal values shown in the following table.
White-space character | EBCDIC | Unicode |
---|---|---|
Space | X'40' | X'0020' |
Horizontal tabulation | X'05' | X'0009' |
Carriage return | X'0D' | X'000D' |
Line feed | X'25' | X'000A' |
New line / next line | X'15' | X'0085' |
Determining the encoding of an input XML document
The parser must know the encoding of an XML document in order to process the document correctly.
If the specified encoding is not one of the supported coded character sets, the parser signals an XML exception event before beginning the parse operation. If the actual document encoding does not match the specified encoding, the parser signals an appropriate XML exception after beginning the parse operation.
Several sources are used in determining the encoding of an XML document:
- If the XMLPARSE(XMLSS) option is in effect:
  - The data type of the data item that contains the XML document
  - The ENCODING phrase (if used) of the XML PARSE statement
  - The CCSID specified in the CODEPAGE compiler option
- If the XMLPARSE(COMPAT) option is in effect:
  - The data type of the data item that contains the XML document
  - The actual encoding determined when the parser examines the first few bytes of the document
  - The encoding declaration specified within the XML document
  - The CCSID specified in the CODEPAGE compiler option
If XMLPARSE(XMLSS) is in effect:
- Any encoding declaration specified within the XML document is ignored.
- For XML documents that are contained in a national data item, the ENCODING phrase of the XML PARSE statement must be omitted or must specify CCSID 1200. The CCSID specified in the CODEPAGE compiler option is ignored. The parser signals an XML exception event if the actual document encoding is not UTF-16 in big-endian format.
- For XML documents that are contained in an alphanumeric data item, the CCSID specified in the ENCODING phrase overrides the CODEPAGE compiler option. The parser raises an XML exception event at the beginning of the parse operation if the actual document encoding is not consistent with the specified CCSID.
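As a rough illustration of the alphanumeric case, the following sketch parses a UTF-8 document with the ENCODING phrase. It assumes that the program is compiled with XMLPARSE(XMLSS) in effect; the names XMLENC, UTF8-DOC, and SHOW-EVENT are hypothetical, and the sample document is built in the program only to keep the sketch self-contained.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLENC.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Alphanumeric item that holds a 16-byte UTF-8 document.
       01  UTF8-DOC  PIC X(16).
       PROCEDURE DIVISION.
       MAINLINE.
      * Build the sample document in UTF-8 (CCSID 1208).
           MOVE FUNCTION DISPLAY-OF(N'<msg>hello</msg>', 1208)
             TO UTF8-DOC
      * ENCODING 1208 overrides the CODEPAGE compiler option.
      * RETURNING NATIONAL returns document text in XML-NTEXT.
           XML PARSE UTF8-DOC
             WITH ENCODING 1208
             RETURNING NATIONAL
             PROCESSING PROCEDURE SHOW-EVENT
             ON EXCEPTION
               DISPLAY 'XML exception, XML-CODE = ' XML-CODE
             NOT ON EXCEPTION
               DISPLAY 'Parse completed'
           END-XML
           GOBACK.
       SHOW-EVENT.
      * Convert the national text to the CODEPAGE CCSID for display.
           IF XML-EVENT = 'CONTENT-CHARACTERS'
             DISPLAY 'Content: ' FUNCTION DISPLAY-OF(XML-NTEXT)
           END-IF.
```

If the bytes in UTF8-DOC were not valid UTF-8, the ON EXCEPTION branch is where the mismatch described above would be expected to surface.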
Converting to or from national (Unicode) representation
Specifying the encoding
Parsing XML documents encoded in UTF-8
Handling XML PARSE exceptions
XMLPARSE (compiler option)
The encoding of XML documents
EBCDIC code-page-sensitive characters in XML markup