XML PARSE statement

The XML PARSE statement is the COBOL language interface to either of two high-speed XML parsers, depending on the setting of the XMLPARSE compiler option.

The two high-speed XML parsers are:

  • The z/OS® XML System Services parser, for enhanced parsing capabilities. This parser is selected by the XMLPARSE(XMLSS) compiler option.
  • The XML parser that is provided in the COBOL run time, for compatibility with Enterprise COBOL for z/OS Version 3. The compatible parser is selected by the XMLPARSE(COMPAT) compiler option.

The XML PARSE statement parses an XML document into its individual pieces and passes each piece, one at a time, to a user-written processing procedure.

Format

XML PARSE identifier-1
    [ [WITH] ENCODING codepage ]
    [ RETURNING NATIONAL ]
    [ VALIDATING [WITH] { identifier-2 | FILE xml-schema-name-1 } ]
    PROCESSING PROCEDURE [IS] procedure-name-1
        [ { THROUGH | THRU } procedure-name-2 ]
    [ [ON] EXCEPTION imperative-statement-1 ]
    [ NOT [ON] EXCEPTION imperative-statement-2 ]
    [ END-XML ]

Brackets enclose optional items; braces enclose alternatives separated by a vertical bar.

identifier-1
identifier-1 must be an elementary data item of category national, a national group, an elementary data item of category alphanumeric, or an alphanumeric group item. identifier-1 cannot be a function-identifier. identifier-1 contains the XML document character stream.

If identifier-1 is a national group item, identifier-1 is processed as an elementary data item of category national.

If identifier-1 is of category national, its content must be encoded using Unicode UTF-16BE (CCSID 1200). If the XMLPARSE(COMPAT) compiler option is in effect, identifier-1 must not contain any character entities that are represented using multiple encoding units. Use a character reference to represent any such characters, for example:
  • "&#x67603;" or
  • "&#x10813;"
The letter x must be lowercase.

identifier-1 must not be a dynamic-length group item or a dynamic-length elementary item.

If identifier-1 is of category alphanumeric, its content must be encoded using one of the character sets listed in Coded character sets for XML documents in the Enterprise COBOL Programming Guide. If the XMLPARSE(COMPAT) compiler option is in effect, and identifier-1 is alphanumeric and contains an XML document that does not specify an encoding declaration, the XML document is parsed with the code page specified by the CODEPAGE compiler option.

If the XMLPARSE(XMLSS) compiler option is in effect, the XML document is parsed with the code page specified in the ENCODING phrase; if the ENCODING phrase is not used, the document is parsed with the code page specified by the CODEPAGE compiler option. Any encoding declaration in the XML document is ignored.

RETURNING NATIONAL phrase
The RETURNING NATIONAL phrase can be specified only when the XMLPARSE(XMLSS) compiler option is in effect.

When identifier-1 references a data item of category alphanumeric and the RETURNING NATIONAL phrase is specified, XML document fragments are automatically converted to Unicode UTF-16 representation and returned to the processing procedure in the national special registers XML-NTEXT, XML-NNAMESPACE, and XML-NNAMESPACE-PREFIX.

When the RETURNING NATIONAL phrase is not specified and identifier-1 references a data item of category alphanumeric, the XML document fragments are returned to the processing procedure in the alphanumeric special registers XML-TEXT, XML-NAMESPACE, and XML-NAMESPACE-PREFIX. There is one exception: when XMLPARSE(COMPAT) is in effect, text for the ATTRIBUTE-NATIONAL-CHARACTER and CONTENT-NATIONAL-CHARACTER XML events is always returned in special register XML-NTEXT.

When identifier-1 references a national data item, XML document fragments are always returned in Unicode UTF-16 representation in the national special registers XML-NTEXT, XML-NNAMESPACE, and XML-NNAMESPACE-PREFIX.

VALIDATING phrase
The VALIDATING phrase specifies that the parser should validate the XML document against an XML schema while parsing it. In Enterprise COBOL, the schema used for XML validation is in a preprocessed format known as Optimized Schema Representation or OSR. The VALIDATING phrase can be specified only when the XMLPARSE(XMLSS) compiler option is in effect.

See Parsing XML documents with validation in the Enterprise COBOL Programming Guide for details.

identifier-2 must not be a dynamic-length group item or a dynamic-length elementary item.

If the FILE keyword is not specified, identifier-2 must reference a data item that contains the optimized XML schema. identifier-2 must be of category alphanumeric and cannot be a function-identifier.

If the FILE keyword is specified, xml-schema-name-1 identifies an existing z/OS UNIX file or MVS™ data set that contains the optimized XML schema. xml-schema-name-1 must be associated with the external file name of the schema by using the XML-SCHEMA clause. For more information about the XML-SCHEMA clause, see SPECIAL-NAMES paragraph.

Restriction: XML validation using the FILE keyword is not supported under CICS®.

During parsing with validation, normal XML events are returned as for nonvalidating parsing until an exception occurs due to a validation error or other error in the document.

When an XML document is not valid, the parser signals an XML exception and passes control to the processing procedure with special register XML-EVENT containing 'EXCEPTION' and special register XML-CODE containing return code 24 in the high-order halfword and a reason code in the low-order halfword.

For information about the return code and reason code for exceptions that might occur when parsing XML documents with validation, see XML PARSE exceptions with XMLPARSE(XMLSS) in effect in the Enterprise COBOL Programming Guide.
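As an illustration of the validation rules above, the following is a minimal sketch, not a definitive implementation. It assumes the program is compiled with XMLPARSE(XMLSS). The sample document, the schema name ORDER-SCHEMA, the ddname DDSCHEMA, and the work fields WS-RC and WS-REASON are all hypothetical, and an OSR file must already be allocated to that ddname. The processing procedure splits XML-CODE into its two halfwords by dividing by 65536.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLVALID.
       ENVIRONMENT DIVISION.
       CONFIGURATION SECTION.
       SPECIAL-NAMES.
      * DDSCHEMA is a hypothetical ddname for the OSR file.
           XML-SCHEMA ORDER-SCHEMA IS 'DDSCHEMA'.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical sample document.
       01  ORDER-DOC      PIC X(38)
               VALUE '<order><id>42</id><qty>3</qty></order>'.
       01  WS-RC          PIC S9(9) BINARY.
       01  WS-REASON      PIC S9(9) BINARY.
       PROCEDURE DIVISION.
       MAIN-PARA.
           XML PARSE ORDER-DOC
               VALIDATING WITH FILE ORDER-SCHEMA
               PROCESSING PROCEDURE HANDLE-EVENT
               ON EXCEPTION
                   DISPLAY 'Parse or validation failed'
               NOT ON EXCEPTION
                   DISPLAY 'Document parsed and validated'
           END-XML
           GOBACK.
       HANDLE-EVENT.
           IF XML-EVENT = 'EXCEPTION'
      * High-order halfword = return code, low-order = reason.
               DIVIDE XML-CODE BY 65536
                   GIVING WS-RC REMAINDER WS-REASON
               DISPLAY 'RC=' WS-RC ' REASON=' WS-REASON
           END-IF.
```

For a document that fails validation, WS-RC would be expected to contain 24, as described above; the reason codes themselves are listed in the Programming Guide.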

ENCODING phrase
The ENCODING phrase can be specified only when the XMLPARSE(XMLSS) compiler option is in effect.

The ENCODING phrase specifies an encoding that is assumed for the source XML document in identifier-1. codepage must be an unsigned integer data item or an unsigned integer literal that represents a valid coded character set identifier (CCSID). The ENCODING phrase specification overrides the encoding specified by the CODEPAGE compiler option. The encoding specified in any XML declaration is always ignored.

If identifier-1 references a data item of category national, codepage must specify CCSID 1200, for Unicode UTF-16.

If identifier-1 references a data item of category alphanumeric, codepage must specify CCSID 1208 for UTF-8 or a CCSID for a supported EBCDIC or ASCII codepage. See Coded character sets for XML documents in the Enterprise COBOL Programming Guide for details.
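The ENCODING and RETURNING NATIONAL phrases can be combined, as in the following sketch. It assumes compilation with XMLPARSE(XMLSS) and a document encoded in EBCDIC code page 1140. The document content and the data names ORDER-DOC and DOC-CCSID are hypothetical. Because RETURNING NATIONAL is specified, the processing procedure reads fragments from XML-NTEXT rather than XML-TEXT.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLENC.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical sample document in an alphanumeric item.
       01  ORDER-DOC      PIC X(38)
               VALUE '<order><id>42</id><qty>3</qty></order>'.
      * Assumed CCSID of the document (EBCDIC code page 1140).
       01  DOC-CCSID      PIC 9(5) COMP VALUE 1140.
       PROCEDURE DIVISION.
       MAIN-PARA.
           XML PARSE ORDER-DOC
               WITH ENCODING DOC-CCSID
               RETURNING NATIONAL
               PROCESSING PROCEDURE HANDLE-EVENT
           END-XML
           GOBACK.
       HANDLE-EVENT.
      * Fragments arrive as UTF-16 in XML-NTEXT because
      * RETURNING NATIONAL is specified.
           IF XML-EVENT = 'CONTENT-CHARACTERS'
               DISPLAY 'Content: ' FUNCTION DISPLAY-OF(XML-NTEXT)
           END-IF.
```

The codepage operand is shown here as a data item; an unsigned integer literal such as 1140 could be coded in its place.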

PROCESSING PROCEDURE phrase
Specifies the name of a procedure to handle the various events that the XML parser generates.
procedure-name-1, procedure-name-2
Must name a section or paragraph in the PROCEDURE DIVISION. When both procedure-name-1 and procedure-name-2 are specified, if either is a procedure name in a declarative procedure, both must be procedure names in the same declarative procedure.
procedure-name-1
Specifies the first (or only) section or paragraph in the processing procedure.
procedure-name-2
Specifies the last section or paragraph in the processing procedure.
For each XML event, the parser transfers control to the first statement of the procedure named procedure-name-1. Control is always returned from the processing procedure to the XML parser. The point from which control is returned is determined as follows:
  • If procedure-name-1 is a paragraph name and procedure-name-2 is not specified, the return is made after the execution of the last statement of the procedure-name-1 paragraph.
  • If procedure-name-1 is a section name and procedure-name-2 is not specified, the return is made after the execution of the last statement of the last paragraph in the procedure-name-1 section.
  • If procedure-name-2 is specified and it is a paragraph name, the return is made after the execution of the last statement of the procedure-name-2 paragraph.
  • If procedure-name-2 is specified and it is a section name, the return is made after the execution of the last statement of the last paragraph in the procedure-name-2 section.

The only necessary relationship between procedure-name-1 and procedure-name-2 is that they define a consecutive sequence of operations to execute, beginning at the procedure named by procedure-name-1 and ending with the execution of the procedure named by procedure-name-2.

If there are two or more logical paths to the return point, then procedure-name-2 can name a paragraph that consists of only an EXIT statement; all the paths to the return point must then lead to this paragraph.
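The following sketch shows one way to arrange a processing procedure that has several logical paths to a common return point. The document content and the names ORDER-DOC, HANDLE-EVENT, and HANDLE-EXIT are hypothetical. Each path branches to HANDLE-EXIT, a paragraph that contains only an EXIT statement and is named as procedure-name-2.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLTHRU.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical sample document.
       01  ORDER-DOC      PIC X(38)
               VALUE '<order><id>42</id><qty>3</qty></order>'.
       PROCEDURE DIVISION.
       MAIN-PARA.
           XML PARSE ORDER-DOC
               PROCESSING PROCEDURE HANDLE-EVENT THRU HANDLE-EXIT
           END-XML
           GOBACK.
       HANDLE-EVENT.
      * Entered once for each XML event.
           IF XML-EVENT = 'START-OF-ELEMENT'
               DISPLAY 'Start: ' XML-TEXT
               GO TO HANDLE-EXIT
           END-IF
           IF XML-EVENT = 'CONTENT-CHARACTERS'
               DISPLAY 'Text:  ' XML-TEXT
               GO TO HANDLE-EXIT
           END-IF
           CONTINUE.
       HANDLE-EXIT.
      * Common return point; control goes back to the parser.
           EXIT.
```

An EVALUATE statement in a single paragraph would avoid the GO TO statements; the THRU form is used here only to illustrate the shared exit paragraph.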

The processing procedure consists of all the statements at which XML events are handled. The range of the processing procedure includes all statements executed by CALL, EXIT, GO TO, GOBACK, INVOKE, MERGE, PERFORM, and SORT statements that are in the range of the processing procedure, as well as all statements in declarative procedures that are executed as a result of the execution of statements in the range of the processing procedure.

The range of the processing procedure must not cause the execution of any GOBACK or EXIT PROGRAM statement, except to return control from a method or program to which control was passed by an INVOKE or CALL statement, respectively, that is executed in the range of the processing procedure.

The range of the processing procedure must not cause the execution of an XML PARSE statement, unless the XML PARSE statement is executed in a method or outermost program to which control was passed by an INVOKE or CALL statement that is executed in the range of the processing procedure.

A program executing on multiple threads can execute the same XML statement or different XML statements simultaneously.

The processing procedure can terminate the run unit with a STOP RUN statement.

For more details about the processing procedure, see Control flow.

ON EXCEPTION
The ON EXCEPTION phrase specifies imperative statements that are executed when the XML PARSE statement raises an exception condition.

An exception condition exists when the XML parser detects an error in processing the XML document. The parser first signals an XML exception by passing control to the processing procedure with special register XML-EVENT containing 'EXCEPTION'. The parser also provides a numeric error code in special register XML-CODE, as detailed in Handling XML PARSE exceptions in the Enterprise COBOL Programming Guide.

An exception condition also exists if the processing procedure sets XML-CODE to -1 before returning to the parser for any normal XML event. In this case, the parser does not signal an EXCEPTION XML event and parsing is terminated.

If the ON EXCEPTION phrase is specified, the parser transfers control to imperative-statement-1. If the ON EXCEPTION phrase is not specified, the NOT ON EXCEPTION phrase, if any, is ignored and control is transferred to the end of the XML PARSE statement.

Special register XML-CODE contains the numeric error code for the XML exception or -1 after execution of the XML PARSE statement.

If the processing procedure handles the XML exception event and sets XML-CODE to zero before returning control to the parser, the exception condition no longer exists. If no other unhandled exceptions occur before termination of the parser, control is transferred to imperative-statement-2 of the NOT ON EXCEPTION phrase, if specified.

NOT ON EXCEPTION
The NOT ON EXCEPTION phrase specifies imperative statements that are executed when no exception condition exists at the termination of XML PARSE processing.

If an exception condition does not exist at termination of XML PARSE processing, control is transferred to imperative-statement-2 of the NOT ON EXCEPTION phrase, if specified. If the NOT ON EXCEPTION phrase is not specified, control is transferred to the end of the XML PARSE statement. The ON EXCEPTION phrase, if specified, is ignored.

Special register XML-CODE contains zero after execution of the XML PARSE statement.
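The interaction between the processing procedure and the two exception phrases can be sketched as follows. The document, the invented marker element stop, and the data name ORDER-DOC are hypothetical. The processing procedure sets XML-CODE to -1 on a normal event to end parsing deliberately.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLSTOP.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical document; <stop/> is an invented marker.
       01  ORDER-DOC      PIC X(33)
               VALUE '<order><id>42</id><stop/></order>'.
       PROCEDURE DIVISION.
       MAIN-PARA.
           XML PARSE ORDER-DOC
               PROCESSING PROCEDURE HANDLE-EVENT
               ON EXCEPTION
                   DISPLAY 'Ended early, XML-CODE=' XML-CODE
               NOT ON EXCEPTION
                   DISPLAY 'Completed, XML-CODE=' XML-CODE
           END-XML
           GOBACK.
       HANDLE-EVENT.
           IF XML-EVENT = 'START-OF-ELEMENT'
                   AND XML-TEXT = 'stop'
      * Ask the parser to stop; no EXCEPTION event is signaled.
               MOVE -1 TO XML-CODE
           END-IF.
```

Under the rules described above, this sketch would be expected to take the ON EXCEPTION path with XML-CODE containing -1. If the stop element were removed from the document, the NOT ON EXCEPTION path would be taken instead.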

END-XML phrase
This explicit scope terminator delimits the scope of XML GENERATE or XML PARSE statements. END-XML permits a conditional XML GENERATE or XML PARSE statement (that is, an XML GENERATE or XML PARSE statement that specifies the ON EXCEPTION or NOT ON EXCEPTION phrase) to be nested in another conditional statement.

The scope of a conditional XML GENERATE or XML PARSE statement can be terminated by:

  • An END-XML phrase at the same level of nesting
  • A separator period

END-XML can also be used with an XML GENERATE or XML PARSE statement that does not specify either the ON EXCEPTION or NOT ON EXCEPTION phrase.
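As a sketch of why the explicit scope terminator matters, the following program nests a conditional XML PARSE statement inside an IF statement. The document and the names ORDER-DOC and PARSE-SWITCH are hypothetical. END-XML closes only the XML PARSE statement, so the DISPLAY that follows it still belongs to the IF rather than to the ON EXCEPTION phrase.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLNEST.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical sample document and control switch.
       01  ORDER-DOC      PIC X(38)
               VALUE '<order><id>42</id><qty>3</qty></order>'.
       01  PARSE-SWITCH   PIC X VALUE 'Y'.
       PROCEDURE DIVISION.
       MAIN-PARA.
           IF PARSE-SWITCH = 'Y'
               XML PARSE ORDER-DOC
                   PROCESSING PROCEDURE HANDLE-EVENT
                   ON EXCEPTION
                       DISPLAY 'Parse failed'
               END-XML
      * Still inside the IF; END-XML closed only XML PARSE.
               DISPLAY 'Parse attempted'
           ELSE
               DISPLAY 'Parse skipped'
           END-IF
           GOBACK.
       HANDLE-EVENT.
           IF XML-EVENT = 'START-OF-ELEMENT'
               DISPLAY 'Element: ' XML-TEXT
           END-IF.
```

Without END-XML, the second DISPLAY statement would become part of the ON EXCEPTION phrase and would run only when an exception condition exists.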

For more information about explicit scope terminators, see Delimited scope statements.