XML PARSE statement

The XML PARSE statement is the COBOL language interface to the high-speed XML parser that is part of the COBOL run time.

The XML PARSE statement parses an XML document into its individual pieces and passes each piece, one at a time, to a user-written processing procedure.

XML PARSE statements must not be specified in declarative procedures.

Format

XML PARSE identifier-1
    PROCESSING PROCEDURE [IS] procedure-name-1 [{THROUGH | THRU} procedure-name-2]
    [[ON] EXCEPTION imperative-statement-1]
    [NOT [ON] EXCEPTION imperative-statement-2]
    [END-XML]

(Square brackets mark optional elements; braces with a vertical bar mark a choice of one.)
identifier-1
identifier-1 must be an elementary data item of category national, a national group, an elementary data item of category alphanumeric, or an alphanumeric group item. identifier-1 cannot be a function-identifier. identifier-1 contains the XML document character stream.

If identifier-1 is a national group item, identifier-1 is processed as an elementary data item of category national.

If identifier-1 is of category national, its content must be encoded using Unicode UTF-16LE (CCSID 1200). identifier-1 must not contain any characters that are represented using multiple encoding units, that is, supplementary characters (above U+FFFF) that UTF-16 encodes as surrogate pairs. Use a character reference to represent any such character, for example:
  • "&#x67603;" or
  • "&#x10813;"
The letter x must be lowercase.
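The following Python sketch is not COBOL and is not part of the COBOL run time; it only illustrates the rule above. The hypothetical helpers to_char_refs and to_national replace each character above U+FFFF (which UTF-16LE encodes as two encoding units, a surrogate pair) with a hexadecimal character reference that uses a lowercase x, and then encode the result as UTF-16LE so that every character occupies a single encoding unit.

```python
def to_char_refs(text: str) -> str:
    """Replace characters above U+FFFF with &#x...; character references."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Needs a surrogate pair in UTF-16: write it as a reference instead.
            # The "x" must be lowercase; the hex digits are uppercase here.
            parts.append(f"&#x{cp:X};")
        else:
            parts.append(ch)
    return "".join(parts)


def to_national(text: str) -> bytes:
    """Return UTF-16LE bytes (CCSID 1200) with one encoding unit per character."""
    return to_char_refs(text).encode("utf-16-le")


doc = "<a>\U00010813</a>"
print(to_char_refs(doc))                      # <a>&#x10813;</a>
print(len("\U00010813".encode("utf-16-le")))  # 4 (two 16-bit encoding units)
```

After the conversion, the UTF-16LE data is exactly two bytes per character, which is the property the rule requires.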

CHAR(NATIVE) and alphanumeric identifier-1

If identifier-1 is alphanumeric and the CHAR(EBCDIC) compiler option is not in effect, the content of identifier-1 must be encoded using UTF-8 Unicode or a single-byte ASCII code page that is supported by ICU conversion libraries (see International Components for Unicode: Converter Explorer).

To ensure that a document in such a data item is parsed as UTF-8 rather than as ASCII, at least one of the following must be true:
  • The runtime locale uses a UTF-8 code page.
  • The document includes an XML encoding declaration that specifies UTF-8.
  • The document starts with a UTF-8 byte order mark.

UTF-8 documents must not contain any characters with a Unicode scalar value greater than x'FFFF'. Use a character reference for such characters.

If the XML document in such a data item does not specify an encoding declaration and does not start with a UTF-8 byte order mark, it is parsed with the code page indicated by the current runtime locale.
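As a rough model of the rules above, the following Python sketch decides which encoding an alphanumeric document would be parsed with when CHAR(EBCDIC) is not in effect, and checks a UTF-8 document for characters above U+FFFF. The function names, the simplified regular expression for the encoding declaration, and the locale code page name passed in are assumptions for illustration. The sketch checks the byte order mark before the declaration, an order that this section does not specify.

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"
# Simplified match for an encoding declaration inside the leading XML declaration
ENCODING_DECL = re.compile(rb"""<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def native_parse_encoding(doc: bytes, locale_codepage: str) -> str:
    """Model the encoding used for an alphanumeric document without CHAR(EBCDIC)."""
    if doc.startswith(UTF8_BOM):
        return "UTF-8"
    m = ENCODING_DECL.match(doc)
    if m:
        return m.group(1).decode("ascii")
    # Neither a BOM nor an encoding declaration: use the runtime locale's code page
    return locale_codepage


def has_char_above_ffff(doc: bytes) -> bool:
    """True if a UTF-8 document contains a character above U+FFFF."""
    text = doc.decode("utf-8-sig")  # strips a leading BOM, if present
    return any(ord(ch) > 0xFFFF for ch in text)


# "ISO8859-1" is a hypothetical locale code page name
print(native_parse_encoding(b'<?xml version="1.0" encoding="UTF-8"?><a/>', "ISO8859-1"))  # UTF-8
print(native_parse_encoding(b"<a/>", "ISO8859-1"))                                         # ISO8859-1
```

A document for which has_char_above_ffff returns True would need those characters rewritten as character references before it is parsed.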

CHAR(EBCDIC) and alphanumeric identifier-1

If identifier-1 is alphanumeric and the CHAR(EBCDIC) compiler option is in effect, the content of identifier-1 must be encoded using a single-byte EBCDIC code page that is supported by ICU conversion libraries (see International Components for Unicode: Converter Explorer). If identifier-1 is an elementary item, the NATIVE keyword must not be specified in its data description entry.

If the XML document in such a data item does not specify an encoding declaration, it is parsed with the code page specified by the EBCDIC_CODEPAGE environment variable or, if that variable is not set, with the default EBCDIC code page for the current runtime locale, as described in Locales and code pages that are supported in the COBOL for Linux® on x86 Programming Guide.
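A minimal Python model of this code page selection follows. It assumes that a document that does specify an encoding declaration is parsed with the declared encoding, and it treats an empty EBCDIC_CODEPAGE value as unset; both are assumptions. The function name and the code page names in the example are hypothetical placeholders, not values taken from the Programming Guide.

```python
import os
from typing import Mapping, Optional


def ebcdic_parse_codepage(declared: Optional[str],
                          locale_default: str,
                          env: Mapping[str, str] = os.environ) -> str:
    """Model the code page for an alphanumeric document under CHAR(EBCDIC).

    declared is the encoding named in the document's encoding declaration,
    or None if the document has none.
    """
    if declared:
        # Assumption: a declared encoding is used as-is
        return declared
    # No declaration: EBCDIC_CODEPAGE if set (empty treated as unset),
    # otherwise the default EBCDIC code page for the runtime locale
    return env.get("EBCDIC_CODEPAGE") or locale_default


# Hypothetical code page names
print(ebcdic_parse_codepage(None, "IBM-037", {"EBCDIC_CODEPAGE": "IBM-1140"}))  # IBM-1140
print(ebcdic_parse_codepage(None, "IBM-037", {}))                               # IBM-037
```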

Setting and using runtime locales and code pages

For more information about setting and using runtime locales and code pages, see Locales and code pages that are supported in the COBOL for Linux on x86 Programming Guide. The single-byte ASCII and EBCDIC code pages are those for which the column labeled Language group (the rightmost column) of the table Locales and code pages supported does not specify "Ideographic languages."

PROCESSING PROCEDURE phrase
Specifies the name of a procedure to handle the various events that the XML parser generates.
procedure-name-1, procedure-name-2
Must name a section or paragraph in the PROCEDURE DIVISION. Procedure-name-1 and procedure-name-2 must not name a procedure in a declarative section.
procedure-name-1
Specifies the first (or only) section or paragraph in the processing procedure.
procedure-name-2
Specifies the last section or paragraph in the processing procedure.
For each XML event, the parser transfers control to the first statement of the procedure named procedure-name-1. Control is always returned from the processing procedure to the XML parser. The point from which control is returned is determined as follows:
  • If procedure-name-1 is a paragraph name and procedure-name-2 is not specified, the return is made after the execution of the last statement of the procedure-name-1 paragraph.
  • If procedure-name-1 is a section name and procedure-name-2 is not specified, the return is made after the execution of the last statement of the last paragraph in the procedure-name-1 section.
  • If procedure-name-2 is specified and it is a paragraph name, the return is made after the execution of the last statement of the procedure-name-2 paragraph.
  • If procedure-name-2 is specified and it is a section name, the return is made after the execution of the last statement of the last paragraph in the procedure-name-2 section.
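The four rules above can be modeled in Python as follows. The sketch represents the PROCEDURE DIVISION as an ordered list of sections, each with an ordered list of paragraph names, and assumes that every section contains at least one paragraph; the function return_point and all procedure names are hypothetical. It returns the paragraph whose last statement is executed immediately before control goes back to the XML parser.

```python
from typing import List, Optional, Tuple

# PROCEDURE DIVISION model: ordered (section name, [paragraph names]) pairs
Division = List[Tuple[str, List[str]]]


def return_point(division: Division, pn1: str, pn2: Optional[str] = None) -> str:
    """Name the paragraph after whose last statement control returns to the parser."""
    # procedure-name-2, when specified, determines the return point
    target = pn2 if pn2 is not None else pn1
    for section, paragraphs in division:
        if section == target:
            # Section name: return after the last paragraph of the section
            return paragraphs[-1]
        if target in paragraphs:
            # Paragraph name: return after that paragraph
            return target
    raise ValueError(f"unknown procedure name: {target}")


# Hypothetical procedure names
division = [
    ("MAIN-LOGIC", ["START-PARSE"]),
    ("XML-HANDLER", ["DISPATCH-EVENT", "HANDLE-CONTENT", "HANDLER-EXIT"]),
]
print(return_point(division, "XML-HANDLER"))                       # HANDLER-EXIT
print(return_point(division, "DISPATCH-EVENT"))                    # DISPATCH-EVENT
print(return_point(division, "DISPATCH-EVENT", "HANDLE-CONTENT"))  # HANDLE-CONTENT
```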

The only necessary relationship between procedure-name-1 and procedure-name-2 is that they define a consecutive sequence of operations to execute, beginning at the procedure named by procedure-name-1 and ending with the execution of the procedure named by procedure-name-2.

If there are two or more logical paths to the return point, then procedure-name-2 can name a paragraph that consists of only an EXIT statement; all the paths to the return point must then lead to this paragraph.

The processing procedure consists of all the statements at which XML events are handled. The range of the processing procedure includes all statements executed by CALL, EXIT, GO TO, GOBACK, MERGE, PERFORM, and SORT statements that are in the range of the processing procedure, as well as all statements in declarative procedures that are executed as a result of the execution of statements in the range of the processing procedure.

The range of the processing procedure must not cause the execution of any GOBACK or EXIT PROGRAM statement, except to return control from a method or program to which control was passed by a CALL statement that is executed in the range of the processing procedure.

The range of the processing procedure must not cause the execution of an XML PARSE statement, unless the XML PARSE statement is executed in a method or outermost program to which control was passed by a CALL statement that is executed in the range of the processing procedure.

A program executing on multiple threads can execute the same XML statement or different XML statements simultaneously.

The processing procedure can terminate the run unit with a STOP RUN statement.

For more details about the processing procedure, see Control flow.

ON EXCEPTION
The ON EXCEPTION phrase specifies imperative statements that are executed when the XML PARSE statement raises an exception condition.

An exception condition exists when the XML parser detects an error in processing the XML document. The parser first signals an XML exception by passing control to the processing procedure with special register XML-EVENT containing 'EXCEPTION'. The parser also provides a numeric error code in special register XML-CODE, as detailed in Handling XML PARSE exceptions in the COBOL for Linux on x86 Programming Guide.

An exception condition also exists if the processing procedure sets XML-CODE to -1 before returning to the parser for any normal XML event. In this case, the parser does not signal an EXCEPTION XML event and parsing is terminated.

If the ON EXCEPTION phrase is specified, the parser transfers control to imperative-statement-1. If the ON EXCEPTION phrase is not specified, the NOT ON EXCEPTION phrase, if any, is ignored and control is transferred to the end of the XML PARSE statement.

After execution of the XML PARSE statement, special register XML-CODE contains either the numeric error code for the XML exception or -1 (if the processing procedure set XML-CODE to -1 to terminate parsing).

If the processing procedure handles the XML exception event and sets XML-CODE to zero before returning control to the parser, the exception condition no longer exists. If no other unhandled exceptions occur before termination of the parser, control is transferred to imperative-statement-2 of the NOT ON EXCEPTION phrase, if specified.
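The following Python sketch models how an exception condition arises and how it can be cleared. It is a simplification, not the parser's implementation. The event list stands in for the parser's output. The handler function stands in for the processing procedure: it receives the XML-EVENT and XML-CODE values and returns the new XML-CODE value. The error code 5 in the example is an arbitrary placeholder. The sketch also assumes that parsing continues with the remaining events after the processing procedure resets XML-CODE to zero for an exception event; which exceptions allow continuation is described in the Programming Guide, not here.

```python
from typing import Callable, Iterable, Tuple

# (XML-EVENT value, error code); the error code is used only for "EXCEPTION"
Event = Tuple[str, int]
# Stand-in for the processing procedure: (XML-EVENT, XML-CODE) -> new XML-CODE
Handler = Callable[[str, int], int]


def run_parse(events: Iterable[Event], handler: Handler) -> Tuple[bool, int]:
    """Return (exception condition exists, XML-CODE after XML PARSE)."""
    for event, error_code in events:
        if event == "EXCEPTION":
            if handler(event, error_code) != 0:
                # Exception not cleared: parsing ends with the error code in XML-CODE
                return True, error_code
            # XML-CODE reset to zero: the exception condition no longer exists
            continue
        if handler(event, 0) == -1:
            # Processing procedure requested termination; no EXCEPTION event
            return True, -1
    return False, 0


doc_events = [("START-OF-DOCUMENT", 0), ("EXCEPTION", 5), ("END-OF-DOCUMENT", 0)]
print(run_parse(doc_events, lambda event, code: code))  # (True, 5)
print(run_parse(doc_events, lambda event, code: 0))     # (False, 0)
```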

NOT ON EXCEPTION
The NOT ON EXCEPTION phrase specifies imperative statements that are executed when no exception condition exists at the termination of XML PARSE processing.

If an exception condition does not exist at termination of XML PARSE processing, control is transferred to imperative-statement-2 of the NOT ON EXCEPTION phrase, if specified. If the NOT ON EXCEPTION phrase is not specified, control is transferred to the end of the XML PARSE statement. The ON EXCEPTION phrase, if specified, is ignored.

Special register XML-CODE contains zero after execution of the XML PARSE statement.
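The transfer of control at the end of XML PARSE processing, as described for both the ON EXCEPTION and NOT ON EXCEPTION phrases, can be summarized by the small Python function below. The function name is hypothetical, the strings passed for imperative-statement-1 and imperative-statement-2 are placeholders, and None models a phrase that is not specified.

```python
from typing import Optional

END_OF_STATEMENT = "end of XML PARSE statement"


def after_parse(exception: bool,
                on_exception: Optional[str] = None,
                not_on_exception: Optional[str] = None) -> str:
    """Return where control goes when XML PARSE processing terminates."""
    # Only one phrase applies; the other one, if specified, is ignored
    chosen = on_exception if exception else not_on_exception
    return chosen if chosen is not None else END_OF_STATEMENT


print(after_parse(True, "imperative-statement-1", "imperative-statement-2"))  # imperative-statement-1
print(after_parse(False, "imperative-statement-1"))                           # end of XML PARSE statement
```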

END-XML phrase
This explicit scope terminator delimits the scope of XML GENERATE or XML PARSE statements. END-XML permits a conditional XML GENERATE or XML PARSE statement (that is, an XML GENERATE or XML PARSE statement that specifies the ON EXCEPTION or NOT ON EXCEPTION phrase) to be nested in another conditional statement.

The scope of a conditional XML GENERATE or XML PARSE statement can be terminated by:

  • An END-XML phrase at the same level of nesting
  • A separator period

END-XML can also be used with an XML GENERATE or XML PARSE statement that does not specify either the ON EXCEPTION or NOT ON EXCEPTION phrase.

For more information about explicit scope terminators, see Delimited scope statements.