Parsing XML documents

Before the z/OS XML parser can perform a parse on an XML document, it must first establish a context in which it can operate. This is accomplished when the caller invokes the initialization routine and passes in a piece of memory where the z/OS XML parser establishes a Parse Instance Memory Area (PIMA). This is the area where the z/OS XML parser creates a base for the internal data structures it uses to complete the parse process.

Rule: A particular PIMA must only be used during the parse of a single XML document at a time. Only after the parse is complete and the parse instance is reset can a PIMA be reused for the parse of another document.

In addition to control information, the PIMA is used as a memory area to store temporary data required during the parse. When the z/OS XML parser needs more storage than was provided in the PIMA, additional storage is allocated. Because allocating additional storage is an expensive operation, the PIMA should be initially allocated with sufficient storage to handle the expected document size, in order to optimize memory allocation requests.

Rule: For the non-validating z/OS® XML parser, the minimum size for the PIMA is 128 kilobytes. For the validating z/OS XML parser, the minimum size for the PIMA is 768 kilobytes.

Everything that the z/OS XML parser needs to complete the parse of a document is kept in the PIMA, along with any associated memory extensions that the parser may allocate during the parse process. The caller also must provide input and output buffers on each call to the parse service (gxlpParse for C/C++ callers, GXL1PRS (GXL4PRS) for assember callers). In the event that either the text in the input buffer is consumed or the parsed data stream fills the output buffer, the z/OS XML parser will return XRC_WARNING, along with a reason code indicating which buffer (possibly both) needs the caller's attention. It also indicates the current location and number of bytes remaining in each buffer by updating the buffer_addr and buffer_bytes_left parameters passed in on the parse request (for C/C++ callers, see the description of gxlpParse — parse a buffer of XML text; for assembler callers, see the description of GXL1PRS (GXL4PRS) — parse a buffer of XML text). This process is referred to as buffer spanning. For more information, see Spanning buffers.

If the entire document has been processed when the z/OS XML parser returns to the caller, the parse is complete and the caller proceeds accordingly. If the caller requires another document to be parsed, it has the option of terminating the current parse instance by calling the termination service (gxlpTerminate for C/C++ callers, GXL1TRM(GXL4TRM) for assembler callers). This will free up any resources that the z/OS XML parser may have acquired and resets the data structures in the PIMA. If the caller needs to parse another document, it will have to call the initialization service again to either completely re-initialize an existing PIMA that has been terminated or initialize a new PIMA from scratch.

Another option is to use the finish/reset function of the z/OS XML parser control service (gxlpControl for C/C++ callers, GXL1CTL (GXL4CTL) for assembler callers) to reset the PIMA so that it can be reused. This is a lighter-weight operation that preserves certain information that can be reused across parsing operations for multiple documents. This potentially improves the performance for subsequent parses, since this information can be reused instead of rebuilt from scratch. Reusing the PIMA in this way is particularly beneficial to callers that need to handle multiple documents that use the same symbols (for example, namespaces and local names for elements and attributes). The PIMA can only be reused in this way when the XML documents are in the same encoding.

Restriction: The following restrictions apply when conducting a validating parse:

When parsing in non-Unicode encodings, non-representable character entities are replaced with the "-" character prior to validation. See Non-representable characters for more information on non-representable character entities.
There is a maximum of 64 KB non-wildcard attributes for a single element, and 64 KB elements in an All group.