Splitting records

When building the parsed data stream in the output buffer, the z/OS XML parser will always ensure that all records are fully formed. Since some records represent items from the document that may be very long (for example, CDATA, white space, or comments), certain types of records are deemed to be splittable. In these cases, the z/OS XML parser will always ensure that the header for the split record is complete, but the value(s) in the record will only contain a part of the item being parsed. A flag in the record header will be set to indicate that the record is continued.

Note: In fragment parsing mode, the continuation flag is set to 'OFF' on a continued character data record when the CDATA is outside an element (that is, not enclosed by a start tag and an end tag). However, if the CDATA is inside an element and its record splits, the continuation flag is still set to 'ON'.

Split records may span several output buffers if they are very long, or if the output buffers are relatively short.
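To get the complete text of a splittable item, such as a long comment, a consumer concatenates the value parts of consecutive records until it reaches one whose continuation flag is off, even if those records arrive in different output buffers. The following is a minimal C sketch of that loop. The `rec` structure, its fields, and the `accum` helpers are hypothetical stand-ins for illustration; they do not reproduce the actual z/OS XML record header layout or flag names.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified view of one parsed record. The real z/OS XML
   record header has a different layout; only the idea of a "continued"
   flag comes from the documentation. */
typedef struct {
    int         type;       /* record type, e.g. comment or character data */
    bool        continued;  /* true if the item continues in a later record */
    const char *value;      /* this record's part of the item */
    size_t      len;        /* length of that part in bytes */
} rec;

/* Growable buffer that collects the parts of one split item. */
typedef struct {
    char  *data;
    size_t len;
    size_t cap;
} accum;

/* Append one value part; returns false on allocation failure. */
static bool accum_append(accum *a, const char *part, size_t n) {
    if (a->len + n + 1 > a->cap) {
        size_t cap = a->cap ? a->cap : 64;
        while (cap < a->len + n + 1)
            cap *= 2;
        char *p = realloc(a->data, cap);
        if (p == NULL)
            return false;
        a->data = p;
        a->cap = cap;
    }
    memcpy(a->data + a->len, part, n);
    a->len += n;
    a->data[a->len] = '\0';   /* keep the collected text NUL-terminated */
    return true;
}

/* Feed records in the order the parser produced them, across however many
   output buffers it used. Returns 1 when the item is complete (full value
   in a->data), 0 if more parts are expected, -1 on allocation failure. */
static int accum_feed(accum *a, const rec *r) {
    if (!accum_append(a, r->value, r->len))
        return -1;
    return r->continued ? 0 : 1;
}
```

In fragment parsing mode, as described in the note earlier in this section, a continued character data record outside an element may carry the flag 'OFF', so a loop that relies only on the flag would treat such an item as complete too early.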

Records that represent items of fixed length or that contain multiple values are generally non-splittable. If there is no room in the current output buffer to hold such a record, the entire record is placed in the next output buffer. Examples include records for start element tags, attribute names, namespace declarations, and end element tags.

Note: The one exception to this rule is processing instructions (PIs). Because the text associated with a PI can be arbitrarily long, PI records are permitted to split.

If the z/OS XML parser determines that the data spans beyond the current output buffer and requests another buffer to continue processing, the caller must supply a new buffer that is large enough to contain a minimum set of complete data. If the item that must be placed at the beginning of this new buffer is a non-splittable record that does not fit, the z/OS XML parser returns with a return code of XRC_FAILURE and a reason code of XRSN_BUFFER_OUTBUF_SMALL.
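The placement rules in the preceding paragraphs can be summarized as a small decision function. This is a simplified model for illustration only, not the parser's actual algorithm: `place_record`, the `placement` values, and the byte sizes are hypothetical, and the real parser may also split records for reasons other than lack of buffer space.

```c
#include <stdbool.h>
#include <stddef.h>

/* Outcomes of this hypothetical placement model. */
typedef enum {
    PLACE_WHOLE,        /* the whole record fits in the current buffer */
    PLACE_SPLIT,        /* complete header plus part of the value; rest continues */
    PLACE_NEXT_BUFFER,  /* move the whole record to a new buffer and retry */
    PLACE_BUFFER_SMALL  /* corresponds to XRC_FAILURE / XRSN_BUFFER_OUTBUF_SMALL */
} placement;

/* Decide where a record of `header` + `value` bytes goes, given `space`
   free bytes in the current output buffer. `at_start` is true when the
   record would be the first item in a fresh buffer from the caller. */
static placement place_record(size_t header, size_t value, bool splittable,
                              size_t space, bool at_start) {
    if (header + value <= space)
        return PLACE_WHOLE;
    /* A split record always carries its complete header; require room
       for at least one byte of the value as well. */
    if (splittable && header < space)
        return PLACE_SPLIT;
    /* Assumption: this model also fails a splittable record whose header
       alone cannot fit a fresh buffer; the documentation describes the
       failure only for non-splittable records. */
    if (at_start)
        return PLACE_BUFFER_SMALL;
    return PLACE_NEXT_BUFFER;
}
```

For example, with 100 free bytes a 16-byte header and a 200-byte value yield `PLACE_SPLIT` for a comment record but `PLACE_NEXT_BUFFER` for a non-splittable record, and the latter becomes `PLACE_BUFFER_SMALL` once it is already at the start of a fresh buffer.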

The z/OS XML parser generally does not split records unless it needs to, for example to fit a record into a given output buffer. However, the decision to split a record depends on many factors, and in some instances the z/OS XML parser splits records of the same type within the same buffer; this is normal. It is particularly true for XDBX streams, where the z/OS XML parser generates records based on the stream of XDBX tags presented by the builder of the stream. Do not expect, for instance, that the z/OS XML records generated for a given text document will be split in the same way as those generated for an XDBX stream that represents the same document.
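Because split boundaries can differ between runs and between text and XDBX input, code that compares or tests parser output should compare reassembled items rather than individual records. The C sketch below merges continued parts into logical items and compares two record streams item by item. The `rec` structure, `next_item`, `same_items`, and the fixed 256-byte item limit are hypothetical simplifications for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified record: a type, a continued flag, a value part. */
typedef struct {
    int         type;
    bool        continued;
    const char *value;
} rec;

/* Copy the logical item starting at recs[*i] into out (NUL-terminated),
   merging continued parts, and advance *i past it. Requires *i < n.
   Returns false if out is too small for the merged value. */
static bool next_item(const rec *recs, size_t n, size_t *i,
                      int *type, char *out, size_t outsz) {
    size_t len = 0;
    *type = recs[*i].type;
    while (*i < n) {
        const rec *r = &recs[(*i)++];
        size_t part = strlen(r->value);
        if (len + part + 1 > outsz)
            return false;
        memcpy(out + len, r->value, part);
        len += part;
        if (!r->continued)
            break;
    }
    out[len] = '\0';
    return true;
}

/* True if both streams describe the same items, wherever they were split.
   Items longer than the local buffers are treated as a mismatch. */
static bool same_items(const rec *a, size_t na, const rec *b, size_t nb) {
    size_t i = 0, j = 0;
    char va[256], vb[256];
    int ta, tb;
    while (i < na && j < nb) {
        if (!next_item(a, na, &i, &ta, va, sizeof va) ||
            !next_item(b, nb, &j, &tb, vb, sizeof vb))
            return false;
        if (ta != tb || strcmp(va, vb) != 0)
            return false;
    }
    return i == na && j == nb;   /* both streams fully consumed */
}
```

A stream that splits a value as "abc" + "def" and one that splits it as "a" + "bcd" + "ef" compare equal under `same_items`, which is the property the paragraph above says consumers should rely on.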

The following table shows which record types can be split:

Table 1. Splittable record types
Record type	Splittable?
GXLHXEC_TOK_ATTR_NAME	No
GXLHXEC_TOK_ATTR_VALUE	Yes
GXLHXEC_TOK_AUX_INFO	No
GXLHXEC_TOK_BUFFER_INFO	No
GXLHXEC_TOK_COMMENT	Yes
GXLHXEC_TOK_CHAR_DATA	Yes
GXLHXEC_TOK_END_CDATA	No
GXLHXEC_TOK_END_ELEM	No
GXLHXEC_TOK_ERROR	No
GXLHXEC_TOK_DTD_DATA	No
GXLHXEC_TOK_NS_DECL	No
GXLHXEC_TOK_PI	Yes
GXLHXEC_TOK_ROOT_ELEMENT	No
GXLHXEC_TOK_SCHEMA_LOCATION	No
GXLHXEC_TOK_START_CDATA	No
GXLHXEC_TOK_START_ELEM	No
GXLHXEC_TOK_UNRESOLVED_REF	No
GXLHXEC_TOK_WHITESPACE	Yes
GXLHXEC_TOK_XML_DECL	No

The above token names are for the C/C++ callers. Assembler callers use token names without the "GXLH" prefix.
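Code that handles records generically may want to encode Table 1 directly. The sketch below does this by token name, so it does not depend on the numeric values of the token constants, which are not shown here; real C code would more likely switch on the constants themselves. `is_splittable` is a hypothetical helper, and it accepts both the C/C++ names and the assembler names without the "GXLH" prefix.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Record types marked "Yes" in Table 1, using the C/C++ spellings. */
static const char *const splittable_types[] = {
    "GXLHXEC_TOK_ATTR_VALUE",
    "GXLHXEC_TOK_COMMENT",
    "GXLHXEC_TOK_CHAR_DATA",
    "GXLHXEC_TOK_PI",
    "GXLHXEC_TOK_WHITESPACE",
};

/* Returns true if the named record type may be split. Any name not listed
   above, including an unknown one, is reported as non-splittable. */
static bool is_splittable(const char *name) {
    /* Normalize a C/C++ name to its assembler form by dropping "GXLH". */
    if (strncmp(name, "GXLH", 4) == 0)
        name += 4;
    for (size_t i = 0; i < sizeof splittable_types / sizeof splittable_types[0]; i++) {
        if (strcmp(splittable_types[i] + 4, name) == 0)
            return true;
    }
    return false;
}
```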