Processing XML Documents
You can process XML documents from your RPG program by using the XML-INTO or XML-SAX statements. These statements are the RPG language interface to the high-speed XML parser. The parser currently being used by RPG is a non-validating parser, although it checks XML documents for many well-formedness errors. See the "XML Conformance" section in the "XML Reference Material" appendix of the ILE COBOL Programmer's Guide for more information on the XML parser.
The XML documents can be in a character or UCS-2 RPG variable, or they can be in an Integrated File System file.
The parser is a SAX parser. A SAX parser operates by reading the XML document character by character. Whenever it has located a fragment of the XML document, such as an element name or an attribute value, it calls back to a handling procedure provided by the caller of the parser, passing it information about the fragment that it has found. For example, when the parser has found an XML element name, it calls the handling procedure, indicating that the "event" is a "start element" event and passing it the name of the element.
The handling procedure processes the information and returns to the parser, which continues to read the XML document until it has enough information to call the handling procedure with another event. This process repeats until the entire XML document has been parsed, or until the handling procedure indicates that parsing should end.
Consider the following XML document:

    <email type="text">
      <sendto>JohnDoe@there</sendto>
    </email>

The parser generates the following sequence of events for this document:

| Parsed text | Event | Event data |
|---|---|---|
| | start document | |
| <email | start element | "email" |
| type= | attribute name | "type" |
| "text" | attribute value | "text" |
| >whitespace | element content | the whitespace |
| <sendto> | start element | "sendto" |
| JohnDoe@there | element content | "JohnDoe@there" |
| </sendto> | end element | "sendto" |
| whitespace | element content | the whitespace |
| </email> | end element | "email" |
| | end document | |
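
The callback flow in the table can be sketched in free-form RPG as follows. This is a minimal, illustrative sketch rather than a definitive implementation: the names `saxInfo_t`, `saxInfo`, `saxHandler`, and `chunk` are hypothetical, and the handler assumes that the "attribute name" and "attribute value" events in the table correspond to the `*XML_ATTR_NAME` and `*XML_ATTR_CHARS` special words. It also assumes the default "ccsid" setting, so that the data is returned as single-byte character data.

```rpgle
**free
ctl-opt dftactgrp(*no);

// Hypothetical communication area shared by the caller and the handler.
dcl-ds saxInfo_t qualified template;
  wantValue ind;           // on while inside the "type" attribute
  typeValue varchar(50);   // collected value of the "type" attribute
end-ds;

dcl-pr saxHandler int(10);
  info        likeds(saxInfo_t);
  event       int(10) value;
  stringPtr   pointer value;
  stringLen   int(20) value;
  exceptionId int(10) value;
end-pr;

dcl-ds saxInfo likeds(saxInfo_t);
dcl-s  xmlDoc  varchar(200);

clear saxInfo;
xmlDoc = '<email type="text"><sendto>JohnDoe@there</sendto></email>';

// The parser calls saxHandler once for each event it generates.
xml-sax %handler(saxHandler : saxInfo) %xml(xmlDoc : 'doc=string');

// For this document, saxInfo.typeValue is expected to hold 'text'.
*inlr = *on;
return;

dcl-proc saxHandler;
  dcl-pi *n int(10);
    info        likeds(saxInfo_t);
    event       int(10) value;
    stringPtr   pointer value;
    stringLen   int(20) value;   // length of the event data in bytes
    exceptionId int(10) value;
  end-pi;
  // Overlay for the event data; only the first stringLen bytes are valid.
  dcl-s chunk char(65535) based(stringPtr);

  select;
    when event = *XML_ATTR_NAME;
      // Remember whether the attribute just found is "type".
      info.wantValue = (%subst(chunk : 1 : stringLen) = 'type');
    when event = *XML_ATTR_CHARS and info.wantValue;
      // The value may arrive in more than one piece, so append it.
      info.typeValue += %subst(chunk : 1 : stringLen);
    when event = *XML_END_ATTR;
      info.wantValue = *off;
  endselect;

  return 0;   // zero tells the parser to continue
end-proc;
```

The handler ignores every event it is not interested in and keeps its state in the communication area, so it needs no global variables.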
The XML-SAX and XML-INTO operation codes allow you to use the XML parser.
- The XML-SAX operation allows you to specify an event handling procedure
to handle every event that the parser generates. This is useful if you do
not know in advance what an XML document may contain.
For example, if you know that an XML document will contain an XML attribute with the name "type", and you want to know the value of this attribute, your handling procedure can wait for an "attribute name" event whose value is "type". The next time the handler is called, it should be for an "attribute value" event carrying the required data ("text" in the example above).
- The XML-INTO operation allows you to read the contents of an XML document
directly into an RPG variable. This is useful if you know the format of the
XML document and you know that the names of the XML elements in the document
will be the same as the names you have given to your RPG variables.
For example, if you know that the XML document will always have the form of the document above, you can define an RPG data structure with the name "email", and with subfields "type" and "sendto". Then you can use the XML-INTO operation to read the XML document directly into the data structure. When the operation is complete, the "type" subfield would have the value "text" and the "sendto" subfield would have the value "JohnDoe@there".
- The XML-INTO operation also allows you to obtain the values of an unknown number of repeated XML elements. You provide a handling procedure that receives the values of a fixed number of elements each time the handling procedure is called. This is useful if you know that the XML document will contain a series of identical XML elements, but you don't know in advance how many there will be.
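
The two forms of XML-INTO described above might look like the following sketch. The data structure matches the "email" document shown earlier. The second document, with its `<emails>` wrapper element, along with the names `email_t`, `emailHandler`, `total`, `oneDoc`, and `manyDoc`, is hypothetical and introduced only for illustration. The sketch assumes that the "path" option is used to identify the repeated element when a handling procedure is specified.

```rpgle
**free
ctl-opt dftactgrp(*no);

// Subfield names match the XML attribute and element names.
dcl-ds email_t qualified template;
  type   varchar(10);
  sendto varchar(50);
end-ds;

// Handler for repeated elements: receives up to 2 elements per call.
dcl-pr emailHandler int(10);
  seen      int(10);
  emails    likeds(email_t) dim(2) const;
  numEmails uns(10) value;
end-pr;

dcl-ds email likeds(email_t);   // named "email" to match the XML element
dcl-s  oneDoc  varchar(200);
dcl-s  manyDoc varchar(500);
dcl-s  total   int(10) inz(0);

// Form 1: read a document of known format directly into a data structure.
oneDoc = '<email type="text"><sendto>JohnDoe@there</sendto></email>';
xml-into email %xml(oneDoc : 'doc=string');
// email.type is now 'text' and email.sendto is 'JohnDoe@there'.

// Form 2: an unknown number of repeated elements, delivered in batches.
manyDoc = '<emails>'
        + '<email type="text"><sendto>JohnDoe@there</sendto></email>'
        + '<email type="text"><sendto>JaneDoe@there</sendto></email>'
        + '<email type="text"><sendto>SamDoe@there</sendto></email>'
        + '</emails>';
xml-into %handler(emailHandler : total)
         %xml(manyDoc : 'doc=string path=emails/email');

*inlr = *on;
return;

dcl-proc emailHandler;
  dcl-pi *n int(10);
    seen      int(10);                        // communication area
    emails    likeds(email_t) dim(2) const;   // elements for this call
    numEmails uns(10) value;                  // how many are filled in
  end-pi;

  // Only the first numEmails elements of the array hold data.
  seen += numEmails;
  return 0;   // zero tells the parser to continue
end-proc;
```

The dimension of the handler's array parameter sets the batch size. A larger dimension means fewer calls to the handler at the cost of a larger parameter.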
The XML data is always returned by the parser in text form. If the data is known to represent other data types such as numeric data, or date data, the XML-SAX handling procedure must use conversion functions such as %INT or %DATE to convert the data.
The XML-INTO operation will automatically convert the character data to the type of the field or subfield specified as the receiver.
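
As a rough illustration of the conversions an XML-SAX handling procedure would perform itself, the sketch below converts two text values with %INT and %DATE. The variable names and sample values are hypothetical. In a real handler the text would come from the event data passed by the parser, and the date is assumed to be in *ISO (yyyy-mm-dd) form.

```rpgle
**free
// Hypothetical text values, as a handler might receive them.
dcl-s qtyText  varchar(20) inz('42');
dcl-s dateText varchar(20) inz('2024-03-15');

dcl-s qty     int(10);
dcl-s dueDate date;

qty     = %int(qtyText);            // character data to integer
dueDate = %date(dateText : *iso);   // character data to date

*inlr = *on;
return;
```

If the text is not valid for the target type, these built-in functions signal an exception, so a handler that cannot trust its input may want to wrap the conversions in a MONITOR group.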
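Both operations accept parser options as a character value in the second operand of the %XML built-in function, in the form 'opt1=val1 opt2=val2'. The following options apply to both XML-SAX and XML-INTO: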
- doc: The "doc" option specifies whether the value that you provide to the operation is the XML document itself or the name of an Integrated File System file containing the document. The default is "doc=string", indicating that you have provided an actual XML document. Specify "doc=file" to indicate that you have provided the name of a file containing the XML document.
- ccsid: The "ccsid" option specifies the CCSID in which the XML parser will return data. For the XML-SAX operation, you can specify any CCSID that the parser supports. For the XML-INTO operation, you can only control whether the parsing will be done in single-byte character or UCS-2. See the ILE RPG Reference for more information on the "ccsid" option for each of these operations.
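
For instance, the two options might be combined as in the sketch below, which reads the "email" document from a file instead of a variable. The file path is hypothetical, and "ccsid=ucs2" is assumed to be the value that requests UCS-2 parsing for XML-INTO.

```rpgle
**free
dcl-ds email qualified;
  type   varchar(10);
  sendto varchar(50);
end-ds;

// With doc=file, the first operand of %XML is the name of the file,
// not the XML document itself. The path is a hypothetical example.
xml-into email %xml('/home/myuser/email.xml' : 'doc=file ccsid=ucs2');

*inlr = *on;
return;
```

Options are separated by blanks inside a single character value, so adding or removing an option does not change the number of operands passed to %XML.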