XML Input stage in DataStage

You can transform hierarchical XML data to flat relational tables by using the XML Input stage.

Use the XML Input stage to extract, validate, and transform XML data. You can extract data from a single column in a table or from a whole document. XML Input supports a single input link and one or more output links.

Stage tab

In the Stage tab, specify properties for the stage. For more information, see XML Input: Stage tab (DataStage).

Input tab

In the Input tab, specify the input column and the format of the XML document. An input column can contain an XML document, a URL, or a file path.

Output tab

In the Output tab, you can specify properties for the output links. You can designate one reject link to store rejection messages and rejected rows, and select the output column that holds them.
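The following Python sketch illustrates the idea behind a reject link: each input value either parses and continues down the normal path, or is routed to a reject list together with a rejection message. The helper `route_rows` is hypothetical and is not part of DataStage. It checks only XML well-formedness with the standard library parser, whereas the stage can also perform validation that this sketch does not model.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def route_rows(xml_values: List[str]) -> Tuple[List[ET.Element], List[Tuple[str, str]]]:
    """Split input values into parsed documents and reject rows.

    Hypothetical helper: each reject row pairs the original value with a
    rejection message, loosely mirroring a reject link that carries both.
    Only well-formedness is checked here, not schema validation.
    """
    parsed: List[ET.Element] = []
    rejects: List[Tuple[str, str]] = []
    for value in xml_values:
        try:
            parsed.append(ET.fromstring(value))
        except ET.ParseError as err:
            # Keep the rejected row and the reason it was rejected.
            rejects.append((value, str(err)))
    return parsed, rejects


docs, rejects = route_rows(["<a>1</a>", "<a>2</b>"])
print(len(docs), len(rejects))  # 1 1
```

Keeping the original value alongside the message means a downstream job can inspect or reprocess the failed row, which is the practical reason to store both on the reject link.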

You can also specify whether the link inherits the Transformation properties from the stage, and use the Load box to specify XPath expressions. On output links, XPath expressions identify data in an XML document and transform it into columns and rows. See Transformation settings for more information. If you do not supply an XPath expression for an output column, the stage can use a passthrough mechanism to copy data unchanged from the input link to the output link. Passthrough requires the input and output column names to match exactly, and the match is case-sensitive.
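As a rough illustration of how path-based extraction and passthrough combine, the sketch below builds one output row from one input row. The names `build_output_row`, `output_spec`, and the sample columns are hypothetical. It uses Python's `xml.etree.ElementTree`, which supports only a subset of XPath with paths relative to the root element, so it approximates rather than reproduces the stage's XPath handling. Filling unmatched passthrough columns with `None` is also an assumption of this sketch.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def build_output_row(
    input_row: Dict[str, str],
    xml_column: str,
    output_spec: Dict[str, Optional[str]],
) -> Dict[str, Optional[str]]:
    """Build one output row from one input row (hypothetical helper).

    output_spec maps each output column name to a relative path, or to None
    when the column has no path and should be passed through.
    """
    doc = ET.fromstring(input_row[xml_column])
    out: Dict[str, Optional[str]] = {}
    for column, path in output_spec.items():
        if path is not None:
            # Extract the text of the first element matching the relative path.
            out[column] = doc.findtext(path)
        else:
            # Passthrough: copy only on an exact, case-sensitive name match.
            # Unmatched columns become None (an assumption of this sketch).
            out[column] = input_row.get(column)
    return out


row = {"OrderID": "42", "doc": "<order><customer>Ann</customer></order>"}
spec = {"OrderID": None, "orderid": None, "Customer": "customer"}
print(build_output_row(row, "doc", spec))
# {'OrderID': '42', 'orderid': None, 'Customer': 'Ann'}
```

The `orderid` column receives nothing because passthrough is case-sensitive: `orderid` is not the same name as `OrderID`.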

To select a repetition element, click Edit under Columns and mark one of the columns as a key. The stage generates an output row for each occurrence of the repetition element.
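The effect of a repetition element can be sketched in Python: iterate over every occurrence of a repeated element and emit one row per occurrence, reading the other columns relative to it. This is a simplified model under stated assumptions. The function `rows_for_repetition` and the sample document are hypothetical, the repeated element is passed in directly as a path rather than derived from a key column, and missing values become `None`.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def rows_for_repetition(
    xml_text: str,
    repetition_path: str,
    column_paths: Dict[str, str],
) -> List[Dict[str, Optional[str]]]:
    """Emit one row per occurrence of the repetition element (sketch)."""
    root = ET.fromstring(xml_text)
    rows: List[Dict[str, Optional[str]]] = []
    for occurrence in root.iterfind(repetition_path):
        # Column paths are evaluated relative to each occurrence;
        # findtext returns None when the element is absent.
        rows.append({col: occurrence.findtext(p) for col, p in column_paths.items()})
    return rows


orders = """<orders>
  <order><id>1</id><item>pen</item></order>
  <order><id>2</id><item>ink</item></order>
  <order><id>3</id></order>
</orders>"""
for r in rows_for_repetition(orders, "order", {"ID": "id", "Item": "item"}):
    print(r)
```

Three `order` elements yield three rows; the third row has `Item` set to `None` because that occurrence has no `item` child.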

To transform the XML document into columns and rows, XML Input uses an XSLT stylesheet that it generates from the XPath expressions specified on the output link. If the XML document contains nodes whose relationships are not explicit, XML Input might not be able to perform the transformation automatically. In that case, you can specify your own custom XSLT stylesheet under Stylesheet. The stylesheet output must conform to the following Document Type Definition (DTD):
<!ELEMENT table (row*)>
<!ELEMENT row (column*)>
<!ELEMENT column (#PCDATA | NULL)*>
<!ATTLIST column name CDATA #REQUIRED>
<!ELEMENT NULL EMPTY>
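To make the table/row/column format concrete, the following Python sketch reads a document in that shape and turns it into a list of rows, treating a `NULL` child element as a null value. The function `read_table` and the sample document are hypothetical; they show what a conforming stylesheet output looks like, not how DataStage consumes it internally. For simplicity, the sketch reads only the text that precedes any child element of a `column`.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def read_table(xml_text: str) -> List[Dict[str, Optional[str]]]:
    """Parse <table>/<row>/<column name="..."> output into a list of rows."""
    table = ET.fromstring(xml_text)
    if table.tag != "table":
        raise ValueError("root element must be <table>")
    rows: List[Dict[str, Optional[str]]] = []
    for row in table.findall("row"):
        values: Dict[str, Optional[str]] = {}
        for column in row.findall("column"):
            name = column.get("name")
            if name is None:
                # The DTD declares the name attribute as #REQUIRED.
                raise ValueError("<column> requires a name attribute")
            if column.find("NULL") is not None:
                # A <NULL/> child marks a null value.
                values[name] = None
            else:
                values[name] = column.text or ""
        rows.append(values)
    return rows


sample = (
    "<table>"
    '<row><column name="ID">1</column><column name="Note"><NULL/></column></row>'
    '<row><column name="ID">2</column><column name="Note">rush</column></row>'
    "</table>"
)
print(read_table(sample))
# [{'ID': '1', 'Note': None}, {'ID': '2', 'Note': 'rush'}]
```

Using an explicit `NULL` element lets the format distinguish a null value from an empty string, which an empty `column` element alone could not express.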