About transforming tabular data (DataStage)

XML Output requires XPath expressions to transform tabular data to XML. The XPath expressions are stored in a table definition. You record and maintain them in the Description property on the Columns pages of the stage.

About National Language Support (NLS) in DataStage

XML Output supports different character encodings for output documents, depending on the NLS mode.

DataStage in NLS mode

When DataStage® runs in NLS mode, all Internet Assigned Numbers Authority (IANA) character sets are supported. For a complete list of character sets, visit the following IANA web page:


http://www.iana.org/assignments/character-sets

For information about selecting the output encoding, see the Options page.

Preserving the encoding of output documents

An integral part of any XML document is its encoding. If you apply a DataStage map to the document, the document might become corrupted.

To prevent corrupting an XML document, perform one of the following steps:

  • Set the stage map to NONE in each downstream stage.
  • Set the map for the column that contains the XML input to NONE in each downstream stage.
  • Set the SQL type for the column that contains the XML input to VarBinary in each downstream stage and on the output link of the XML Output stage.
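The corruption described above happens when bytes that are already encoded get reinterpreted through a different character map. The following Python sketch models the effect outside DataStage. For illustration only, it assumes that the unwanted map treats the bytes as ISO-8859-1 (Latin-1). It is not how DataStage maps are implemented. The pass-through function stands in for a map of NONE or a VarBinary column.

```python
# Illustration only: models a downstream stage that reinterprets the bytes of a
# UTF-8 XML document through another character map. ISO-8859-1 ("latin-1") is
# an assumption chosen for the example, not a DataStage default.

def apply_wrong_map(doc_bytes: bytes, assumed_map: str = "latin-1") -> bytes:
    """Decode with the assumed map, then re-encode as UTF-8 (double encoding)."""
    return doc_bytes.decode(assumed_map).encode("utf-8")


def pass_through(doc_bytes: bytes) -> bytes:
    """Model of map NONE or a VarBinary column: the bytes are forwarded untouched."""
    return doc_bytes


original = '<?xml version="1.0" encoding="UTF-8"?><city>Zürich</city>'.encode("utf-8")

corrupted = apply_wrong_map(original)
preserved = pass_through(original)

print(corrupted.decode("utf-8"))  # ends with <city>ZÃ¼rich</city>
print(preserved == original)      # True
```

The two UTF-8 bytes of "ü" are read as two separate Latin-1 characters, so the document still parses but its content has changed.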

DataStage in non-NLS mode

When DataStage runs in non-NLS mode, note the following information:

  • The document is written in UTF-8.
  • Input columns are encoded using the local codepage of the machine hosting the engine tier. Therefore, assume that input data to the XML Output stage has been encoded with this codepage.
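The non-NLS behavior described above can be pictured as a decode from the local codepage followed by an encode to UTF-8. The following Python sketch is a model of that behavior, not DataStage code. It assumes that the engine host uses the cp1252 codepage, so substitute your host's actual codepage. The second half shows why the assumption about the input encoding matters.

```python
# Sketch of the non-NLS conversion: column bytes are read in the engine host's
# local codepage and the document is written in UTF-8.
# Assumption: the local codepage is cp1252; substitute your host's codepage.
LOCAL_CODEPAGE = "cp1252"


def column_to_utf8(raw: bytes, codepage: str = LOCAL_CODEPAGE) -> bytes:
    """Decode raw column bytes with the local codepage, then encode as UTF-8."""
    return raw.decode(codepage).encode("utf-8")


raw_value = "Café".encode(LOCAL_CODEPAGE)
print(raw_value, column_to_utf8(raw_value))   # b'Caf\xe9' b'Caf\xc3\xa9'

# Data that was actually written in another codepage (cp850 here) still
# converts without an error, but the result contains the wrong character.
mismatched = column_to_utf8("Café".encode("cp850"))
print(mismatched.decode("utf-8"))             # Caf‚  (U+201A instead of é)
```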

Supported XPath expressions

The following Backus-Naur Form (BNF) grammar describes the subset of XPath expressions that you can use in XML Output.


path                ::= ['/'] (element_spec '/')* end_segment
end_segment         ::= element_spec['/text()'] | '@' attribute
element_spec        ::= element ['[' attr_value ('and' attr_value)* ']']
attr_value          ::= '@' attribute '=' '"'value'"'
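To check a column's XPath against this subset before you enter it in the Description property, you can translate the grammar into a regular expression. The following Python sketch is a hypothetical helper and not part of DataStage. It assumes that names are simple XML names without namespace prefixes, and that a quoted value can contain any character except a double quote.

```python
import re

# Assumptions: names are simple XML names without namespace prefixes, and a
# quoted value may contain any character except a double quote. Whitespace
# around '=' and 'and' is tolerated.
NAME = r"[A-Za-z_][\w.\-]*"
ATTR_VALUE = rf'@{NAME}\s*=\s*"[^"]*"'                         # attr_value
ELEMENT_SPEC = rf"{NAME}(?:\[\s*{ATTR_VALUE}(?:\s+and\s+{ATTR_VALUE})*\s*\])?"
END_SEGMENT = rf"(?:{ELEMENT_SPEC}(?:/text\(\))?|@{NAME})"     # end_segment
PATH = re.compile(rf"/?(?:{ELEMENT_SPEC}/)*{END_SEGMENT}")     # path


def is_supported_xpath(expr: str) -> bool:
    """Return True if expr belongs to the XPath subset described by the BNF."""
    return PATH.fullmatch(expr) is not None


for expr in ["/orders/cust", "/a/b/text()",
             '/orders/order[@id="7" and @type="web"]/@status',
             "/orders//cust", "/orders/cust[1]"]:
    print(expr, is_supported_xpath(expr))
```

Descendant steps (//) and positional predicates such as [1] are rejected because the grammar does not include them. The grammar always requires an end segment, so `is_supported_xpath("/")` returns False. The single forward slash that output links use for the root node therefore needs a separate check.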

Equivalent XPath expressions

In XML Output, the following two forms of XPath expression are equivalent. Both forms include the text node of the element:

  • An expression that ends with an element name: /a/b
  • An expression that ends with a text node: /a/b/text()
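If you compare column definitions programmatically, for example to detect two columns that target the same node, you can normalize the two forms first. The following Python sketch is a hypothetical helper, not a DataStage API. It removes a trailing /text() so that both forms compare equal.

```python
def normalize_xpath(expr: str) -> str:
    """Drop a trailing /text() so that /a/b and /a/b/text() compare equal."""
    suffix = "/text()"
    return expr[: -len(suffix)] if expr.endswith(suffix) else expr


def same_node(x: str, y: str) -> bool:
    """Hypothetical helper: do two column XPaths address the same text node?"""
    return normalize_xpath(x) == normalize_xpath(y)


print(same_node("/a/b", "/a/b/text()"))   # True
print(same_node("/a/b", "/a/b/@id"))      # False
```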

Using XPath expressions

If a stage has both an input and an output link, XPath expressions are required on both links.

XPaths on input link

On the input link, the XPath expressions drive the generation of XML. Each XPath expression maps the values of an input column to a node in an XML hierarchy.
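To show how a set of XPath expressions can describe one XML hierarchy, the following Python sketch maps the column values of a single row to element text and attributes by using xml.etree.ElementTree. It is a simplified model, not the stage's actual algorithm. It assumes one input row, XPaths without [...] predicates, and a single root element. The column names and XPaths in the usage example are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified sketch (assumptions): one input row, XPaths without [...] predicates,
# and a single output document. The last step may be "@attr" or "text()".


def build_row_xml(row: dict[str, str], xpaths: dict[str, str]) -> ET.Element:
    """Map each column value in `row` to the node named by its XPath in `xpaths`."""
    root = None
    for column, xpath in xpaths.items():
        steps = xpath.strip("/").split("/")
        if steps[-1] == "text()":
            steps = steps[:-1]              # /a/b/text() is equivalent to /a/b
        attr = steps.pop()[1:] if steps and steps[-1].startswith("@") else None
        if not steps:
            raise ValueError(f"{xpath!r} does not name an element")
        if root is None:
            root = ET.Element(steps[0])
        elif root.tag != steps[0]:
            raise ValueError(f"{xpath!r}: this sketch builds one document with root {root.tag!r}")
        node = root
        for name in steps[1:]:
            # Reuse an existing child with this tag so that columns share ancestors.
            child = next((c for c in node if c.tag == name), None)
            if child is None:
                child = ET.SubElement(node, name)
            node = child
        if attr is not None:
            node.set(attr, row[column])
        else:
            node.text = row[column]
    if root is None:
        raise ValueError("no XPath expressions given")
    return root


# Hypothetical columns and XPaths, for illustration only.
row = {"CUST_ID": "C001", "ITEM": "widget", "QTY": "3"}
xpaths = {
    "CUST_ID": "/orders/cust/@id",
    "ITEM": "/orders/items/item/text()",
    "QTY": "/orders/items/item/@qty",
}
print(ET.tostring(build_row_xml(row, xpaths), encoding="unicode"))
# <orders><cust id="C001" /><items><item qty="3">widget</item></items></orders>
```

Columns whose XPaths share leading steps end up under the same ancestor elements. This is what lets the expressions on the input link describe one hierarchy instead of unrelated fragments.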

XPaths on output link

Each output column that has an XPath expression is a candidate for receiving XML. The XML for an output column comes from the input columns whose XPath expressions begin with the same nodes as the output column's XPath expression.

To make the entire XML available to an output column, use a single forward slash (/) as the XPath expression. The forward slash identifies the root node.

The following table shows the relationship between XPath expressions on the input and output links. Two of the output columns use XPath expressions that form the first part of one or more XPath expressions used by input columns. For example, the output column that uses the XPath expression /orders receives the XML generated by the XPath expressions /orders/cust and /orders/items. The output column that uses the forward slash receives all of the XML.

Table 1. Relationship between XPath expressions on the input and output links.

                        Output column XPaths
Input column XPaths     /orders    /orders/items    /
/orders/cust            Yes        No               Yes
/addresses              No         No               Yes
/addresses/orders       No         No               Yes
/orders/items           Yes        Yes              Yes
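The rule behind Table 1 is a step-by-step prefix match. The following Python sketch is a hypothetical helper that reproduces the Yes/No pattern of the table. It compares whole steps, so /orders does not match an unrelated path such as /orders_archive/item. It assumes plain element steps and does not treat predicates or a trailing /text() specially.

```python
def feeds(input_xpath: str, output_xpath: str) -> bool:
    """True if an output column with output_xpath receives XML from input_xpath."""
    if output_xpath == "/":
        return True                       # the root node receives all of the XML
    in_steps = input_xpath.strip("/").split("/")
    out_steps = output_xpath.strip("/").split("/")
    # Compare whole steps, so /orders does not match /orders_archive/...
    return in_steps[: len(out_steps)] == out_steps


inputs = ["/orders/cust", "/addresses", "/addresses/orders", "/orders/items"]
outputs = ["/orders", "/orders/items", "/"]
for ix in inputs:
    print(f"{ix:<20}", *("Yes" if feeds(ix, ox) else "No" for ox in outputs))
```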

Mapping related data to different root elements

You can segregate related data in the XML by varying the root element. This feature is available only when your XML Output stage has both an input link and an output link. In a stage that has only an input link, all XPath expressions must specify the same root element.
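The following Python sketch groups input columns by the root element of their XPath expressions and enforces the single-root rule for a stage that has only an input link. It is a hypothetical design-time check, not a DataStage feature. The column names and XPaths are assumptions, modeled on the addresses and orders data used in the example that follows.

```python
import re
from collections import defaultdict


def root_element(xpath: str) -> str:
    """Return the first element name of an XPath, ignoring any [...] predicate."""
    m = re.match(r"/?([^/\[@]+)", xpath)
    if m is None:
        raise ValueError(f"no root element in {xpath!r}")
    return m.group(1)


def group_by_root(xpaths: dict[str, str]) -> dict[str, list[str]]:
    """Map each root element to the columns whose XPaths start with it."""
    groups = defaultdict(list)
    for column, xpath in xpaths.items():
        groups[root_element(xpath)].append(column)
    return dict(groups)


def check_input_only_stage(xpaths: dict[str, str]) -> None:
    """Raise if a stage with only an input link uses more than one root element."""
    roots = group_by_root(xpaths)
    if len(roots) > 1:
        raise ValueError(f"only an input link, but XPaths use roots {sorted(roots)}")


# Hypothetical column definitions.
xpaths = {
    "STREET": "/addresses/address/@street",
    "CITY": "/addresses/address/@city",
    "ORDER_ID": "/orders/order/@id",
}
print(group_by_root(xpaths))
# {'addresses': ['STREET', 'CITY'], 'orders': ['ORDER_ID']}
```

With an output link, each group can feed its own output column. With only an input link, `check_input_only_stage(xpaths)` raises a ValueError for these definitions.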

Example of XPath expressions

The input consists of addresses and orders for customers. The address data is grouped using the root element /addresses. The order data is grouped using the root element /orders.

Using the root element to group related data, such as address data

The ADDRESSES column receives the following XML structures:

<addresses>
   <address street=" " city=" ">
   ...
</addresses>

The ORDERS column receives the following XML structures:

<orders>
   <order id=" ">
      <order item=" ">
      ...
</orders>

Parsing XML reserved and special characters

If reserved and special XML characters in the input are already represented by entity references (&entity;), you can prevent XML Output from parsing them again by setting the Data element property on the input link to XML.

If you use a different data element value or omit it, XML Output parses the input to make it XML-safe.

For example, the less-than symbol (<) is replaced by the entity reference &lt;.
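The following Python sketch models the difference by using xml.sax.saxutils.escape from the standard library. The data_element parameter is a hypothetical stand-in for the stage's Data element property, and the exact set of characters that XML Output escapes might differ from escape(), which handles &, <, and >. The last call shows the double escaping that the XML setting avoids.

```python
from xml.sax.saxutils import escape


def make_xml_safe(value: str, data_element: str = "") -> str:
    """Escape reserved characters unless the (hypothetical) data element is "XML"."""
    if data_element == "XML":
        return value            # already uses entity references; pass it through
    return escape(value)        # & -> &amp;, < -> &lt;, > -> &gt;


print(make_xml_safe("a < b & c"))          # a &lt; b &amp; c
print(make_xml_safe("a &lt; b", "XML"))    # a &lt; b
print(make_xml_safe("a &lt; b"))           # a &amp;lt; b  (escaped twice)
```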