Tivoli Directory Integrator, Version 7.0

Simple XML Parser

The Simple XML Parser reads and writes XML documents; it handles XML data that is no more than two levels deep. This Parser uses the Apache Xerces and Xalan libraries. The Parser gives access to the XML document through a script object called xmldom, which is an instance of the org.w3c.dom.Document interface. Refer to http://java.sun.com/xml/jaxp-1.0.1/docs/api/index.html for a complete description of this interface.

You can also use the XPathAPI (http://xml.apache.org/xalan-j/apidocs/index.html), accessing its Java classes in your scripts, to search and select nodes from the XML document. selectNodeList, a convenience method in the system object, can be used to select a subset of the XML document.
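As an illustration of selecting a subset of the document, the following sketch collects the text of every title element found under an Entry element. The function name collectTitles and the element names Entry and title are assumptions made for this example only. The system and xmldom objects are passed in as arguments so that the function does not depend on where it is called from.

```javascript
// Hypothetical helper: collectTitles is not part of the product.
// It relies only on system.selectNodeList and standard org.w3c.dom methods.
function collectTitles(system, xmldom) {
  var root = xmldom.getDocumentElement();
  // "//Entry/title" is an assumed XPath expression for this sketch
  var list = system.selectNodeList(root, "//Entry/title");
  var titles = [];
  for (var i = 0; i < list.getLength(); i++) {
    // The text of an element is held in its first child node, if any
    var textNode = list.item(i).getFirstChild();
    if (textNode != null) {
      titles.push(String(textNode.getNodeValue()));
    }
  }
  return titles;
}
```

In a hook script, a call such as collectTitles(system, xmldom) would return a script array that you can then log or map as needed.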

When the Connector is initialized, the Simple XML Parser tries to perform Document Type Definition (DTD) verification if a DTD tag is present.

Use the Connector's override functions to interpret or generate the XML document yourself. Create the necessary script in either the Override GetNext or GetNext Successful Hook in your AssemblyLine's hook definitions. If you do not override, the Parser reads or writes a very simple XML document that mimics the entry object model. The default Parser only permits you to read or write XML files two levels deep. It also reads multi-valued attributes, although only one value of a multi-valued attribute is shown when browsing the data in the Schema tab.

Note that certain methods, such as setAttribute, are available in both the IBM® Tivoli® Directory Integrator entry object and the objects returned by xmldom.createElement. These methods have the same name or signature, so do not confuse the xmldom objects with the IBM Tivoli Directory Integrator objects.

  1. This Parser was called "XML Parser" in pre-Tivoli Directory Integrator 7.0 releases. In Tivoli Directory Integrator 7.0 it is renamed to Simple XML Parser and a new XML Parser was added; see XML Parser. The new Parser has a lot of improvements and is now the main Tivoli Directory Integrator XML Parser.
  2. If you read large (more than 4MB) or write large (more than 14MB) XML files, your Java VM may run out of memory. Refer to "Increasing the memory available to the Virtual Machine" in IBM Tivoli Directory Integrator V7.0 Users Guide for a solution to this. Alternatively, use the XML Parser or the XML SAX Parser.
  3. The Parser silently ignores empty entries.
  4. When reading a CDATA attribute, no blank space is trimmed from the value. However, blank space is trimmed from attributes that are not CDATA.
  5. Certain characters, such as $, are illegal in XML tags. Avoid these characters in your attribute names when using the XML Parser because these characters might create illegal XML.
  6. When reading from an LDAP directory or an LDIF file, the distinguished name (DN) is typically returned in an attribute named $dn. If you map this attribute into an XML file without changing its name, the write fails because $dn is not a legal tag in an XML document. If you use explicit mapping, you must change "$dn" to "dn" (or another name without special characters) in your output Connector. If you use implicit mapping (for example, * or Automatically map all attributes checked in the AssemblyLine Settings, through the Config... tab of the AssemblyLine), you can configure the XML Parser to translate the distinguished name (for example, $dn) to a different name. For example, you can add something like this in the Before GetNext Hook:
    conn.setAttribute("dn", work.getAttribute("$dn")); 
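The renaming described in notes 5 and 6 can be generalized. The following is a minimal sketch of a helper that turns an arbitrary attribute name into a name that is safe to use as an XML tag. The function name toXmlTagName is hypothetical, and the rule it applies is a simplified, ASCII-only subset of the XML naming rules, not the Parser's own behavior.

```javascript
// Hypothetical helper, not part of the product.
// Simplified ASCII-only rule: keep letters, digits, underscore, period and
// hyphen; make sure the result starts with a letter or an underscore.
function toXmlTagName(name) {
  var cleaned = String(name).replace(/[^A-Za-z0-9_.\-]/g, "");
  if (cleaned.length === 0 || !/^[A-Za-z_]/.test(cleaned)) {
    cleaned = "_" + cleaned;
  }
  return cleaned;
}
```

With this sketch, toXmlTagName("$dn") returns "dn", so the same helper could be applied to every attribute name before the entry is written. Note that two different source names can collapse to the same tag name, which you would need to handle yourself.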


The Parser has the following parameters:

Root Tag
The root tag (output).
Entry Tag
The entry tag for entries (output).
Value Tag
The value tag for entry attributes (output).
Character Encoding
Character Encoding to be used. See Character Encoding in the Simple XML Parser.
Omit XML Declaration
If checked, the XML declaration is omitted in the output stream.
Document Validation
If checked, this parser requests a DTD/Schema-validating parser.
Namespace Aware
If checked, this parser requests a namespace-aware parser.
Indent Output
If this field is checked, the output is indented.
If the output is to be processed by a program (and is not meant for human reading), you will most likely want to deselect this parameter so that no unnecessary spaces or newlines are inserted in the output.
Detailed Log
If this parameter is checked, more detailed log messages are generated.
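
To make the effect of the output parameters more concrete, the following sketch shows how Root Tag, Entry Tag, Character Encoding, Omit XML Declaration and Indent Output could shape a document. It is an illustration under assumed names (sketchOutput and the params property names are hypothetical); it is not the Parser's implementation, it ignores Value Tag, and the Parser's real output may differ in detail.

```javascript
// Hypothetical illustration only; NOT the Parser's actual implementation.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// entries: array of plain objects mapping attribute name to a single value
// params:  assumed property names mirroring the parameters listed above
function sketchOutput(entries, params) {
  var nl = params.indentOutput ? "\n" : "";
  function pad(depth) {
    // Two spaces per nesting level when indentation is requested
    return params.indentOutput ? new Array(depth + 1).join("  ") : "";
  }
  var out = "";
  if (!params.omitXmlDeclaration) {
    out += '<?xml version="1.0" encoding="' + params.characterEncoding + '"?>' + nl;
  }
  out += "<" + params.rootTag + ">" + nl;
  entries.forEach(function (entry) {
    out += pad(1) + "<" + params.entryTag + ">" + nl;
    Object.keys(entry).forEach(function (name) {
      out += pad(2) + "<" + name + ">" + escapeXml(entry[name]) + "</" + name + ">" + nl;
    });
    out += pad(1) + "</" + params.entryTag + ">" + nl;
  });
  out += "</" + params.rootTag + ">" + nl;
  return out;
}
```

With indentOutput set to false, the sketch emits the whole document on one line with no extra white space, which is the form a consuming program would normally prefer.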

Character Encoding in the Simple XML Parser

The default and recommended Character Encoding to use when deploying the Simple XML Parser is UTF-8. This will preserve data integrity of your XML data in most cases. When you are forced to use a different encoding, the Parser will handle the various encodings in the following way:


Override Add hook:

var root = xmldom.getDocumentElement();
var entry = xmldom.createElement ("entry");
var names = work.getAttributeNames();

// Add one <attribute name="..."> element per attribute in the work entry
for ( var i = 0; i < names.length; i++ ) {
	var xmlNode = xmldom.createElement ("attribute");
	xmlNode.setAttribute ( "name", names[i] );
	xmlNode.appendChild ( xmldom.createTextNode ( work.getString ( names[i] ) ) );
	entry.appendChild ( xmlNode );
}
root.appendChild ( entry );

After Selection hook:

// Set up variables for "override getnext" hook

var root = xmldom.getDocumentElement();
var list = system.selectNodeList ( root, "//Entry" );
var counter = 0;

Override GetNext hook:

// Note that the Iterator hooks are NOT called when we override the
// getnext function
// Initialization done in After Selection hook

var nxt = list.item ( counter );

if ( nxt != null ) {
   var ch = nxt.getFirstChild();
   while ( ch != null ) {
      var child = ch.getFirstChild();
      while ( child != null ) {
         // Use the grandchild's value if it exists, to be able to
         // read multivalue attributes
         var grandchild = child.getFirstChild();
         var nodeValue;
         if ( grandchild != null ) {
            nodeValue = grandchild.getNodeValue();
         } else {
            nodeValue = child.getNodeValue();
         }
         // Ignore strings containing newlines, they are just fillers
         if ( nodeValue != null && nodeValue.indexOf('\n') == -1 ) {
            work.addAttributeValue ( ch.getNodeName(), nodeValue );
         }
         child = child.getNextSibling();
      }
      ch = ch.getNextSibling();
   }
   counter++; // Advance to the next entry for the following call
   result.setStatus (1); // Not end of input yet
} else {
   result.setStatus (0); // Signal end of input
}

The previous example parses files containing items that look like the following entries:

    <title>Shoe salesman</title>

Suppose instead that the input looks like the following entries:

    <field name="firstName">John</field>
    <field name="lastName">Doe</field>
    <field name="title">Engineer</field>
    <field name="firstName">Al</field>
    <field name="lastName">Bundy</field>
    <field name="title">Shoe salesman</field>

Here the attribute names can be retrieved from attributes of the field node, and this code is used in the Override GetNext Hook:

var nxt = list.item ( counter );

if ( nxt != null ) {
 var ch = nxt.getFirstChild();
 while ( ch != null ) {
  if ( String(ch.getNodeName()) == "field" ) {
   // The attribute name is taken from the first XML attribute of <field>
   var attrName = ch.getAttributes().item(0).getNodeValue();
   var nodeValue = ch.getFirstChild().getNodeValue();
   work.addAttributeValue ( attrName, nodeValue );
  }
  ch = ch.getNextSibling();
 }
 counter++; // Advance to the next entry for the following call
 result.setStatus (1); // Not end of input yet
} else {
 result.setStatus (0); // Signal end of input
}

This example package demonstrates how the base Simple XML Parser functionality can be extended to read XML more than two levels deep, by using the Override GetNext and Override Add hooks.
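
As a rough idea of what reading deeper structures in an Override GetNext Hook might involve, the following sketch walks an entry node recursively and reports each text value under a path-like attribute name such as address.city. The function name flattenNode and the dotted naming scheme are assumptions made for this illustration; they are not taken from the example package. Only standard org.w3c.dom.Node methods are used.

```javascript
// Hypothetical helper, not taken from the example package.
// Node type values are the standard org.w3c.dom.Node constants.
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;

// node:   the DOM node to walk (for example, one item of the Entry list)
// prefix: attribute name built so far ("" for the entry node itself)
// add:    callback invoked as add(attributeName, value) for every text value
function flattenNode(node, prefix, add) {
  var child = node.getFirstChild();
  while (child != null) {
    var type = Number(child.getNodeType());
    if (type === ELEMENT_NODE) {
      var name = String(child.getNodeName());
      flattenNode(child, prefix ? prefix + "." + name : name, add);
    } else if (type === TEXT_NODE && prefix) {
      var value = String(child.getNodeValue());
      // Skip whitespace-only text, which is usually just indentation
      if (value.replace(/\s+/g, "") !== "") {
        add(prefix, value);
      }
    }
    child = child.getNextSibling();
  }
}
```

In a hook, the callback could simply forward to the work entry, for example flattenNode(nxt, "", function (name, value) { work.addAttributeValue(name, value); }). CDATA sections and XML attributes are ignored by this sketch and would need their own handling.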

Additional Examples

Go to the root_directory/examples/simplexmlparser directory of your IBM Tivoli Directory Integrator.

See also

XML Parser,
XML SAX Parser,
XSL based XML Parser,
SOAP Parser,
DSML Parser.
(C) Copyright IBM Corporation, 2003, 2009. All Rights Reserved.