Note: This tip uses JAXP, but the sample application will also work with Xerces-Java 2, and the concepts are applicable for any XML parser environment.
This tip creates an application that traverses a simple XML document that contains information on which employees to contact in case of emergency:
Listing 1. The source document
<?xml version="1.0"?>
<personnel>
<employee empid="332" status="contact">
<deptid>24</deptid>
<shift>night</shift>
<name>Jenny Berman</name>
</employee>
<!-- Other employees listed here -->
</personnel>
Ultimately, the application counts on the NodeFilter to eliminate employees with a status value of donotcontact.
The Document Object Model Level 2 Traversal Module defines objects that walk the tree of an XML document, giving the application access to the current Node at each step. The entire process of creating a TreeWalker is described in the tip Traversing an XML document with a TreeWalker, but for convenience, consider this application, which displays the elements of the employee document:
Listing 2. Creating the TreeWalker
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;

public class ShowDocument {
    public static void main (String args[]) {
        File docFile = new File("employees.xml");
        Document doc = null;
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            doc = db.parse(docFile);
        } catch (Exception e) {
            System.out.println("Problem parsing the file.");
            return;  // doc is still null, so there is nothing to traverse
        }
        DOMImplementation domimpl = doc.getImplementation();
        if (domimpl.hasFeature("Traversal", "2.0")) {
            // Start at the root element and show every node type, unfiltered
            Node root = doc.getDocumentElement();
            int whattoshow = NodeFilter.SHOW_ALL;
            NodeFilter nodefilter = null;
            boolean expandreferences = false;
            DocumentTraversal traversal = (DocumentTraversal)doc;
            TreeWalker walker = traversal.createTreeWalker(root,
                                                           whattoshow,
                                                           nodefilter,
                                                           expandreferences);
            Node thisNode = null;
            thisNode = walker.nextNode();
            while (thisNode != null) {
                if (thisNode.getNodeType() == Node.ELEMENT_NODE) {
                    // Print the element name followed by its attributes
                    System.out.print(thisNode.getNodeName() + " ");
                    Element thisElement = (Element)thisNode;
                    NamedNodeMap attributes = thisElement.getAttributes();
                    System.out.print("(");
                    for (int i = 0; i < attributes.getLength(); i++) {
                        System.out.print(attributes.item(i).getNodeName() + "=\"" +
                                         attributes.item(i).getNodeValue() + "\" ");
                    }
                    System.out.print(") : ");
                } else if (thisNode.getNodeType() == Node.TEXT_NODE) {
                    System.out.print(thisNode.getNodeValue());
                }
                thisNode = walker.nextNode();
            }
        } else {
            System.out.println("The Traversal module isn't supported.");
        }
    }
}
When the TreeWalker traverses the Document tree, it displays Element names, attributes, and Text Nodes:
Listing 3. The application output -- all nodes
personnel () :
employee (empid="332" status="contact" ) :
deptid () : 24
shift () : night
name () : Jenny Berman
employee (empid="994" status="donotcontact" ) :
deptid () : 24
shift () : day
name () : Andrew Fule
employee (empid="948" status="contact" ) :
deptid () : 3
shift () : night
name () : Anna Bangle
Notice that one of the parameters passed when the TreeWalker is created is a NodeFilter object that has been set to null. As a result, the TreeWalker sees every Node of the Document that satisfies the whattoshow value, NodeFilter.SHOW_ALL.
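The whattoshow value is itself a coarse, first-pass filter based on node type. As a minimal sketch, the following compares NodeFilter.SHOW_ALL with NodeFilter.SHOW_ELEMENT while leaving the NodeFilter set to null. The class name WhatToShowDemo, the visit() helper, and the inline one-employee document are illustrative assumptions rather than part of the sample application, and the sketch assumes that the parser's Document implements DocumentTraversal, as the parser bundled with the JDK does.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

public class WhatToShowDemo {

    // Hypothetical helper: walks the document with the given whatToShow
    // mask and no NodeFilter, returning the name of each node visited.
    static List<String> visit(String xml, int whatToShow) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Problem parsing the document", e);
        }
        TreeWalker walker = ((DocumentTraversal) doc).createTreeWalker(
                doc.getDocumentElement(), whatToShow, null, false);
        List<String> names = new ArrayList<>();
        // The walker starts on the root, so nextNode() begins with the
        // first visible node below it.
        for (Node n = walker.nextNode(); n != null; n = walker.nextNode()) {
            names.add(n.getNodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<personnel><employee empid=\"332\" status=\"contact\">"
                + "<deptid>24</deptid><shift>night</shift>"
                + "<name>Jenny Berman</name></employee></personnel>";
        System.out.println(visit(xml, NodeFilter.SHOW_ALL));
        System.out.println(visit(xml, NodeFilter.SHOW_ELEMENT));
    }
}
```

With SHOW_ALL the list includes a #text entry after each of deptid, shift, and name; with SHOW_ELEMENT only employee, deptid, shift, and name remain. Node types excluded by whattoshow are passed over without hiding their children.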
Creating a NodeFilter object gives you fine-grained control over the Nodes that are seen by the TreeWalker object. All that's required is a class that implements the NodeFilter interface, which consists of a single method, acceptNode(). When the TreeWalker encounters a Node, it passes it to the acceptNode() method to determine whether the Node is acceptable or not. Because this is a custom class, you can base that judgment on anything you can pack into an application. In this case, the judgment is based on the value of the status attribute:
Listing 4. Implementing the NodeFilter
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.Node;
import org.w3c.dom.Element;

public class EmployeeFilter implements NodeFilter {
    public short acceptNode(Node thisNode) {
        if (thisNode.getNodeType() == Node.ELEMENT_NODE) {
            Element e = (Element)thisNode;
            if (e.getAttribute("status").equals("donotcontact")) {
                return NodeFilter.FILTER_SKIP;
            }
        }
        return NodeFilter.FILTER_ACCEPT;
    }
}
Each Node is checked to see if it's an Element. If it is, the status attribute (if any) is checked. The filter skips all elements with a status attribute of donotcontact while accepting everything else.
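Because acceptNode() is an ordinary method, the filter can be exercised on individual Nodes without a TreeWalker. The sketch below repeats the logic of Listing 4 in a nested class and asks for a verdict on three hand-built Nodes; the FilterCheck class and its verdicts() helper are hypothetical names used only for illustration. It also shows why the "(if any)" case is safe: getAttribute() returns an empty string when the attribute is absent, so the equals() call never sees a null.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.NodeFilter;

public class FilterCheck {

    // Same logic as the EmployeeFilter in Listing 4.
    static class EmployeeFilter implements NodeFilter {
        public short acceptNode(Node thisNode) {
            if (thisNode.getNodeType() == Node.ELEMENT_NODE) {
                Element e = (Element) thisNode;
                if (e.getAttribute("status").equals("donotcontact")) {
                    return NodeFilter.FILTER_SKIP;
                }
            }
            return NodeFilter.FILTER_ACCEPT;
        }
    }

    // Hypothetical helper: builds three standalone nodes and returns the
    // filter's verdict on each, in order.
    static short[] verdicts() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException ex) {
            throw new IllegalStateException(ex);
        }
        Element hidden = doc.createElement("employee");
        hidden.setAttribute("status", "donotcontact");
        Element noStatus = doc.createElement("deptid"); // no status attribute
        Node text = doc.createTextNode("24");

        NodeFilter filter = new EmployeeFilter();
        return new short[] {
            filter.acceptNode(hidden),
            filter.acceptNode(noStatus),
            filter.acceptNode(text)
        };
    }

    public static void main(String[] args) {
        for (short v : verdicts()) {
            System.out.println(v == NodeFilter.FILTER_SKIP ? "skip" : "accept");
        }
    }
}
```

Running it prints skip, accept, accept: only the element whose status is donotcontact is skipped, while the element with no status attribute and the Text Node are both accepted.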
All that's necessary now is to create the TreeWalker with the new NodeFilter object:
Listing 5. Setting the TreeWalker to see the NodeFilter
...
            Node root = doc.getDocumentElement();
            int whattoshow = NodeFilter.SHOW_ALL;
            NodeFilter nodefilter = new EmployeeFilter();
            boolean expandreferences = false;
            DocumentTraversal traversal = (DocumentTraversal)doc;
            TreeWalker walker = traversal.createTreeWalker(root,
                                                           whattoshow,
                                                           nodefilter,
                                                           expandreferences);
...
Now when the TreeWalker traverses the Document, it checks each Node against the EmployeeFilter object, so it skips the Node that contains a status attribute of donotcontact:
Listing 6. The results
personnel () :
employee (empid="332" status="contact" ) :
deptid () : 24
shift () : night
name () : Jenny Berman
deptid () : 24
shift () : day
name () : Andrew Fule
employee (empid="948" status="contact" ) :
deptid () : 3
shift () : night
name () : Anna Bangle
Notice that the employee element for Andrew Fule (empid 994) is missing, but its children are not. In some cases, such as this application, this isn't what you really want. Instead of skipping a Node, you want to reject it altogether.
When a TreeWalker skips a Node, it moves on to the next Node encountered. In some cases, this is a child of the original. For this application, you're trying to eliminate employees who shouldn't be contacted, so rather than just skipping the employee element, you want to reject that element and all of its children. You can do this easily by changing the NodeFilter to use FILTER_REJECT instead of FILTER_SKIP:
Listing 7. Rejecting a Node
...
        if (thisNode.getNodeType() == Node.ELEMENT_NODE) {
            Element e = (Element)thisNode;
            if (e.getAttribute("status").equals("donotcontact")) {
                return NodeFilter.FILTER_REJECT;
            }
        }
        return NodeFilter.FILTER_ACCEPT;
    }
}
Now when the application runs, the entire element (including its children) is missing:
Listing 8. Results of rejecting a Node
employee (empid="332" status="contact" ) :
deptid () : 24
shift () : night
name () : Jenny Berman
employee (empid="948" status="contact" ) :
deptid () : 3
shift () : night
name () : Anna Bangle
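To see the two verdicts side by side, the following sketch runs the same TreeWalker traversal twice, once with FILTER_SKIP and once with FILTER_REJECT. The SkipVsReject class, the statusFilter() factory, and the shortened two-employee document (written without whitespace between elements) are assumptions made for illustration; only the filtering rule comes from the listings above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

public class SkipVsReject {

    static final String XML = "<personnel>"
            + "<employee empid=\"332\" status=\"contact\">"
            + "<name>Jenny Berman</name></employee>"
            + "<employee empid=\"994\" status=\"donotcontact\">"
            + "<name>Andrew Fule</name></employee>"
            + "</personnel>";

    // Hypothetical factory: the returned filter gives the supplied verdict
    // to donotcontact elements and accepts every other node.
    static NodeFilter statusFilter(final short verdict) {
        return new NodeFilter() {
            public short acceptNode(Node n) {
                if (n.getNodeType() == Node.ELEMENT_NODE
                        && ((Element) n).getAttribute("status").equals("donotcontact")) {
                    return verdict;
                }
                return NodeFilter.FILTER_ACCEPT;
            }
        };
    }

    // Element names and text values, in the order the walker visits them.
    static List<String> walk(short verdict) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException("Problem parsing the document", e);
        }
        TreeWalker walker = ((DocumentTraversal) doc).createTreeWalker(
                doc.getDocumentElement(), NodeFilter.SHOW_ALL,
                statusFilter(verdict), false);
        List<String> seen = new ArrayList<>();
        for (Node n = walker.nextNode(); n != null; n = walker.nextNode()) {
            seen.add(n.getNodeType() == Node.TEXT_NODE
                    ? n.getNodeValue() : n.getNodeName());
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("SKIP:   " + walk(NodeFilter.FILTER_SKIP));
        System.out.println("REJECT: " + walk(NodeFilter.FILTER_REJECT));
    }
}
```

With FILTER_SKIP the walker still reports the name element and the text Andrew Fule; with FILTER_REJECT the traversal ends after Jenny Berman because the whole subtree is hidden.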
It's important to note that the TreeWalker is able to skip the entire Element because it understands the inherent parent-child relationships. A NodeIterator, on the other hand, sees the document in a flattened way, much like a SAX stream, and has no concept of parents or children. If you were to create a NodeIterator rather than a TreeWalker, FILTER_REJECT would act the same as FILTER_SKIP.
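The difference can be checked with a short sketch that hands one rejecting filter to both a TreeWalker and a NodeIterator. The IteratorVsWalker class, its helper methods, and the two-employee document are hypothetical; createTreeWalker() and createNodeIterator() are the standard DocumentTraversal factory methods. Note that a NodeIterator starts positioned before its root, so its first nextNode() call returns the root element itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

public class IteratorVsWalker {

    static final String XML = "<personnel>"
            + "<employee empid=\"332\" status=\"contact\">"
            + "<name>Jenny Berman</name></employee>"
            + "<employee empid=\"994\" status=\"donotcontact\">"
            + "<name>Andrew Fule</name></employee>"
            + "</personnel>";

    // Rejects donotcontact elements, as in Listing 7.
    static final NodeFilter REJECTING = new NodeFilter() {
        public short acceptNode(Node n) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && ((Element) n).getAttribute("status").equals("donotcontact")) {
                return NodeFilter.FILTER_REJECT;
            }
            return NodeFilter.FILTER_ACCEPT;
        }
    };

    static Document parse() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException("Problem parsing the document", e);
        }
    }

    // Text nodes are reported by value, everything else by name.
    static String label(Node n) {
        return n.getNodeType() == Node.TEXT_NODE ? n.getNodeValue() : n.getNodeName();
    }

    static List<String> withTreeWalker() {
        Document doc = parse();
        TreeWalker walker = ((DocumentTraversal) doc).createTreeWalker(
                doc.getDocumentElement(), NodeFilter.SHOW_ALL, REJECTING, false);
        List<String> seen = new ArrayList<>();
        for (Node n = walker.nextNode(); n != null; n = walker.nextNode()) {
            seen.add(label(n));
        }
        return seen;
    }

    static List<String> withNodeIterator() {
        Document doc = parse();
        NodeIterator iterator = ((DocumentTraversal) doc).createNodeIterator(
                doc.getDocumentElement(), NodeFilter.SHOW_ALL, REJECTING, false);
        List<String> seen = new ArrayList<>();
        for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
            seen.add(label(n));
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("TreeWalker:   " + withTreeWalker());
        System.out.println("NodeIterator: " + withNodeIterator());
    }
}
```

The TreeWalker stops after Jenny Berman, while the NodeIterator drops only the rejected employee element and still returns its name element and the text Andrew Fule.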
The Traversal module defines TreeWalkers and NodeIterators that look to an external NodeFilter object to determine which Nodes are visible. This enables you to create an application in which the available data can be controlled from outside the main application. A Node can be skipped, in which case the next Node is processed, or it can be rejected, in which case all of its children are also hidden from the main application.
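One way to picture control "from outside the main application" is a filter whose rule is supplied by the caller instead of being hard-coded. The sketch below is only an illustration under that assumption: the ConfigurableFilterDemo class, the StatusFilter class, and the visibleEmployeeIds() helper are hypothetical, and the employee IDs and status values are the ones shown in Listing 3.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

public class ConfigurableFilterDemo {

    static final String XML = "<personnel>"
            + "<employee empid=\"332\" status=\"contact\"/>"
            + "<employee empid=\"994\" status=\"donotcontact\"/>"
            + "<employee empid=\"948\" status=\"contact\"/>"
            + "</personnel>";

    // Hypothetical filter: the status values to hide are passed in by
    // the caller, so the traversal code never needs to change.
    static class StatusFilter implements NodeFilter {
        private final Set<String> hidden;

        StatusFilter(Set<String> hidden) {
            this.hidden = hidden;
        }

        public short acceptNode(Node n) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && hidden.contains(((Element) n).getAttribute("status"))) {
                return NodeFilter.FILTER_REJECT;
            }
            return NodeFilter.FILTER_ACCEPT;
        }
    }

    // Returns the empid of every employee element the walker can see.
    static List<String> visibleEmployeeIds(Set<String> hidden) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException("Problem parsing the document", e);
        }
        // SHOW_ELEMENT guarantees that every visited node is an Element.
        TreeWalker walker = ((DocumentTraversal) doc).createTreeWalker(
                doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT,
                new StatusFilter(hidden), false);
        List<String> ids = new ArrayList<>();
        for (Node n = walker.nextNode(); n != null; n = walker.nextNode()) {
            Element e = (Element) n;
            if (e.hasAttribute("empid")) {
                ids.add(e.getAttribute("empid"));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(visibleEmployeeIds(Collections.singleton("donotcontact")));
        System.out.println(visibleEmployeeIds(Collections.<String>emptySet()));
    }
}
```

Hiding donotcontact leaves employees 332 and 948 visible; passing an empty set shows all three. The traversal loop is identical in both cases, and only the filter's configuration differs.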
- Check out the DOM Level 2 Traversal and Range Recommendation.
- Read about Traversing an XML document with a TreeWalker (developerWorks, October 2002).
- Download JAXP or Xerces-Java 2.
- Find more XML resources on the developerWorks XML zone. For a complete list of XML tips to date, check out the tips summary page.
- IBM trial software: Build your next development project with trial software available for download directly from developerWorks.
- Find out how you can become an IBM Certified Developer in XML and related technologies.
- Want us to send you useful XML tips like this every week? Sign up for the developerWorks XML Tips newsletter.
Nicholas Chase has been involved in Web site development for companies such as Lucent Technologies, Sun Microsystems, Oracle, and the Tampa Bay Buccaneers. Nick has been a high school physics teacher, a low-level radioactive waste facility manager, an online science fiction magazine editor, a multimedia engineer, and an Oracle instructor. More recently, he was the Chief Technology Officer of Site Dynamics Interactive Communications in Clearwater, Florida, USA, and is the author of three books on Web development, including Java and XML From Scratch (Que) and the upcoming Primer Plus XML Programming (Sams). He loves to hear from readers and can be reached at nicholas@nicholaschase.com.