Tip: Using a DOM NodeFilter

Control which nodes are visible to a TreeWalker or NodeIterator

The DOM Level 2 Traversal module provides two objects, the TreeWalker and the NodeIterator, which simplify the process of navigating a Document. The module also defines a NodeFilter, which you can use to programmatically control which nodes are visible to the TreeWalker or NodeIterator. This tip shows you how to create a NodeFilter as well as a Traversal object that uses it.

Nicholas Chase (nicholas@nicholaschase.com), President, Chase and Chase Inc.

Nicholas Chase has been involved in Web site development for companies such as Lucent Technologies, Sun Microsystems, Oracle, and the Tampa Bay Buccaneers. Nick has been a high school physics teacher, a low-level radioactive waste facility manager, an online science fiction magazine editor, a multimedia engineer, and an Oracle instructor. More recently, he was the Chief Technology Officer of Site Dynamics Interactive Communications in Clearwater, Florida, USA, and is the author of three books on Web development, including Java and XML From Scratch (Que) and the upcoming Primer Plus XML Programming (Sams). He loves to hear from readers and can be reached at nicholas@nicholaschase.com.



01 November 2002

Note: This tip uses JAXP, but the sample application will also work with Xerces-Java 2, and the concepts are applicable for any XML parser environment.

The source code

This tip creates an application that traverses a simple XML document that contains information on which employees to contact in case of emergency:

Listing 1. The source document
<?xml version="1.0"?>
<personnel>
   <employee empid="332" status="contact">
        <deptid>24</deptid>
        <shift>night</shift>
        <name>Jenny Berman</name>
   </employee>
   <!-- Other employees listed here -->
</personnel>

Ultimately, the application counts on the NodeFilter to eliminate employees with a status value of donotcontact.


Traversing the tree

The Document Object Model Level 2 Traversal Module defines objects that walk the tree of an XML document, giving the application access to each Node in turn. The entire process of creating a TreeWalker is described in the tip Traversing an XML document with a TreeWalker, but for convenience, consider this application, which displays the elements of the employee document:

Listing 2. Creating the TreeWalker
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.File;
import org.w3c.dom.Document;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.TreeWalker;
import org.w3c.dom.traversal.NodeIterator;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Element;

public class ShowDocument {

    public static void main (String args[]) {
       File docFile = new File("employees.xml");
                
       Document doc = null;
       try {
          DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
          DocumentBuilder db = dbf.newDocumentBuilder();
        
          doc = db.parse(docFile);
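       } catch (Exception e) {
           // Without a parsed Document there is nothing to traverse, so stop here
           System.out.println("Problem parsing the file: " + e.getMessage());
           return;
       }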

       DOMImplementation domimpl = doc.getImplementation();
       if (domimpl.hasFeature("Traversal", "2.0")) {

           Node root = doc.getDocumentElement();
           int whattoshow = NodeFilter.SHOW_ALL;
           NodeFilter nodefilter = null;
           boolean expandreferences = false;

           DocumentTraversal traversal = (DocumentTraversal)doc;
  
           TreeWalker walker = traversal.createTreeWalker(root, 
                                                          whattoshow, 
                                                          nodefilter, 
                                                          expandreferences);
           Node thisNode = null;
           thisNode = walker.nextNode();
           while (thisNode != null) {
              if (thisNode.getNodeType() == Node.ELEMENT_NODE) {
                 System.out.print(thisNode.getNodeName() + " ");
                 Element thisElement = (Element)thisNode;
                 NamedNodeMap attributes = thisElement.getAttributes();
                 System.out.print("(");
                 for (int i = 0; i < attributes.getLength(); i++) {
                    System.out.print(attributes.item(i).getNodeName() + "=\"" +
                                     attributes.item(i).getNodeValue() + "\" ");
                 }
                 System.out.print(") : ");
              } else if (thisNode.getNodeType() == Node.TEXT_NODE) {
                 System.out.print(thisNode.getNodeValue());
              }
              thisNode = walker.nextNode();
          }

        } else {
           System.out.println("The Traversal module isn't supported.");
        }
   }
}

When the TreeWalker traverses the Document tree, it displays Element names, attributes, and Text Nodes:

Listing 3. The application output -- all nodes
personnel () :
   employee (empid="332" status="contact" ) :
        deptid () : 24
        shift () : night
        name () : Jenny Berman

   employee (empid="994" status="donotcontact" ) :
        deptid () : 24
        shift () : day
        name () : Andrew Fule

   employee (empid="948" status="contact" ) :
        deptid () : 3
        shift () : night
        name () : Anna Bangle

Notice that one of the parameters passed on the creation of the TreeWalker is a NodeFilter object that has been set to null. The result is that the TreeWalker sees all of the Nodes of the Document that satisfy the whattoshow value, NodeFilter.SHOW_ALL.
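To see how the whattoshow value alone narrows what a TreeWalker visits, before any NodeFilter is involved, consider the following minimal sketch. The WhatToShowDemo class, its countVisible() helper, and the inline sample document are hypothetical and stand in for employees.xml; the sketch also assumes the JAXP parser in use supports the Traversal module, so the cast to DocumentTraversal succeeds.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;

public class WhatToShowDemo {

    // Hypothetical inline sample (not the article's employees.xml), written
    // without indentation so that no whitespace-only Text nodes are created.
    static final String XML =
        "<personnel><employee empid=\"332\" status=\"contact\">"
      + "<name>Jenny Berman</name></employee></personnel>";

    // Counts the nodes a TreeWalker reaches below the root element
    // for the given whatToShow mask, with no NodeFilter attached.
    static int countVisible(int whatToShow) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            // Assumption: the parser's Document implements DocumentTraversal
            TreeWalker walker = ((DocumentTraversal) doc).createTreeWalker(
                doc.getDocumentElement(), whatToShow, null, false);
            int count = 0;
            while (walker.nextNode() != null) {
                count++;
            }
            return count;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build the sample document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("SHOW_ALL:     " + countVisible(NodeFilter.SHOW_ALL));
        System.out.println("SHOW_ELEMENT: " + countVisible(NodeFilter.SHOW_ELEMENT));
        System.out.println("SHOW_TEXT:    " + countVisible(NodeFilter.SHOW_TEXT));
    }
}
```

For this sample, the walker should reach three nodes with SHOW_ALL (employee, name, and the name's Text node), two with SHOW_ELEMENT, and one with SHOW_TEXT. Attributes are not children of an element, so they are never visited by the walker itself.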


Creating a NodeFilter

Creating a NodeFilter object gives you fine-grained control over the Nodes that are seen by the TreeWalker object. All that's required is a class that implements the NodeFilter interface, which consists of a single method, acceptNode(). When the TreeWalker encounters a Node, it passes it to the acceptNode() method to determine whether the Node is acceptable or not. Because this is a custom class, you can base that judgment on anything you can pack into an application. In this case, the judgment is based on the value of the status attribute:

Listing 4. Implementing the NodeFilter
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.Node;
import org.w3c.dom.Element;

public class EmployeeFilter implements NodeFilter { 

    public short acceptNode(Node thisNode) { 
         if (thisNode.getNodeType() == Node.ELEMENT_NODE) { 
              Element e = (Element)thisNode; 
              if (e.getAttribute("status").equals("donotcontact")) {
                   return NodeFilter.FILTER_SKIP; 
              }  
         } 
         return NodeFilter.FILTER_ACCEPT; 
    } 
}

Each Node is checked to see if it's an Element. If it is, the status attribute (if any) is checked. The filter skips all elements with a status attribute of donotcontact while accepting everything else.
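Because acceptNode() is an ordinary method, you can check a filter's verdicts directly, without building a TreeWalker at all. The sketch below is one way to do that; the FilterCheck class and its verdict() helper are hypothetical, and the filter logic is a restatement of EmployeeFilter as a lambda (which assumes Java 8 or later) so that the example is self-contained.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.NodeFilter;

public class FilterCheck {

    // Same decision as EmployeeFilter: skip elements whose status
    // attribute is "donotcontact" and accept everything else.
    static final NodeFilter STATUS_FILTER = node -> {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "donotcontact".equals(((Element) node).getAttribute("status"))) {
            return NodeFilter.FILTER_SKIP;
        }
        return NodeFilter.FILTER_ACCEPT;
    };

    // Hypothetical helper: builds an employee element with the given status
    // (or with no status attribute when status is null) and returns the
    // filter's verdict for it.
    static short verdict(String status) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
            Element employee = doc.createElement("employee");
            if (status != null) {
                employee.setAttribute("status", status);
            }
            return STATUS_FILTER.acceptNode(employee);
        } catch (Exception e) {
            throw new IllegalStateException("Could not create the test element", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("contact:      " + verdict("contact"));
        System.out.println("donotcontact: " + verdict("donotcontact"));
        System.out.println("no attribute: " + verdict(null));
    }
}
```

The printed values are the numeric NodeFilter constants: FILTER_ACCEPT is 1 and FILTER_SKIP is 3. An element with no status attribute is accepted because getAttribute() returns an empty string when the attribute is absent.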

All that's necessary now is to create the TreeWalker with the new NodeFilter object:

Listing 5. Setting the TreeWalker to see the NodeFilter
...
           Node root = doc.getDocumentElement();
           int whattoshow = NodeFilter.SHOW_ALL;
           NodeFilter nodefilter = new EmployeeFilter(); 
           boolean expandreferences = false;

           DocumentTraversal traversal = (DocumentTraversal)doc;
  
           TreeWalker walker = traversal.createTreeWalker(root, 
                                                          whattoshow, 
                                                          nodefilter, 
                                                          expandreferences);
...

Now when the TreeWalker traverses the Document, it checks each Node against the EmployeeFilter object, so it skips the Node that contains a status attribute of donotcontact:

Listing 6. The results
personnel () :
   employee (empid="332" status="contact" ) :
        deptid () : 24
        shift () : night
        name () : Jenny Berman


        deptid () : 24
        shift () : day
        name () : Andrew Fule

   employee (empid="948" status="contact" ) :
        deptid () : 3
        shift () : night
        name () : Anna Bangle

Notice that the employee element with a status of donotcontact (empid 994) is missing, but its children are not. In some cases, such as this application, this isn't what you really want. Instead of skipping a Node, you want to reject it altogether.


FILTER_SKIP vs. FILTER_REJECT

When a TreeWalker skips a Node, it moves on to the next Node encountered. In some cases, this is a child of the original. For this application, you're trying to eliminate employees who shouldn't be contacted, so rather than just skipping the employee element, you want to reject that element and all of its children. You can do this easily by changing the NodeFilter to use FILTER_REJECT instead of FILTER_SKIP:

Listing 7. Rejecting a Node
...
         if (thisNode.getNodeType()==Node.ELEMENT_NODE) { 
              Element e = (Element)thisNode; 
              if (e.getAttribute("status").equals("donotcontact")) {
                   return NodeFilter.FILTER_REJECT; 
              }  
         } 
         return NodeFilter.FILTER_ACCEPT; 
    } 
}

Now when the application runs, the entire element (including its children) is missing:

Listing 8. Results of rejecting a Node
   employee (empid="332" status="contact" ) :
        deptid () : 24
        shift () : night
        name () : Jenny Berman


   employee (empid="948" status="contact" ) :
        deptid () : 3
        shift () : night
        name () : Anna Bangle

It's important to note that the TreeWalker is able to skip the entire Element because it understands the inherent parent-child relationships. A NodeIterator, on the other hand, sees the document in a flattened way, much like a SAX stream, and has no concept of parents or children. If you were to create a NodeIterator rather than a TreeWalker, FILTER_REJECT would act the same as FILTER_SKIP.
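The difference between the two Traversal objects can be sketched by running the same rejecting filter through each and collecting the text they reach. The RejectComparison class, its helper methods, and the two-employee inline document are hypothetical; the sketch assumes a parser whose Document implements DocumentTraversal.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.w3c.dom.traversal.TreeWalker;

public class RejectComparison {

    // Hypothetical sample data, written without whitespace between elements
    static final String XML = "<personnel>"
        + "<employee status=\"contact\"><name>Jenny Berman</name></employee>"
        + "<employee status=\"donotcontact\"><name>Andrew Fule</name></employee>"
        + "</personnel>";

    // Rejects any element whose status attribute is "donotcontact"
    static final NodeFilter REJECTING = node ->
        (node.getNodeType() == Node.ELEMENT_NODE
            && "donotcontact".equals(((Element) node).getAttribute("status")))
        ? NodeFilter.FILTER_REJECT : NodeFilter.FILTER_ACCEPT;

    static Document parse() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample document", e);
        }
    }

    // Text seen by a TreeWalker: a rejected element hides its whole subtree
    static List<String> walkerText() {
        Document doc = parse();
        TreeWalker walker = ((DocumentTraversal) doc).createTreeWalker(
            doc.getDocumentElement(), NodeFilter.SHOW_ALL, REJECTING, false);
        List<String> text = new ArrayList<>();
        for (Node n = walker.nextNode(); n != null; n = walker.nextNode()) {
            if (n.getNodeType() == Node.TEXT_NODE) text.add(n.getNodeValue());
        }
        return text;
    }

    // Text seen by a NodeIterator: the rejected element is merely skipped
    static List<String> iteratorText() {
        Document doc = parse();
        NodeIterator iterator = ((DocumentTraversal) doc).createNodeIterator(
            doc.getDocumentElement(), NodeFilter.SHOW_ALL, REJECTING, false);
        List<String> text = new ArrayList<>();
        for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
            if (n.getNodeType() == Node.TEXT_NODE) text.add(n.getNodeValue());
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println("TreeWalker:   " + walkerText());
        System.out.println("NodeIterator: " + iteratorText());
    }
}
```

With a specification-conformant implementation, the TreeWalker should report only Jenny Berman, while the NodeIterator should report both names, because it still descends into the rejected employee element.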


Summary

The Traversal module defines TreeWalkers and NodeIterators that look to an external NodeFilter object to determine which Nodes are visible. This enables you to create an application in which the available data can be controlled from outside the main application. A Node can be skipped, in which case the next Node (possibly one of its children) is processed, or it can be rejected, in which case a TreeWalker also hides all of its descendants from the main application. A NodeIterator treats a rejected Node as if it had been skipped.
