Walking the Web with DOM

How to use the DOM HTML module with servlets

Brett McLaughlin (brett@newInstance.com), Enhydra strategist, Lutris Technologies
Brett McLaughlin (brett@newInstance.com) works as Enhydra strategist at Lutris Technologies and specializes in distributed systems architecture. He is author of Java and XML (O'Reilly). He is involved in technologies such as Java servlets, Enterprise JavaBeans technology, XML, and business-to-business applications. Along with Jason Hunter, he founded the JDOM project, which provides a simple API for manipulating XML from Java applications. He is also an active developer on the Apache Cocoon project and the EJBoss EJB server as well as a co-founder of the Apache Turbine project.

Summary:  Using the DOM HTML module, developers can work with their XML documents and output them as HTML, using the same familiar API for both tasks. This article explains the DOM HTML module, shows examples of its use, and explains how to output HTML using DOM in a servlet environment. Code samples demonstrate the techniques.

Date:  01 Oct 2001
Level:  Introductory


Like the DOM Traversal module, which I introduced recently (see Resources), the DOM HTML module adds functionality to the basic DOM. The name of the module is fairly self-explanatory, so I'm sure you're not scratching your head over what this one does, but I'll start with an overview anyway.

The DOM HTML module is geared exclusively toward working with HTML. What's cool and noteworthy is that the module lets you work with HTML through the DOM interfaces and methods you already know. While this might not seem like a big deal at first, it turns out to be pretty slick. If you're like me, you probably spend too much time working through out.println statements in your servlet code, trying to find a misplaced form tag or a missing closing table row tag (tr). This gets pretty annoying, pretty quickly. With DOM HTML, though, that becomes a thing of the past. The DOM HTML module provides an object model, built on the core DOM module, that adds support for the various HTML tags like table, p, and body. And because your page is held as a tree of element objects rather than a stream of strings, the markup is written out from that tree, which makes it impossible to leave a tag out of place or forget to close one. I'll show you how all of this works in this article, and you'll be coding with the DOM HTML module in no time.
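To see why a tree can't leave a tag dangling, here's a minimal sketch. It doesn't use the Docuverse DOM HTML classes; as an assumption for portability, it builds the tree with the core DOM through the JDK's JAXP API and serializes it with the JDK's built-in TrAX transformer. The class name HtmlTreeDemo is hypothetical. Because each element is appended to a parent rather than printed, the serializer writes every closing tag itself.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class HtmlTreeDemo {

    // Builds html > body > table > tr > td as a tree of nodes;
    // no closing tag is ever written by hand.
    public static String buildTable(String cellText) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();

        Element html = doc.createElement("html");
        Element body = doc.createElement("body");
        Element table = doc.createElement("table");
        Element tr = doc.createElement("tr");
        Element td = doc.createElement("td");

        td.appendChild(doc.createTextNode(cellText));
        tr.appendChild(td);
        table.appendChild(tr);
        body.appendChild(table);
        html.appendChild(body);
        doc.appendChild(html);

        return serialize(doc);
    }

    // Identity transform: walks the tree and emits matching start/end tags.
    static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "xml");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buildTable("Hello"));
    }
}
```

Running main() should print the table with all of its closing tags in place. Note that the XML output method produces XHTML-style markup rather than classic HTML.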

Getting ready

I recommend downloading the Docuverse DOM SDK for working with the DOM HTML module. It makes things simple, and its support for DOM HTML is somewhat more complete than what Apache Xerces currently offers, which makes it an ideal environment for this article's examples. Once you have that toolkit, you can parse HTML and work with it using the DOM HTML module. (The current version of Docuverse doesn't include DOM Level 2 support, but it's planned for a future version.)

You also need a working Java servlet engine. For this article I use the Apache project's Jakarta Tomcat (see Resources), which is both open source and the Java reference implementation for servlets and JSP. For the examples in this article, I've used Tomcat version 3.2.3.

Note: For those of you following along with Tomcat, I dropped the Docuverse domsdk.jar and w3cdom1.jar files into the $TOMCAT_HOME/lib/ directory and put my servlet classes into the ROOT context, at $TOMCAT_HOME/webapps/ROOT/WEB-INF/classes. If you take those steps, you'll have a setup much like mine.

This article covers using DOM HTML from a servlet, so having a servlet engine installed and working is a must if you plan to follow along with the examples. Once you have your tools in place, you're ready to get on to the code.


Building an HTML template

This article primarily addresses those of you who are writing Web-based applications; you are the folks who spend your time creating, printing, and parsing HTML. If you're like me, you've spent hundreds, if not thousands, of hours wading through servlet code that does nothing more than print out an HTML document. Pretty tedious, isn't it? I expect that the DOM HTML module can help you avoid some of that tedium, particularly the annoying little mistakes that we all make when we use that conventional approach (like closing out a table cell in a nested table ... ever had that happen? Uggh!).

Now, to all of you JSP adherents out there, I realize that you may be losing interest in the rest of the article. However, I urge you to stay tuned: I'd bet that any developers who tell me that they won't ever need to write another servlet will have to eat their words. In other words, even if JSP (or XMLC, or Cocoon, or XSLT) is your weapon of choice, you'll want to be able to wield another couple of weapons as well: DOM HTML could be one of them.

To get started, consider the common use case that makes DOM HTML a magic bullet. Take a typical dynamic Web page: you get some information from the user, you write out a fancy, formatted HTML page, and in just two or three places, you insert the user-supplied data. The end result is hundreds of lines of servlet code to make just a few simple changes to an otherwise static HTML document. With the DOM HTML module, though, this becomes a trivial task. To show how it works, let's walk through an example of this scenario.

First, you need to define an HTML template. This is the basic formatted page that all the dynamic content fits within. Listing 1 shows this template, which has several placeholders for data: the HTML page title, the heading of the page, a subtitle, and the actual page content. At runtime, you want the servlet to read in this template, replace the placeholders with live data, and then output the HTML. However, you don't want to be forced to treat the page as generic XML through SAX or the core DOM alone when it's obviously HTML; that's where the DOM HTML module helps. So take a look at Listing 1, which is the basis for the dynamic content.

Be sure to note the various places where live data will fit into the template.
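If you'd like to experiment with the positional navigation used later in this article without installing the Docuverse toolkit, the sketch below pairs a hypothetical, minimal XHTML template (an assumption, not the template from Listing 1) with the JDK's JAXP parser and prints each td cell with its zero-based index. The hypothetical template is arranged so that indexes 1, 4, and 5 hold the page title, subtitle, and content placeholders, matching the cell positions that the doPost() code in Listing 3 relies on.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class TemplateSketch {

    // Hypothetical stand-in template (not Listing 1):
    // td[1] = page title, td[4] = subtitle, td[5] = page content.
    static final String TEMPLATE =
        "<html><head><title>HTML TITLE</title></head><body><table>"
      + "<tr><td>logo</td><td><font face=\"Arial\">PAGE TITLE</font></td>"
      + "<td>nav</td></tr>"
      + "<tr><td>spacer</td><td><center><b>SUBTITLE</b></center></td>"
      + "<td><p>CONTENT</p></td></tr>"
      + "</table></body></html>";

    // Returns the text of every <td>, in document order.
    public static List<String> cellTexts(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(
                xhtml.getBytes(StandardCharsets.UTF_8)));
        NodeList cells = doc.getElementsByTagName("td");
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < cells.getLength(); i++) {
            texts.add(cells.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        List<String> texts = cellTexts(TEMPLATE);
        for (int i = 0; i < texts.size(); i++) {
            System.out.println("td[" + i + "] = " + texts.get(i));
        }
    }
}
```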


Reading in the template

Now, you need to create a servlet that will read in the template from Listing 1 and populate it with dynamic data.

Note: For the sake of brevity, I'm putting all the example servlet code in this article into a single servlet. In a production application, you might want to put common parsing code into a utility class. You could then have a servlet for each dynamic page in your application that uses this common parsing code and then inserts its own dynamic data.
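Here's one way such a utility class might look, as a sketch under stated assumptions: TemplateCache is a hypothetical name, it uses the JDK's JAXP core DOM rather than Docuverse, and it assumes the template is well-formed XHTML. It parses the template once and hands each caller a deep copy made with importNode(), so concurrent requests never edit the same tree.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

// Hypothetical utility class, not part of any toolkit
public class TemplateCache {

    private final DocumentBuilderFactory factory =
        DocumentBuilderFactory.newInstance();
    private final Document template;

    // Parses the (well-formed) template once, for example from init()
    public TemplateCache(String xhtml) throws Exception {
        template = newBuilder().parse(new ByteArrayInputStream(
            xhtml.getBytes(StandardCharsets.UTF_8)));
    }

    // DocumentBuilderFactory isn't guaranteed to be thread-safe,
    // so builders are created under this object's lock
    private synchronized DocumentBuilder newBuilder()
            throws ParserConfigurationException {
        return factory.newDocumentBuilder();
    }

    // Returns a deep copy that the caller may modify freely
    public Document copy() throws ParserConfigurationException {
        Document fresh = newBuilder().newDocument();
        // Even reads of a DOM tree aren't guaranteed thread-safe,
        // so copying from the shared template is serialized too
        synchronized (template) {
            fresh.appendChild(
                fresh.importNode(template.getDocumentElement(), true));
        }
        return fresh;
    }
}
```

A servlet could create one TemplateCache in init() and call copy() at the top of doPost(), filling in and writing the copy instead of a shared field.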

The first task for this servlet is to read in the HTML template so that it's available for insertion of dynamic data. Listing 2 shows the initial version of my servlet, complete with DOM HTML parsing code. Once you take a look at the listing, I'll explain how the magic happens.

Note: For those of you who don't have an existing application to fit this into, I've also included a doGet() method that lets you enter some test data. That method simply displays a form that posts the data to the doPost() method, which I'll code up in the next section to insert data into the template. By the way, it also shows what a pain outputting HTML without the DOM HTML module can be!

Listing 2. Parsing an HTML document

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletConfig;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Docuverse imports
import com.docuverse.dom.DOM;
import com.docuverse.dom.html.HTMLFactory;

// DOM imports
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.html.HTMLDocument;

public class ResponseServlet extends HttpServlet {

    // Template file location
    private static final String TEMPLATE_FILE =
        "template.html";

    // DOM to use for reading and writing
    private DOM dom;

    // HTML Document representing the template
    private HTMLDocument doc;

    public void init(ServletConfig config) 
        throws ServletException {

        super.init(config);

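        // Resolve the template against the Web application's root
        // directory; a bare relative path would depend on whatever
        // directory Tomcat was started from. (getRealPath() can return
        // null if the application isn't deployed as a directory.)
        File file = new File(
            getServletContext().getRealPath("/" + TEMPLATE_FILE));

        // try-with-resources closes the stream once parsing is done
        try (InputStream input = new FileInputStream(file)) {
            // Parse the HTML template
            dom = new DOM();
            // HTML-aware SAX driver: tolerates HTML that isn't well formed
            dom.setProperty("sax.driver",
                "com.docuverse.html.swing.SAXDriver");
            // Build the tree from DOM HTML classes, not core DOM classes
            dom.setFactory(new HTMLFactory());
            doc = (HTMLDocument) dom.readDocument(input);
        } catch (IOException e) {
            // Keep the original exception as the root cause
            throw new ServletException("Unable to load template", e);
        }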
    }

    public void doPost(HttpServletRequest req, 
                       HttpServletResponse res)
        throws IOException, ServletException {

        // Get dynamic data and do something with it
    }

    public void doGet(HttpServletRequest req, 
                      HttpServletResponse res)
        throws IOException, ServletException {

        // Declare the content type before obtaining the writer
        res.setContentType("text/html");
        PrintWriter out = res.getWriter();
        out.println("<html>");
        out.println("<body>");
        out.println("<form method='POST' " +
            "action='/servlet/ResponseServlet'>");
        out.println("HTML Title: <input type='text' " +
            "name='htmlTitle' /><br>");
        out.println("Page Title: <input type='text' " +
            "name='pageTitle' /><br>");
        out.println("Page Subtitle: <input type='text' " +
            "name='pageSubtitle' /><br>");
        out.println("Page Content: <textarea name='pageContent'>" +
            "</textarea><br><br>");
        out.println("<input type='submit'/>");
        out.println("</form></body>");
        out.println("</html>");

        out.close();
    }
}

The important part of Listing 2 occurs in the init() method, which sets up the template for use in the other servlet methods. First, it creates a new com.docuverse.dom.DOM object, which is the basis for DOM operations with the Docuverse toolkit. Next, it sets up a SAX driver for parsing. This driver takes a standard HTML document (which may not be well formed) and converts it into a well-formed, XML-compliant document that just happens to be in an HTML format. With that in place, you need to tell the DOM toolkit to build the DOM tree from HTML classes rather than the standard core DOM classes. You do this by passing an instance of the com.docuverse.dom.html.HTMLFactory class to the DOM.setFactory() method. Once all this setup is taken care of, you're ready to parse the document with the DOM.readDocument() method. The result is what we're after: an instance of the org.w3c.dom.html.HTMLDocument interface (whew ... that's a mouthful).
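The SAX driver matters because a plain XML parser rejects ordinary HTML. The following sketch, which assumes the JDK's JAXP parser rather than the Docuverse toolkit, parses a template from an InputStream much as init() does and reports whether a given piece of markup survives; TemplateParse and isWellFormed() are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class TemplateParse {

    // Parses a template with the JDK's plain XML parser; unlike an
    // HTML-aware SAX driver, it accepts only well-formed markup.
    public static Document parse(InputStream input) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler ignores warnings/errors but rethrows fatal errors,
        // which also stops the parser from printing them to stderr
        builder.setErrorHandler(new DefaultHandler());
        return builder.parse(input);
    }

    // True if the markup is well-formed enough for an XML parser
    public static boolean isWellFormed(String markup) {
        try {
            parse(new ByteArrayInputStream(
                markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this parser, markup such as an unclosed p or a bare br fails, which is exactly the kind of input an HTML-aware driver is there to tolerate.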

Actually, all of the DOM HTML interfaces live in the org.w3c.dom.html package, and they build on the core DOM interfaces: HTMLDocument extends org.w3c.dom.Document, and HTMLElement (the parent of every element interface, such as HTMLTableElement) extends org.w3c.dom.Element, which in turn extends org.w3c.dom.Node. I'm not going to explain the individual interfaces because they are all very simple to use. Check out the Javadocs for them, and you'll quickly get a grasp of what's going on. In any case, once you have your HTMLDocument, you're ready to modify some content.


Manipulating HTML

Once you've loaded your HTML DOM structure into memory, actually working with it becomes a piece of cake. This isn't because the methods are simply named or because the documentation is wonderful, but because there isn't anything new to learn. Let me say that again: there isn't anything new to learn! Because the HTML DOM interfaces extend the core DOM interfaces you already know and love (org.w3c.dom.Node, org.w3c.dom.Element, and so on), you can use the methods you've grown accustomed to over the years to work with your HTML structure. For example, you could create a new DIV element, set its alignment, and then add it to the BODY element of your document. While creating the element as an HTML-specific object is the DOM HTML part (much like getting an HTMLDocument back from the parser, shown in the last section), adding the element to the tree is done through run-of-the-mill core DOM methods. Listing 3 applies the same mix within the doPost() method: a DOM HTML method sets the page title, and core DOM methods fill in the rest of the template.
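The DIV example from the previous paragraph might look like the sketch below. As an assumption, it sticks to the core DOM through the JDK's JAXP API, so the alignment is set with setAttribute("align", ...); with a DOM HTML implementation you would typically cast the new element to org.w3c.dom.html.HTMLDivElement and call its setAlign() method instead. DivDemo and addDiv() are hypothetical names.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DivDemo {

    // Appends <div align="..."> to the first <body> (assumed to exist)
    // using only core DOM calls, then serializes the whole document.
    public static String addDiv(Document doc, String align, String text)
            throws Exception {
        Element body = (Element) doc.getElementsByTagName("body").item(0);
        Element div = doc.createElement("div");
        div.setAttribute("align", align);
        div.appendChild(doc.createTextNode(text));
        body.appendChild(div);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "xml");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        html.appendChild(doc.createElement("body"));
        doc.appendChild(html);
        System.out.println(addDiv(doc, "center", "Hello"));
    }
}
```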


Listing 3. Filling in the data


public void doPost(HttpServletRequest req, HttpServletResponse res)
    throws IOException, ServletException {

    // Get dynamic data (getParameter() returns null if a field is missing)
    String htmlTitle = req.getParameter("htmlTitle");
    String pageTitle = req.getParameter("pageTitle");
    String pageSubtitle = req.getParameter("pageSubtitle");
    String pageContent = req.getParameter("pageContent");

    res.setContentType("text/html");

    // doc is a single field shared by every request, so edit and
    // write it while holding a lock
    synchronized (doc) {
        // Navigate and set data
        doc.setTitle(htmlTitle);
        NodeList tableCells = doc.getElementsByTagName("td");

        // 2nd TD is the title
        Node td = tableCells.item(1);
        Node fontFace = td.getFirstChild();
        fontFace.getFirstChild().setNodeValue(pageTitle);

        // 5th TD is subtitle
        td = tableCells.item(4);
        Node center = td.getFirstChild();
        Node b = center.getFirstChild();
        b.getFirstChild().setNodeValue(pageSubtitle);

        // Text is the 6th TD
        td = tableCells.item(5);
        Node p = td.getFirstChild();
        p.getFirstChild().setNodeValue(pageContent);

        // Write out HTML
        dom.writeDocument(doc, res.getWriter());
    }
}

For the sake of the example, I made pretty short work of getting the data from the user (in a real application, this would probably be more specific to your business needs). Then I navigated through the HTML document using standard DOM methods, with a few DOM HTML specifics thrown in (like the setTitle() method). Take careful note that the code I wrote depends heavily on the specifics of my template file. That's OK, because I control the template file. In other words, it's a great solution for a specific template, and it turns out to be a slick way of handling dynamic data. It's not as generic as code written purely against the core DOM, but it solves this particular problem a lot better.
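To try the same positional navigation outside a servlet, here's a sketch that mirrors doPost() against a hypothetical template (an assumption, not Listing 1) using the JDK's JAXP core DOM, with setTextContent() on the title element standing in for HTMLDocument.setTitle(). It also shows a side benefit of building output as a tree: user input placed in a text node is escaped when the document is serialized, so markup typed into a form field comes out as text rather than live HTML. FillDemo and fill() are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class FillDemo {

    // Hypothetical template laid out the way Listing 3 expects
    static final String TEMPLATE =
        "<html><head><title>HTML TITLE</title></head><body><table>"
      + "<tr><td>logo</td><td><font face=\"Arial\">PAGE TITLE</font></td>"
      + "<td>nav</td></tr>"
      + "<tr><td>spacer</td><td><center><b>SUBTITLE</b></center></td>"
      + "<td><p>CONTENT</p></td></tr>"
      + "</table></body></html>";

    public static String fill(String htmlTitle, String pageTitle,
                              String pageSubtitle, String pageContent)
            throws Exception {
        // Parse a fresh copy per call, so no tree is shared between callers
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(
                TEMPLATE.getBytes(StandardCharsets.UTF_8)));

        // Core DOM stand-in for HTMLDocument.setTitle()
        doc.getElementsByTagName("title").item(0).setTextContent(htmlTitle);

        NodeList cells = doc.getElementsByTagName("td");
        // 2nd TD: <font> -> text node
        cells.item(1).getFirstChild().getFirstChild().setNodeValue(pageTitle);
        // 5th TD: <center> -> <b> -> text node
        cells.item(4).getFirstChild().getFirstChild().getFirstChild()
            .setNodeValue(pageSubtitle);
        // 6th TD: <p> -> text node
        cells.item(5).getFirstChild().getFirstChild().setNodeValue(pageContent);

        // The serializer escapes markup characters found in text nodes
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "xml");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fill("Demo", "Welcome", "Subtitle",
            "<script>alert('hi')</script>"));
    }
}
```

Passing a script tag as the page content should produce escaped text (beginning with &lt;script) in the output, whereas an out.println() approach needs explicit escaping to get the same protection.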

You should also keep in mind that this solution works no matter how complex your HTML template is. You'll just need to spend a little time learning the HTML structure and writing DOM code to traverse it. Of course, there are some even better ways to handle this problem, and I'll mention them briefly in the article's conclusion. However, I wanted to make one final point before letting you go off and conquer the world.
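One detail to watch when writing that traversal code: if your template is indented, an XML parser keeps the line breaks and spaces as text nodes, so getFirstChild() can return whitespace instead of the element you expect. (Whether the Docuverse HTML driver keeps such nodes is something to check against your own toolkit.) The hypothetical helper below skips everything that isn't an element; it uses only core DOM methods, exercised here with the JDK's JAXP parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class DomNav {

    // Returns the first child that is an Element, skipping whitespace
    // text nodes and comments; returns null if there is none.
    public static Element firstElementChild(Node parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) n;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Indented markup: the td's first child is a newline-and-spaces text node
        String xhtml = "<td>\n  <p>Content</p>\n</td>";
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(
                xhtml.getBytes(StandardCharsets.UTF_8)));
        Node td = doc.getDocumentElement();
        System.out.println(td.getFirstChild().getNodeType() == Node.TEXT_NODE); // prints true
        System.out.println(firstElementChild(td).getTagName()); // prints p
    }
}
```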

Currently, as I mentioned above, the Docuverse DOM SDK supports only DOM Level 1. However, expect it to move to DOM Level 2 before much longer, which opens up more possibilities. Specifically, DOM Level 2 adds getElementById() to the core Document interface (in DOM Level 1 it appears only on HTMLDocument), so ID-based lookup becomes available to any document; you can also expect XPointer and XLink processors to be useful here. At that point, putting an id attribute on each piece of dynamic data makes finding that element a piece of cake. And that, of course, means simpler code, easier maintenance, and faster revisions ... all very good things. So keep an eye out as DOM and the Docuverse SDK mature, and as other processors add support for DOM HTML.
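Until ID lookup is available in your toolkit, you can get a similar effect with XPath. The sketch below is an alternative, not part of DOM HTML: it assumes the JDK's javax.xml.xpath API, and the id values (pageTitle and so on) are hypothetical attributes you would add to your own template. It uses XPath because the core Document.getElementById() recognizes only attributes declared as type ID, for example through a DTD; an XML parser won't infer that from an attribute merely named id.

```java
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class IdLookup {

    // Finds the element whose id attribute equals the given value, or null.
    // The id is concatenated into the expression, so it is assumed to be a
    // developer-controlled constant containing no quote characters.
    public static Element byId(Document doc, String id) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Element) xpath.evaluate(
            "//*[@id='" + id + "']", doc, XPathConstants.NODE);
    }

    // Fills a placeholder by id instead of by table-cell position
    public static void setText(Document doc, String id, String text)
            throws Exception {
        Element e = byId(doc, id);
        if (e == null) {
            throw new IllegalArgumentException("No element with id " + id);
        }
        e.setTextContent(text);
    }
}
```

Unlike the positional code in Listing 3, this keeps working if you add or reorder table cells, as long as the id attributes stay put.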


Conclusion

Well, there you have it. If you've been working through these DOM module articles one by one, you're starting to see your toolkit of XML tricks grow. With the DOM Traversal module, you can easily move throughout a DOM tree. Now you should be able to use DOM HTML in your servlets and other Web applications to output HTML easily, without all of those messy out.println() statements in your code.

In addition, you can begin to think about ways of using these modules with your other tools, creating even more interesting possibilities. For example, with DOM HTML you could create an HTML document and let the user interact with it. You can even take existing HTML forms and pages and parse through them, using DOM Traversal to isolate the parts of the pages you want to work with (much easier than staring at a text editor, right?). Once you've got that down, look at using XML transformations and TrAX (the Transformation API for XML; see Resources) along with DOM HTML and Traversal. In the next article on DOM modules, I'll add the DOM Range module. If you want to know how that helps out, tune in next time. Until then, enjoy, and I'll see you online.
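As a starting point for that experiment, here's a minimal TrAX sketch using the javax.xml.transform API that ships with the JDK: it parses an XML document into a DOM and runs an XSLT stylesheet over it to produce HTML markup. The stylesheet, the items/item element names, and the TraxDemo class are hypothetical, chosen only for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class TraxDemo {

    // Hypothetical stylesheet: <items><item>..</item></items> -> <ul><li>..</li></ul>
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/items'><ul><xsl:apply-templates/></ul></xsl:template>"
      + "<xsl:template match='item'><li><xsl:value-of select='.'/></li></xsl:template>"
      + "</xsl:stylesheet>";

    public static String transform(String xml) throws Exception {
        // Build a namespace-aware DOM, the safest input for an XSLT processor
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        // Compile the stylesheet through TrAX and apply it to the DOM
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<items><item>A</item><item>B</item></items>"));
    }
}
```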


Resources

  • Get grounded in DOM basics by taking the free developerWorks Understanding DOM tutorial.

  • If you need more background on developing XML, try the developerWorks XML programming in Java tutorial.

  • You can begin deep study at the W3C's DOM Activity Page.

  • Go directly to the current DOM specification, DOM Level 2.

  • Read up on the DOM HTML module specification.

  • Learn more about XML transformations and TrAX in the author's article on JAXP version 1.1.

  • Download the Jakarta Tomcat servlet engine for running the example code if you don't already have a servlet engine installed.

  • Get more on Java and XML than you can shake a stick at in Brett's new second edition of Java and XML.

  • IBM's WebSphere Application Server supports DOM processing through the built-in XML4J parser, which is based upon the Apache Xerces parser. Explore the nature of DOM processing support in WAS 3.0 Advanced Edition's detailed online documentation. Look for the DOM section of the What Is tab, under XML and Developing Applications.
