Like the DOM Traversal module, which I introduced recently (see Resources), the DOM HTML module adds functionality to the basic DOM. The name of the module is fairly self-explanatory, so I'm sure you're not scratching your head over what this one does, but I'll start with an overview anyway.
The DOM HTML module is geared exclusively toward working with HTML. What's
cool and noteworthy is that the module allows you to work with HTML
through the DOM interfaces and methods already familiar to you. While initially this might not seem like a big deal, it turns out to be
pretty slick. If you're like me, you probably spend too much time working
through out.println statements in your servlet
code, trying to find a misplaced form tag, or a
missing closing table row tag (tr). This gets
pretty annoying, pretty quickly. Using DOM HTML, though, that becomes a
thing of the past. The DOM HTML module provides an object model, built on the
core DOM module, that adds support for all of the various HTML tags like
table, p, and
body. And, it supports the HTML tags in a way that makes it
impossible to leave a tag out of place or forget to close something
out. I'll show you how all of this works in this article, and you'll be coding in the HTML DOM module in no time.
I recommend downloading the Docuverse DOM SDK for working with the DOM HTML module. It makes things simple, and its support for DOM HTML is somewhat more complete than what Apache Xerces currently offers, which makes it an ideal environment for this article's examples. Once you have that toolkit, you can parse HTML and work with it using the DOM HTML module. (Today's version of Docuverse doesn't include DOM Level 2 support, but it's planned for a future version.)
You also need a working Java servlet engine. For this article I use the Apache project's Jakarta Tomcat (see Resources), which is both open source and the Java reference implementation for servlets and JSP. For the examples in this article, I've used Tomcat version 3.2.3.
Note: For those of you following along with Tomcat, I dropped the Docuverse domsdk.jar and w3cdom1.jar jars into the $TOMCAT_HOME/lib/ directory. I've dropped my servlet classes into the ROOT context, at $TOMCAT_HOME/webapps/ROOT/WEB-INF/classes. If you take those steps, you'll have a setup much like mine.
This article covers using DOM HTML from a servlet, so having a servlet engine installed and working is a must if you plan to follow along with the examples. Once you have your tools in place, you're ready to get on to the code.
This article primarily addresses those of you who are writing Web-based applications; you are the folks who spend your time creating, printing, and parsing HTML. If you're like me, you've spent hundreds, if not thousands, of hours wading through servlet code that does nothing more than print out an HTML document. Pretty tedious, isn't it? I expect that the DOM HTML module can help you avoid some of that tedium, particularly the annoying little mistakes that we all make when we use that conventional approach (like closing out a table cell in a nested table ... ever had that happen? Uggh!).
Now, to all of you JSP adherents out there, I realize that you may be losing interest in the rest of the article. However, I urge you to stay tuned: I'd bet that any developers who tell me that they won't ever need to write another servlet will have to eat their words. In other words, even if JSP (or XMLC, or Cocoon, or XSLT) is your weapon of choice, you'll want to be able to wield another couple of weapons as well: DOM HTML could be one of them.
To get started, consider the common use case that makes DOM HTML a magic bullet. Take a typical dynamic Web page: you get some information from the user, you write out a fancy, formatted HTML page, and in just two or three places, you insert the user-supplied data. The end result is hundreds of lines of servlet code to make just a few simple changes to an otherwise static HTML document. With the DOM HTML module, though, this becomes a trivial task. To show how it works, let's walk through an example of this scenario.
First, you need to define an HTML template. This is the basic formatted page that all the dynamic content fits within. Listing 1 shows this template, which has several placeholders for data: the HTML page title, the heading of the page, a subtitle, and the actual page content. At runtime, you want the servlet to read in this template, replace the placeholders with actual live data, and then output the HTML. However, you don't want to be forced to use SAX or DOM when the page is obviously HTML; the DOM HTML module can help. So take a look at Listing 1, which is the basis for the dynamic content.
Be sure to note the various places where live data will fit into the template.
Now, you need to create a servlet that will read in the template from Listing 1 and populate it with dynamic data.
Note: For the sake of brevity, I'm putting all the example servlet code in this article into a single servlet. In a production application, you might want to put common parsing code into a utility class. You could then have a servlet for each dynamic page in your application that uses this common parsing code and then inserts its own dynamic data.
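The utility class mentioned in this note might look something like the following sketch. It is not the Docuverse code used in this article: to stay runnable on a plain JDK, it uses JAXP's DocumentBuilder as a stand-in parser (which only accepts well-formed markup), and the TemplateSource class and its newCopy() method are hypothetical names. The idea is simply to parse the template once and hand each page its own deep copy to fill in.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper: parse a template once, then give each page its own copy.
final class TemplateSource {

    private final Document template;

    // Assumption: the template is well-formed (XHTML-style) markup, because
    // JAXP stands in here for an HTML-aware parser such as Docuverse's.
    TemplateSource(InputStream input) {
        try {
            template = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(input);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse template", e);
        }
    }

    // Deep-copy the parsed tree so a caller can modify it freely.
    synchronized Document newCopy() {
        return (Document) template.cloneNode(true);
    }

    public static void main(String[] args) {
        String html = "<html><head><title>TITLE</title></head>"
                + "<body><p>TEXT</p></body></html>";
        TemplateSource source = new TemplateSource(
                new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)));

        // Fill in one copy of the template.
        Document page = source.newCopy();
        page.getElementsByTagName("title").item(0)
                .getFirstChild().setNodeValue("Welcome");

        // Print the title of the modified copy, then of a fresh copy.
        System.out.println(
                page.getElementsByTagName("title").item(0).getTextContent());
        System.out.println(
                source.newCopy().getElementsByTagName("title").item(0).getTextContent());
    }
}
```

Each servlet could then keep one TemplateSource per template and call newCopy() at the start of every request, so data inserted for one user never shows up in another user's page.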
The first task for this servlet is to read in the HTML template so that it's available for insertion of dynamic data. Listing 2 shows the initial version of my servlet, complete with DOM HTML parsing code. Once you take a look at the listing, I'll explain how the magic happens.
Note: For those of you who don't have an existing application to fit this into, I've also included a doGet() method that lets you enter some "false" data to test things out. That method simply submits the data to the doPost() method, which I'll code up in the next section to insert data in the template. By the way, it also shows what a pain outputting HTML without the DOM HTML module can be!
Listing 2. Parsing an HTML document
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletConfig;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Docuverse imports
import com.docuverse.dom.DOM;
import com.docuverse.dom.html.HTMLFactory;

// DOM imports
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.html.HTMLDocument;

public class ResponseServlet extends HttpServlet {

    // Template file location
    private static final String TEMPLATE_FILE =
        "template.html";

    // DOM to use for reading and writing
    private DOM dom;

    // HTML Document representing the template
    private HTMLDocument doc;

    public void init(ServletConfig config)
        throws ServletException {

        super.init(config);
        try {
            // Get access to the template
            File file = new File(TEMPLATE_FILE);
            InputStream input =
                new FileInputStream(file);

            // Parse the HTML template
            dom = new DOM();
            dom.setProperty("sax.driver",
                "com.docuverse.html.swing.SAXDriver");
            dom.setFactory(new HTMLFactory());
            doc = (HTMLDocument)dom.readDocument(input);

            // Done with the template file
            input.close();
        } catch (IOException e) {
            // Keep the original exception as the root cause
            throw new ServletException(e.getMessage(), e);
        }
    }

    public void doPost(HttpServletRequest req,
                       HttpServletResponse res)
        throws IOException, ServletException {

        // Get dynamic data and do something with it
    }

    public void doGet(HttpServletRequest req,
                      HttpServletResponse res)
        throws IOException, ServletException {

        // Tell the browser that HTML is coming
        res.setContentType("text/html");
        PrintWriter out = res.getWriter();
        out.println("<html>");
        out.println("<body>");
        out.println("<form method='POST' " +
            "action='/servlet/ResponseServlet'>");
        out.println("HTML Title: <input type='text' " +
            "name='htmlTitle' /><br>");
        out.println("Page Title: <input type='text' " +
            "name='pageTitle' /><br>");
        out.println("Page Subtitle: <input type='text' " +
            "name='pageSubtitle' /><br>");
        out.println("Page Content: <textarea name='pageContent'>" +
            "</textarea><br><br>");
        out.println("<input type='submit'/>");
        out.println("</form></body>");
        out.println("</html>");
        out.close();
    }
}
The important part of Listing 2 occurs in the init() method, which sets up the template for use in the other servlet methods. First, it creates a new com.docuverse.dom.DOM object, which is the basis for DOM operations with the Docuverse toolkit. The next thing to do is to set up a SAX driver for use in parsing. This driver takes a standard HTML document (which may not be well formed) and converts it to a well-formed, XML-compliant document that just happens to be in an HTML format. Once that's in place, you need to tell the DOM toolkit to use HTML classes for building the DOM tree rather than the standard core DOM classes. You do this by supplying an instance of the com.docuverse.dom.html.HTMLFactory class to the DOM.setFactory() method. Once all this setup is taken care of, you're ready to parse the document, which is done with the DOM.readDocument() method. The result is what we're trying to get at, an instance of the org.w3c.dom.html.HTMLDocument interface (whew ... that's a mouthful).
Actually, all of the DOM HTML interfaces are in the org.w3c.dom.html package, and they all extend core DOM interfaces like org.w3c.dom.Node, org.w3c.dom.Document, and org.w3c.dom.Element. I'm not going to explain the individual interfaces because they are all very simple to use. Check out the Javadocs for them, and you'll quickly get a grasp of what's going on. In any case, once you get your HTMLDocument, you're ready to modify some content.
Once you've gotten your HTML DOM structure loaded into memory, actually working with it becomes a piece of cake. This isn't because the methods are simply named or because the documentation is wonderful, but because there isn't anything new to learn. Let me say that again: there isn't anything new to learn! Because the HTML DOM interfaces extend the core DOM interfaces you already know and love (org.w3c.dom.Node, org.w3c.dom.Element, and so on), you can use the methods you've grown accustomed to over the years to work with your HTML structure. For example, you could create a new DIV element, set the alignment for it, and then add it to the BODY element of your document. While the element creation is specific to the DOM HTML module (much like loading the HTML document, shown in the last section), adding the element into the tree is done through run-of-the-mill core DOM methods. Listing 3 puts those same core DOM methods to work -- within the doPost() method -- to fill in the template.
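The DIV example just described isn't part of the servlet, so here it is as a small standalone sketch. To keep it runnable on a plain JDK, it assumes JAXP as a stand-in parser and sets the alignment through the core setAttribute() call; the AppendDivExample class and its method names are hypothetical. With a DOM HTML implementation, the element could be cast to org.w3c.dom.html.HTMLDivElement and aligned with setAlign() instead, while the appendChild() call would stay exactly the same.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example class; JAXP stands in for a DOM HTML implementation.
final class AppendDivExample {

    // Build a centered DIV holding the given text and append it to BODY.
    static Element appendCenteredDiv(Document doc, String text) {
        Element div = doc.createElement("div");
        // With DOM HTML, this step could instead be
        // ((HTMLDivElement) div).setAlign("center").
        div.setAttribute("align", "center");
        div.appendChild(doc.createTextNode(text));

        // Plain core DOM from here on: find BODY and add the new child.
        Node body = doc.getElementsByTagName("body").item(0);
        body.appendChild(div);
        return div;
    }

    // Parse well-formed markup from a string, wrapping checked exceptions.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><p>Existing content</p></body></html>");
        Element div = appendCenteredDiv(doc, "Added at runtime");

        // The new DIV is now the last child of BODY.
        Node body = doc.getElementsByTagName("body").item(0);
        System.out.println(body.getLastChild() == div);
        System.out.println(div.getAttribute("align"));
    }
}
```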
Listing 3. Filling in the data
public void doPost(HttpServletRequest req, HttpServletResponse res)
    throws IOException, ServletException {

    // Get dynamic data and do something with it
    String htmlTitle = req.getParameter("htmlTitle");
    String pageTitle = req.getParameter("pageTitle");
    String pageSubtitle = req.getParameter("pageSubtitle");
    String pageContent = req.getParameter("pageContent");

    // Navigate and set data
    doc.setTitle(htmlTitle);
    NodeList tableCells = doc.getElementsByTagName("td");

    // 2nd TD is the title
    Node td = tableCells.item(1);
    Node fontFace = td.getFirstChild();
    fontFace.getFirstChild().setNodeValue(pageTitle);

    // 5th TD is subtitle
    td = tableCells.item(4);
    Node center = td.getFirstChild();
    Node b = center.getFirstChild();
    b.getFirstChild().setNodeValue(pageSubtitle);

    // Text is the 6th TD
    td = tableCells.item(5);
    Node p = td.getFirstChild();
    p.getFirstChild().setNodeValue(pageContent);

    // Write out HTML
    res.setContentType("text/html");
    dom.writeDocument(doc, res.getWriter());
}
For the sake of the example, I made pretty short work of getting the data from the user (which would probably be more specific to your business needs in a real application). Then I navigated through the HTML document using standard DOM methods, with a few DOM HTML specifics thrown in (like the setTitle() method, for example). Take careful note that the code that I wrote is very dependent upon the specifics of my template file. This is OK, because I control the template file. In other words, it's a great solution for a specific template, and it turns out to be a slick way of handling dynamic data. It's not as generic as using the core DOM, but it also solves your problem a lot better than the core DOM would.
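Because this style of code leans so heavily on the template's exact shape, small template edits can break it. With a parser that preserves whitespace (as JAXP does by default), even a line break between a td tag and its first child changes what getFirstChild() returns. One way to soften that dependency, sketched here with JAXP as a stand-in parser and a hypothetical PlaceholderFiller class, is to search each cell for its first real text node rather than hard-coding every step of the path.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; the names here are illustrative only.
final class PlaceholderFiller {

    // Depth-first search for the first text node that is not pure whitespace.
    static Node firstText(Node start) {
        for (Node child = start.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            if (child.getNodeType() == Node.TEXT_NODE) {
                if (!child.getNodeValue().trim().isEmpty()) {
                    return child;
                }
            } else {
                Node found = firstText(child);
                if (found != null) {
                    return found;
                }
            }
        }
        return null;
    }

    // Replace the placeholder text inside the TD at the given position.
    static void fillCell(Document doc, int cellIndex, String value) {
        NodeList cells = doc.getElementsByTagName("td");
        Node cell = cells.item(cellIndex); // null when the index is out of range
        Node text = (cell == null) ? null : firstText(cell);
        if (text == null) {
            throw new IllegalStateException(
                    "No placeholder text in cell " + cellIndex);
        }
        text.setNodeValue(value);
    }

    // Parse well-formed markup from a string, wrapping checked exceptions.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        // Line breaks and indentation sit between the tags on purpose.
        Document doc = parse("<table><tr>\n  <td>\n    "
                + "<center><b>SUBTITLE</b></center>\n  </td>\n</tr></table>");
        fillCell(doc, 0, "Chapter One");
        System.out.println(
                doc.getElementsByTagName("b").item(0).getTextContent());
    }
}
```

The cell positions are still tied to the template, so this only removes one source of breakage; it trades a little extra traversal for code that survives reformatting of the template file.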
You should also keep in mind that this solution works no matter how complex your HTML template is. You'll just need to spend a little time learning the HTML structure and writing DOM code to traverse it. Of course, there are some even better ways to handle this problem, and I'll mention them briefly in the article's conclusion. However, I wanted to make one final point before letting you go off and conquer the world.
Currently, as I mentioned above, the Docuverse DOM SDK only supports DOM Level 1. However, expect it to move to DOM Level 2 before much longer; this opens up some more possibilities. Specifically, you'll start to see methods that allow you to look for elements based on their ID (you can also expect XPointer and XLink processors to be useful here). At that point, putting an ID attribute on each element that holds dynamic data can make finding that element a piece of cake. And that, of course, means simpler code, easier maintenance, and faster revisions ... all very good things. So keep an eye out as DOM and the Docuverse SDK mature -- and as other processors add support for DOM HTML.
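To give a feel for ID-based lookup, here is a rough sketch that uses the Document.getElementById() method from the DOM Level 2 core. It does not use the Docuverse SDK: JAXP stands in as the parser, and the FindById class and the pageTitle and pageContent IDs are hypothetical. One caveat is built into the example: JAXP only treats an attribute as an ID when a DTD declares it, so the sketch flags each id attribute by hand with the DOM Level 3 setIdAttribute() call. How a given DOM HTML implementation recognizes IDs may well differ.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; JAXP stands in for a DOM Level 2 HTML parser.
final class FindById {

    // Parse well-formed markup from a string, wrapping checked exceptions.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse markup", e);
        }
    }

    // Workaround for the stand-in parser: mark every "id" attribute as an ID
    // so that getElementById() can find the element.
    static void markIds(Document doc) {
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element element = (Element) all.item(i);
            if (element.hasAttribute("id")) {
                element.setIdAttribute("id", true);
            }
        }
    }

    // Replace the content of the element with the given ID.
    static void setText(Document doc, String id, String value) {
        Element target = doc.getElementById(id);
        if (target == null) {
            throw new IllegalArgumentException("No element with id " + id);
        }
        target.setTextContent(value);
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><table><tr>"
                + "<td id=\"pageTitle\">TITLE</td>"
                + "<td id=\"pageContent\">TEXT</td>"
                + "</tr></table></body></html>");
        markIds(doc);

        // No counting of table cells: each placeholder is found by name.
        setText(doc, "pageTitle", "Welcome");
        setText(doc, "pageContent", "Hello from the servlet");
        System.out.println(doc.getElementById("pageTitle").getTextContent());
    }
}
```

Compared with counting table cells, this style keeps working when cells are added to or removed from the template, as long as the IDs stay put.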
Well, there you have it. If you've been working through these
DOM module articles one by one, you are starting to see your toolkit of
XML tricks begin to grow. With the DOM Traversal module, you can easily
move throughout a DOM tree. Now you ought to be able to use DOM HTML
in your servlets and other Web applications to output HTML easily, without
all of those messy out.println() statements in
your code.
In addition, you can begin to think about ways of using these modules with your other tools, creating even more interesting possibilities. For example, with DOM HTML you could create an HTML document and let the user interact with it. You can even take existing HTML forms and pages and parse through them, using DOM Traversal to isolate parts of the pages to work with (much easier than staring at a text editor, right?). Once you've got that down, look at using XML transformations and TrAX (the Transformation API for XML; see Resources), along with DOM HTML and Traversal. In the next article on the DOM modules subject, I'll add the DOM Range module. If you want to know how that helps out, tune in next time. Until then, enjoy, and I'll see you online.
- Get grounded in DOM basics by taking the free developerWorks Understanding DOM tutorial.
- If you need more background on developing XML, try the developerWorks XML programming in Java tutorial.
- You can begin deep study at the W3C's DOM Activity Page.
- Go directly to the current DOM specification, DOM Level 2.
- Read up on the DOM HTML module specification.
- Learn more about XML transformations and TrAX in the author's article on JAXP version 1.1.
- Download the Jakarta Tomcat servlet engine for running the example code if you don't already have a servlet engine installed.
- Get more on Java and XML than you can shake a stick at in Brett's new second edition of Java and XML.
- IBM's WebSphere Application Server supports DOM processing through the built-in XML4J parser, which is based upon the Apache Xerces parser. Explore the nature of DOM processing support in WAS 3.0 Advanced Edition's detailed online documentation. Look for the DOM section of the What Is tab, under XML and Developing Applications.
Brett McLaughlin (brett@newInstance.com) works as Enhydra strategist at Lutris Technologies and specializes in distributed systems architecture. He is author of Java and XML (O'Reilly). He is involved in technologies such as Java servlets, Enterprise JavaBeans technology, XML, and business-to-business applications. Along with Jason Hunter, he founded the JDOM project, which provides a simple API for manipulating XML from Java applications. He is also an active developer on the Apache Cocoon project and the EJBoss EJB server as well as a co-founder of the Apache Turbine project.




