Through this column's discussion forum, various mailing lists, and my consulting activity, I have noticed a growing interest in managing and publishing Web sites with XML and XSL. Even though many developers are familiar with XML and XSL, building a coherent system is no small task. In this article, I walk you through a practical, step-by-step example of how to create Web sites in XML.
I will illustrate the technique with a tool developed through this column, the XM plug-in for Eclipse (see Resources). The information provided here is useful even if you use another publishing environment, such as Apache Cocoon, but I find XM to be more user-friendly.
First, I'll take a look at the benefits and costs of publishing with XML. You'll find more than one reason to turn to XML -- so many reasons, in fact, that I could not cover them all in this article. I only highlight the most frequently heard motivations:
- It's simpler. You might not think so when you get started, since you need to learn so many new tools, but once you have an XML solution in place your site management chores will be dramatically reduced.
- It's obsolescence-proof. XML separates the content (text and images) from the styling and the publishing, so you can change one independently from the other. For example, when you write new documents you concentrate on the writing and not on the colors, background, or navigation. Conversely, when you change the colors, background, or navigation, XML and XSL automatically update all your pages.
- It's an open standard. XML is supported by many commercial and open-source tools. Even if a vendor disappears, discontinues a product, or doesn't support the features you need, you can be confident that there's a replacement.
- It's easily adaptable. XML documents are like tiny databases, and the stylesheets are like scripts that query and manipulate the data from those document databases. These stylesheets are incredibly flexible -- from straightforward publishing to computing new content such as tables of contents, indices, and more.
But what about the cost? You have to balance the costs of writing the stylesheets against their benefits. It pays to automate repetitive tasks, but don't overdo it. If your site contains only a handful of pages, it is faster and cheaper to forgo XML. When the site reaches 10 to 20 pages, XML starts to pay for itself.
Personally, I like XML because it simplifies site management. Several years ago, I had to maintain a site that contained over 100 pages with a regular HTML editor. Believe me, that was no fun. Any change to the site, such as adding or removing sections, would take hours of copying and pasting links. Mistakes and broken links were frequent.
Not so with XML and XSL. Instead, stylesheets automate the boring, repetitive tasks, saving time and minimizing errors. Of course, XML is not the only solution. Some editors offer a template-based approach that is like a combination of XML and XSL. Still, I prefer XSL because it's a scripting language (limited only by my imagination) and it is not tied to proprietary solutions.
You can use stylesheets on the server, client, or webmaster desktop. The XM plug-in for Eclipse implements the webmaster desktop -- and when in batch mode, it also works on servers. The plug-in automatically creates a static Web site (that is, a set of HTML pages) that is ready to upload to any server. By using stylesheets on the webmaster desktop, you can further increase XML's flexibility because -- unlike the alternatives -- the resulting static site is compatible with every Web server and browser.
What about servlets, JSP, PHP, or ASP? In other words, what about dynamically generated Web sites? Many shops have turned to dynamic hosting to gain the same benefits and simplify site maintenance. The code in the servlet or JSP page takes care of the presentation. How does this compare to XML and XSL?
In a nutshell, XML is more efficient. Dynamic sites tend to be slower because the server computes the page for every request. Those sites are also more difficult to set up and maintain, which unfortunately often translates into less stable sites. I know there are ways around all these problems, but you will find that XML delivers better results at a fraction of the cost.
To get started, download Eclipse and the XM plug-in for Eclipse (see Resources for links). XM is a project of the Working XML column that enhances Eclipse to support Web publishing with XML and XSL. XM is also available as standalone software that is ideal for batch processing. To prepare this column, I used Eclipse 2.1 and XM 0.9. Follow the instructions on the Eclipse and XM Web sites to install the software.
Launch Eclipse, then click Project from the File > New menu. In the dialog box that opens (see Figure 1), select ananas.org and XM Project, then click Next. Enter a project name, such as mysite, then click Finish.
Figure 1. Creating a new project
The new project appears in the navigator. When you open the project, you see that it contains three directories: src, rules, and publish, as shown in Figure 2. If you don't see the navigator, click Navigator from the Window > Show View menu.
Figure 2. The new project in the navigator
The src (source) directory holds your XML documents as well as your images and other support files. The plug-in creates a sample file to get you started. You should edit it to insert your own content and add as many other XML files as needed. Every XML file in the src directory becomes an HTML page on the Web site.
The XML editors section introduces the tools to write XML documents. For the time being, just open the XML document in a text editor, such as Eclipse. The sample document uses a simplified version of DocBook with the following tags:
- article: The root of the document
- articleinfo: Contains bibliographical information
- sect1: A document section
- sect1info: Contains the section title
- title: May appear under sect1info as a title
- copyright: Holds the copyright information, including one or more year tags
- simpara: A paragraph
- ulink: A hyperlink
You can use other tags, but you need to edit the stylesheet accordingly.
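To make the tag list concrete, here is a minimal sketch of what such a document might look like. It is not the sample file that the plug-in generates. The namespace URI is a placeholder I made up for illustration, and the title under articleinfo, the holder element, and the url attribute on ulink are assumptions borrowed from standard DocBook, so check them against the sample in your own project.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical namespace URI: replace it with the one used by your sample file. -->
<article xmlns="urn:example:simplified-docbook">
  <articleinfo>
    <!-- Assumption: a document title here, as in standard DocBook. -->
    <title>About this site</title>
    <copyright>
      <year>2003</year>
      <!-- Assumption: holder is the standard DocBook companion to year. -->
      <holder>Example Author</holder>
    </copyright>
  </articleinfo>
  <sect1>
    <sect1info>
      <title>Welcome</title>
    </sect1info>
    <!-- Assumption: ulink carries its target in a url attribute, as in DocBook. -->
    <simpara>This page is written in XML. Visit the
      <ulink url="photos/index.xml">photo gallery</ulink> for more.</simpara>
  </sect1>
</article>
```

Note that the document says nothing about colors, layout, or navigation. All of that is left to the stylesheet.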
As I mentioned, the sample document is derived from DocBook. However, it uses a different namespace to indicate it's not the real thing. DocBook is a standard vocabulary for technical documentation. It was originally developed by O'Reilly and it is maintained by OASIS, an international association of XML users.
You might find DocBook is a good choice to get started because it's available, it works, it's a standard, and it's popular (mostly because it's popular). Hundreds of existing XML tools work with DocBook -- obviously, more tools on the market means less work for you.
Other popular XML vocabularies for Web sites include NewsML from the International Press and Telecommunication Council (IPTC), the Web page DTD from Norman Walsh (Norman Walsh also maintains the DocBook vocabulary), and the Apache Cocoon DTD.
The rules directory contains the stylesheets. Most Web sites need only one stylesheet. The XM plug-in for Eclipse applies the default.xsl stylesheet to every document, unless it is told otherwise. Consequently, if your site has only one stylesheet, save it as rules/default.xsl. If your site needs more stylesheets, save them under rules and add the following processing instruction to those documents that do not use the default:
<?xml-stylesheet href="listing.xsl" type="text/xsl"?>
Beware! The processing instruction needs both parameters: href points to the stylesheet (you can just enter the file name -- the XM plug-in automatically looks under the rules directory), and type must have the text/xsl value. Also remember that the processing instruction applies to documents (in the src directory), not to stylesheets (in the rules directory).
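As an illustration of where the processing instruction goes, here is a sketch of a document in the src directory that opts out of the default stylesheet. The namespace URI and the content are hypothetical. The one firm rule, which comes from the W3C recommendation for xml-stylesheet rather than from XM, is that the instruction must sit in the prolog, after the XML declaration and before the root element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Both href and type are present; href is a bare file name that XM looks up under rules. -->
<?xml-stylesheet href="listing.xsl" type="text/xsl"?>
<!-- Hypothetical namespace URI and content, for illustration only. -->
<article xmlns="urn:example:simplified-docbook">
  <sect1>
    <sect1info>
      <title>Listings</title>
    </sect1info>
    <simpara>This page is styled by listing.xsl instead of default.xsl.</simpara>
  </sect1>
</article>
```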
Last but not least is the publish directory, where the plug-in generates your Web site. Your next step is to upload the content of this directory to your Web server.
Another warning: You should never try to edit or modify the files in the publish directory. If you're not happy with a Web page, change the XML document (in the src directory) or the stylesheet (in the rules directory), but never try to edit anything in the publish directory. Your goal is to automate publishing chores -- editing the site directly defeats that goal. Furthermore, the plug-in may overwrite your changes the next time it regenerates the site.
As you have seen in the previous section, the project wizard creates a sample site. The next step is to populate the src directory and enhance the stylesheets. If you adopt a popular vocabulary, such as DocBook, you can find pre-existing stylesheets that should speed up the process.
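If you prefer to start from scratch, the following is a minimal sketch of what a rules/default.xsl could look like for the tags listed earlier. It is not the stylesheet that ships with the plug-in. The namespace URI bound to the db prefix is a placeholder, and the title under articleinfo and the url attribute on ulink are assumptions taken from standard DocBook. The sketch also computes a small table of contents from the section titles, which is the kind of repetitive work a stylesheet is good at.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:db="urn:example:simplified-docbook"
    exclude-result-prefixes="db">
  <!-- The db namespace URI is hypothetical: use the one from your documents. -->

  <xsl:output method="html" encoding="UTF-8" indent="yes"/>

  <!-- Page skeleton shared by every document that uses this stylesheet. -->
  <xsl:template match="/db:article">
    <html>
      <head>
        <title><xsl:value-of select="db:articleinfo/db:title"/></title>
      </head>
      <body>
        <!-- Computed content: a table of contents built from the section titles. -->
        <ul>
          <xsl:for-each select="db:sect1">
            <li>
              <a href="#{generate-id()}">
                <xsl:value-of select="db:sect1info/db:title"/>
              </a>
            </li>
          </xsl:for-each>
        </ul>
        <xsl:apply-templates select="db:sect1"/>
      </body>
    </html>
  </xsl:template>

  <!-- Each section becomes a heading, with the same id the table of contents links to. -->
  <xsl:template match="db:sect1">
    <h2 id="{generate-id()}"><xsl:value-of select="db:sect1info/db:title"/></h2>
    <xsl:apply-templates select="db:simpara"/>
  </xsl:template>

  <xsl:template match="db:simpara">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- The link target is copied as written in the document. -->
  <xsl:template match="db:ulink">
    <a href="{@url}"><xsl:apply-templates/></a>
  </xsl:template>

</xsl:stylesheet>
```

To change the look of the whole site, you would edit the first template once instead of touching every page. The sketch ignores the copyright information and does nothing special for hyperlinks; how the plug-in's own link management fits in is not shown here.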
Since you have to edit many XML documents, it pays to invest in a good XML editor. Your options are:
- A text editor, such as Eclipse. Text editors are appropriate for small corrections, but they are too cumbersome for serious editing.
- A pseudo-WYSIWYG XML editor such as XMetaL or XMLMind. These editors emulate a word processor and are ideal for serious editing.
- An RTF converter. These work with your word processor to generate XML and are perfect when you collect documents from many different authors who may not be familiar with XML.
Which option is best depends on the job at hand. I find it nearly impossible to write long documents with a text editor. Having to remember to balance open and close tags is a huge drag on my productivity. Most authors are uncomfortable with text editors for anything but the most basic corrections.
Pseudo-WYSIWYG editors offer the most comfortable environment and the author doesn't need to worry about the XML syntax (see Figure 3). They are called "pseudo-WYSIWYG" because they use color, boldness, and other typographic attributes to emulate a word processor with XML content. If you have never tried a pseudo-WYSIWYG editor, do yourself a favour and download an evaluation version right away. Be warned that the editors don't work right out of the box -- they must be customized for a given vocabulary. Fortunately, most editors ship with native support for DocBook -- another reason to adopt this popular vocabulary.
The last solution is to stick with your word processor and use an RTF converter to generate the XML document. In practice, you might find that the conversions are seldom trouble-free, but it's a good solution if you collect documents from authors who are not familiar with XML. At Pineapplesoft, we maintain community Web sites where many authors contribute to the sites, and we use converters extensively.
Figure 3. A pseudo-WYSIWYG editor
As a bonus, the XM plug-in for Eclipse manages hyperlinks to prevent broken links. The plug-in works with so-called relative URLs (URLs that give the path relative to the current file). Listing 1 shows a relative URL example.
Listing 1. Relative URL
about.xml photos/index.xml ../images/logo.gif
Absolute URLs, on the other hand, either include a host name or give the path from the root of the Web site. Listing 2 shows an absolute URL example.
Listing 2. Absolute URL
/photos/index.xml http://www.ananas.org/ http://www.ibm.com/developerWorks
You should use relative hyperlinks as much as possible because the XM plug-in:
- Updates the file extension if needed, since every XML document is published as an HTML page
- Tests the link and issues a warning if it's broken
The plug-in reports problems with your XML documents or your stylesheets in the XM console. If you don't see the console, select XM Console from the Window > Show View menu. Read the error message carefully because it includes a description of the problem. The plug-in also lists the file and line where the error occurred (though it might be off by a line or two, so make sure to review the lines before and after the problem as well).
If the plug-in generates blank Web pages:
- Read error messages in the XM console carefully
- Make sure that the corresponding XML document is not empty
- Check that your stylesheet is appropriate for the document vocabulary, paying special attention to namespace
When something looks really weird, double-check the namespaces and the element names. Namespace mismatches account for 25 percent of all my students' problems.
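Here is a sketch of the most common form of that mismatch, using a made-up namespace URI. In XSLT 1.0, an unprefixed name in a match pattern means "an element in no namespace," even if the document declares a default namespace, so the first template below never fires for a namespaced document. The built-in rules then take over, and the result is typically bare text rather than the page you expected.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:db="urn:example:simplified-docbook">
  <!-- The db namespace URI is hypothetical: it must equal the URI in your documents. -->

  <!-- Mismatch: this pattern only matches an article element in no namespace. -->
  <xsl:template match="article">
    <html><body><xsl:apply-templates/></body></html>
  </xsl:template>

  <!-- Fix: use a prefix bound to the same URI that the document declares. -->
  <xsl:template match="db:article">
    <html><body><xsl:apply-templates/></body></html>
  </xsl:template>

</xsl:stylesheet>
```

The prefix itself does not matter: the document can use a default namespace while the stylesheet uses db. What must match, character for character, is the namespace URI.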
I conclude with a few tips on the XM plug-in.
Select Preferences from the Window menu. Under the Workbench category, choose the File Associations entry and associate an editor with *.xsl files. You can associate one of the many Eclipse text editors or use an external editor (such as XMLMind). When you double-click the file, it automatically opens the editor.
To choose the editor when you open the file, right-click it in the navigator and choose the Open with menu.
Eclipse automatically generates the Web site when you save a document from within Eclipse. If you are using an external editor, such as XMLMind, select Rebuild Project from the Project menu. If the menu is grayed, click on the project in the navigator first.
The XM plug-in does not recognize changes to the stylesheets when it rebuilds the Web site. If it looks like the plug-in is ignoring your changes, follow these steps:
- Right-click on the project name in the navigator (mysite in the above example), then choose Properties.
- Select XM Properties and make sure that "Run XM" performs a build is checked (see Figure 4).
- Click OK.
- Right-click on the project again, then choose "Run XM".
Figure 4. Edit the properties
I hope this article has convinced you that publishing a Web site with XML and XSL is fun and offers many benefits. XSL is a powerful tool -- and the XM plug-in further extends this power, so I could only scratch the surface in this article. To learn more about all the features available, I suggest you read the earlier articles in the Working XML column on developerWorks.
When you download the plug-in, you will find a copy of the ananas.org project. That's the project I use to maintain the site and it demonstrates many advanced features. You might want to study this code as well. Finally, make sure you join the Working XML discussion forum.
- Participate in the discussion forum.
- Check out some of the other installments in the "Working XML" column. Upcoming articles will cover more advanced options on Web publishing with XML and XSL.
- Download Eclipse and the XM plug-in used in this article.
- Explore these popular XML vocabularies: DocBook, NewsML, Norman Walsh's Web DTD, and the Apache Cocoon DTD.
- If you edit XML documents regularly, invest in a pseudo-WYSIWYG editor. If you have always edited XML documents with a text editor, do yourself a favour and download an evaluation version right away. Some of the most popular editors include XMetaL from Corel, the XMLMind Editor (available on many platforms), and x4o from i4i (a hybrid product that works with Word).
- If you work with authors who are not familiar with XML, you will find an RTF-to-XML converter saves time. Two popular products include UpCast from Infinity Loop and the Logictran RTF Converter.
- For more on XSL stylesheets, try this developerWorks tutorial: "Introduction to XSLT: Transform XML data from one format to another with Extensible Stylesheet Language Transformations (XSLT)" (Nicholas Chase, January 2007).
- Find help to deploy XML and XSL on your Web sites with XML by Example (Que, September 2001) by Benoit Marchal and XSLT Quickly by Bob Ducharme (Manning Publications Company, May 2001).
- Find more XML resources on the developerWorks XML zone.
- IBM trial software for product evaluation: Build your next project with trial software available for download directly from developerWorks, including application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.
- Find out how you can become an IBM Certified Developer in XML and related technologies.