Working XML: Using XSLT for content management

Introducing XM, a poor man's content manager

Benoit Marchal (bmarchal@pineapplesoft.com), Consultant, Pineapplesoft
Benoît Marchal is a consultant and writer based in Namur, Belgium. He is the author of XML by Example , Applied XML Solutions and XML and the Enterprise. He is a columnist for Gamelan. Details on his latest projects are at www.marchal.com. He can be reached at bmarchal@pineapplesoft.com.

Summary:  This is the first installment of Working XML, a column with companion project code that demonstrates the evolution of full-fledged XML applications. In this column, author and software consultant Benoît Marchal introduces XM (XSLT Make), a simple and affordable Web publishing content-management solution that takes advantage of XML and XSLT. Code samples show the development of a wrapper for the XSLT to make it easy for a nonprogrammer to use. XM project code is available by link.

Date:  01 Jul 2001
Level:  Introductory

Welcome to the first installment of Working XML, a new column on developerWorks. The premise behind this column is that developers learn best by studying code, so along with the column I'll be developing a series of XML projects, which I'll discuss over several columns. Thanks to this format, I can tackle larger, more realistic projects than what is typically possible in a scenario dreamed up just for an article. Note that you'll be able to find the demonstration projects themselves as open-source projects on the companion site to the column (see Resources). I expect that the projects will evolve as you and I use them, and I'll report back on the changes here.

Another interesting feature of this setup is that you can follow the projects from their infancy to maturity. I have found that one learns as much, if not more, from errors as from working solutions. Because Working XML follows the development of actual projects, it will give me more opportunities to warn you against dead ends than I'd have in a one-shot article with only a hypothetical situation. I hope this combination of ongoing open-source projects and a column will make us better developers.

XM: A poor man's content manager

The first project I'll work through is an affordable Web publishing solution. The inspiration for this project comes from my struggle to manage a Web site with more than 200 pages. I have found that there are excellent tools to manage small or large sites, but I could not find an appropriate solution in between.

If yours is a 10- to 20-page site, HTML editors like HoTMetaL PRO, Dreamweaver, or FrontPage (see Resources) are perfect. However, those tools prove less useful as the site grows and the work shifts from creation to the comparatively more expensive maintenance. Many HTML editors are suboptimal for maintaining a growing site. For example, they may force you to edit every page by hand just to redesign the navigation -- a mere annoyance if you manage fewer than two dozen pages, but too much work for more extensive sites.

At the other end of the spectrum are high-end publishing solutions, such as OpenMarket, Vignette or eContent (see Resources). They excel at managing large sites with a huge content base, but (there is always a but) the entry ticket is high, so high that most overloaded webmasters will probably only dream of having one of these systems.

So what's in the middle? Homegrown solutions. Many webmasters have turned to a combination of scripts (JSP, ASP, or PHP) and a database to help them cope with an ever-growing site. This approach works, but it's not without faults. For one thing, it takes a toll on the server, so the pages may load more slowly. Also, script-based Web sites are more prone to bugs or even crashes (of course, I speak for myself; bugs do not afflict your code). Finally, search engines are less likely to index dynamically generated sites. Overall I have found that, while scripts and databases may make life easier for the webmaster, they are far from optimal for the visitor.

As befits a column on XML, I propose an alternative built on XML and XSLT. Indeed it's easy to prepare documents in DocBook or another XML vocabulary and convert them automatically to HTML. Automatically is the operative word here. The goal is to cut down on manual processing and automate as much of the site maintenance as possible. I like to think of it as moving from small-scale to industrial-scale webmastering.

That's the theory, at least. In practice, to try this at home you'd better be a programmer. XSLT processors, such as Xalan, are not exactly friendly to use, yet. XML publishing frameworks exist, such as Cocoon (see Resources) but, again, they are geared towards developers. My goal with XM (XSLT Make) is to wrap a friendly face around an XSLT processor. Ultimately, I hope that XM can be accessible to nondevelopers who manage midsized Web sites.

I selected this project as the first project for Working XML because, judging from readers' mail on a recent developerWorks article, Managing e-zines with JavaMail and XSLT, there is no shortage of interest in friendly applications of XSLT for publishing.


Road map

Figure 1 summarizes how XM will work. There are three stages in preparing and publishing a Web site:

  1. Authoring: writing the content with an XML editor or acquiring it from a non-XML source, such as a word processor
  2. Publishing: converting the content into HTML
  3. Enjoying: viewing the content in a browser

Figure 1: Three-step Web publishing with XM
Authoring, publishing and enjoying

I draw your attention to the fact that XM generates mostly static HTML pages. The goal is to help the webmaster better manage the Web site -- without sacrificing raw performance. To enhance performance, most dynamically generated Web sites use some sort of caching. I believe that static HTML pages are, by far, the best caching strategy.

As you can see, this project has quite a charter, and I don't plan to achieve all of it in one column (or even two). Projects for Working XML will mature over several months, like any other serious development. Still, to conclude this section, I want to point out a few challenges and milestones I have already identified. Obviously, I expect to add more as development proceeds:

  • Devise a simple-to-use wrapping for an XSLT processor (covered in this column).
  • Automatically upload files to the Web server via FTP.
  • Manage files and links between those files (one solution might be to let a style sheet read a directory).
  • Support multiple publishing formats such as PDF (Portable Document Format supported by Adobe Acrobat), SVG (Scalable Vector Graphics for images), RSS (Rich Site Summary for Web portals), and more.
  • Host XM on the Web server itself to enable collaborative editing.

Content management by example

One of the main challenges in the first versions of XM is to make it approachable by nonprogrammers. I do not merely mean a nice user interface with friendly buttons. I believe that a lot of nice buttons can't make an unfriendly solution appear friendly; it remains an unfriendly application. Rather, I have tried to devise an operating mode that is easy to understand.

My first idea was to build upon the configuration file in Managing e-zines with JavaMail and XSLT. If you missed that article, the configuration file controls which style sheets are applied and what the result should be. See Listing 1 for an example. I tested many variations, yet no matter what I tried, I could not come up with something friendly enough.


Listing 1. rules.xml: one of many attempts at creating a friendly configuration file
<?xml version="1.0"?>
<rules version="1.0" xmlns="http://www.ananas.org/2001/xm/rules">
   <!-- apply the style sheet on XML files -->
   <rule extension="xml">
      <apply-stylesheet file="default.xsl"/>
      <copy from="$xslt" to="$target"/>
   </rule>
   <!-- copy GIF files -->
   <rule extension="gif">
      <copy from="$source" to="$target"/>
   </rule>
   <!-- create an XML file with the content of
        freebies/download, style it -->
   <rule directory="freebies/download">
      <ls dir="$current"/>
      <apply-stylesheet file="download.xsl"/>
      <copy from="$xslt" to="$target"/>
   </rule>
</rules>

And then it hit me: such a configuration file, with its rules and variables (such as $source), is inherently hard to use. It is similar to a scripting language, and we all know scripting languages are not for novices. I hit upon a more accessible solution: Let the user create a directory that mimics the final Web site, and let XM automatically produce the output from it. I call this content management by example because the user really creates an example of how the Web site will be organized, and XM needs no more configuration to turn it into the real Web site.
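To make the idea concrete, here is a minimal sketch of how a file in the example directory could map to a file in the published site. It is an illustration only, not code from XM: the class name ExampleMapping, the method targetFor(), and the directory names src-site and www are all assumptions of mine.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of XM: maps a file in the example
// directory to the file it would produce in the published site.
class ExampleMapping
{
   static Path targetFor(Path sourceRoot, Path targetRoot,
                         Path sourceFile, String outputExtension)
   {
      // Position of the file relative to the example directory
      Path relative = sourceRoot.relativize(sourceFile);
      String name = relative.getFileName().toString();
      // Only XML documents are renamed; everything else keeps its name
      if(name.endsWith(".xml"))
         name = name.substring(0, name.length() - 4) + outputExtension;
      // resolveSibling keeps the subdirectory part and swaps the file name
      return targetRoot.resolve(relative.resolveSibling(name));
   }

   public static void main(String[] args)
   {
      Path source = Paths.get("src-site");   // assumed directory names
      Path target = Paths.get("www");
      System.out.println(targetFor(source, target,
         Paths.get("src-site", "freebies", "index.xml"), ".html"));
      System.out.println(targetFor(source, target,
         Paths.get("src-site", "images", "logo.gif"), ".html"));
   }
}
```

The point of the sketch is that the mapping needs no rules file: the position of a document in the example directory already says where its output belongs.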


Skeleton classes

If you plan to follow along, go to ananas.org and download the project code; you'll need it before you read much further. The code I cover this month is no more than a skeleton for XM. It does only the most basic thing: It recursively walks through a source directory, applying a style sheet to every XML file along the way to produce the Web site in HTML.

The classes so far are as follows:

  • NotImplementedException is the only class I'm constantly missing from the Java standard library.
    It's not uncommon during development to attempt to call code that has not been implemented yet or to hit unexpected (and therefore not implemented) situations. I have found that throwing a clear indication that I have not written the code yet consistently saves me hours of debugging efforts.
  • XMException is the exception XM throws when it encounters an error. This exception can embed other exceptions, such as TransformerException or SAXException.
  • Resources is just a convenience. It gives me fast access to resources in the default locale.
  • Messenger is an interface to abstract error and information messages. The current version of XM runs from the command line, but I expect to build a GUI or a Web-based interface. By funneling all messages through Messenger, it will be easier to support different interfaces.
  • DefaultMessenger is a default implementation of Messenger that prints to a Writer.
  • DirectoryWalker does the actual processing. It recursively walks through a set of directories, applying the style sheet as it goes along.
  • Console is the entry point for XM. As the name implies, it works from the command line.
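As an illustration of the first class in the list, here is a minimal sketch of what a "not implemented" exception can look like. The real NotImplementedException ships with the project code and may well differ; the names NotYetImplemented and CopyStub are hypothetical. I assume an unchecked exception because Listing 3 throws NotImplementedException from a method that does not declare it.

```java
// Sketch only: the real NotImplementedException ships with the XM download
// and may differ. Both class names below are hypothetical.
class NotYetImplemented extends RuntimeException
{
   NotYetImplemented(String what)
   {
      super("Not implemented yet: " + what);
   }
}

class CopyStub
{
   // Stand-in for a feature that is planned but not written
   static void copyFile(String name)
   {
      throw new NotYetImplemented("copying " + name);
   }

   public static void main(String[] args)
   {
      try
      {
         copyFile("logo.gif");
      }
      catch(NotYetImplemented e)
      {
         // The message names the missing feature
         System.out.println(e.getMessage());
      }
   }
}
```

A stub that fails loudly like this is easier to track down than one that silently does nothing.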

Messenger

Exceptions aside, the two most useful classes in XM so far are Messenger and DirectoryWalker. Messenger, in Listing 2, defines the interface to communicate with the user. This interface is modelled after TrAX's ErrorListener or SAX's ErrorHandler:

  • error(), fatal(), and warning() are used to report errors.
  • progress() can be used to implement a progress bar or another mechanism to report progress to the user.
  • info() reports other information.

Since XM is mostly a noninteractive process, I have defined no methods to prompt the user or otherwise accept input.


Listing 2. Messenger.java: XM's interface to the user
package org.ananas.xm;
import java.io.File;
public interface Messenger
{
   public void error(XMException e)
      throws XMException;
   public void fatal(XMException e)
      throws XMException;
   public void warning(XMException e)
      throws XMException;
   public void progress(File sourceFile,File resultFile)
      throws XMException;
   public void info(String msg)
      throws XMException;
   public void info(String pattern,Object[] arguments)
      throws XMException;
}
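To show how an implementation might plug into such an interface, here is a minimal sketch of a messenger that prints to a Writer. It is not the DefaultMessenger from the XM download. MiniMessenger is a trimmed, hypothetical stand-in for the interface in Listing 2, and formatting the pattern with java.text.MessageFormat is my assumption about how info(String,Object[]) is meant to work.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;
import java.text.MessageFormat;

// Trimmed, hypothetical stand-in for the interface of Listing 2
interface MiniMessenger
{
   void warning(Exception e);
   void info(String msg);
   void info(String pattern, Object[] arguments);
}

// Sketch of a Writer-backed implementation; not the DefaultMessenger from XM
class WriterMessenger implements MiniMessenger
{
   private final PrintWriter out;

   WriterMessenger(Writer writer)
   {
      // autoflush so messages appear as soon as they are reported
      out = new PrintWriter(writer, true);
   }

   public void warning(Exception e)
   {
      out.println("warning: " + e.getMessage());
   }

   public void info(String msg)
   {
      out.println(msg);
   }

   public void info(String pattern, Object[] arguments)
   {
      // Assumption: patterns follow java.text.MessageFormat conventions
      out.println(MessageFormat.format(pattern, arguments));
   }

   // Small demo that captures the messages in a string
   static String demo()
   {
      StringWriter buffer = new StringWriter();
      MiniMessenger messenger = new WriterMessenger(buffer);
      messenger.info("Styled {0} into {1}",
                     new Object[] {"index.xml", "index.html"});
      messenger.warning(new Exception("style sheet is older than expected"));
      return buffer.toString();
   }

   public static void main(String[] args)
   {
      System.out.print(demo());
   }
}
```

Because the rest of the code only talks to the interface, swapping the Writer for a GUI log window or a Web page would not touch the processing classes.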


DirectoryWalker

DirectoryWalker in Listing 3 essentially copies XML files from the source directory (including files in its subdirectories) to the target directory. There's only one twist: It applies a style sheet to the XML files before copying them.


Listing 3. DirectoryWalker.java: where the fun is
package org.ananas.xm;
import java.io.*;
import java.util.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;
import org.ananas.util.NotImplementedException;
public class DirectoryWalker
{
   protected Templates templates;
   protected long lastModified;
   protected String extension;
   protected Messenger messager = null;
   public DirectoryWalker(File stylesheetFile,Messenger messager)
      throws TransformerConfigurationException
   {
      lastModified = stylesheetFile.lastModified();
      templates = TransformerFactory.newInstance().
                     newTemplates(new StreamSource(stylesheetFile));
      extension = '.' + templates.getOutputProperties().
                           getProperty(OutputKeys.METHOD);
      this.messager = messager;
   }
   public void walk(String source,String target)
      throws IOException, XMException
   {
      walk(new File(source),new File(target));
   }
   protected void walk(File source,File target)
      throws IOException, XMException
   {
      if(source.isDirectory())
      {
         File[] files = source.listFiles(),
                dirs = new File[files.length],
                docs = new File[files.length];
         int idirs = 0,
             idocs = 0;
         for(int i = 0;i < files.length;i++)
         {
            if(files[i].isDirectory())
               dirs[idirs++] = files[i];
            else if(files[i].isFile())
               docs[idocs++] = files[i];
            else
               throw new NotImplementedException("Expecting file or directory");
         }
         if(!(target.exists() && target.isDirectory()))
            if(!target.mkdirs())
               messager.fatal(new XMException(
                  Resources.getString("cannotcreatedirectory"),
                  new Object[] {target.getAbsolutePath()}));
         for(int i = 0;i < idocs;i++)
         {
            String fname = docs[i].getName();
            int pos = fname.lastIndexOf('.');
            // named sourceExt so it does not shadow the extension field
            String sourceExt = pos != -1 ? fname.substring(pos + 1) : "";
            if(sourceExt.equals("xml"))
            {
               File result = style(docs[i],target);
               messager.progress(docs[i],result);
            }
            // else copy file
         }
         for(int i = 0;i < idirs;i++)
            walk(dirs[i],new File(target,dirs[i].getName()));
      }
      else
         messager.fatal(new XMException(
            Resources.getString("notdirectory"),
            new Object[] {source.getAbsolutePath()}));
   }
   protected File style(File sourceFile,File targetDir)
      throws XMException
   {
      try
      {
         String resultName = sourceFile.getName();
         int pos = resultName.lastIndexOf('.');
         if(pos != -1)
            resultName = resultName.substring(0,pos);
         resultName += extension;
         File resultFile = new File(targetDir,resultName);
         if(resultFile.exists())
         {
            if(resultFile.lastModified() >= sourceFile.lastModified() &&
               resultFile.lastModified() >= lastModified)
               return null;
         }
         Transformer transformer = templates.newTransformer();
         transformer.transform(new StreamSource(sourceFile),
                               new StreamResult(resultFile));
         return resultFile;
      }
      catch(TransformerException e)
      {
         throw new XMException(e);
      }
   }
}

walk() walks through a directory, applying the style sheet. It starts by listing the directory's content and separating it into files and subdirectories. Next it styles every XML file and recursively calls itself for each subdirectory. Other files are skipped for now; copying them is not implemented yet.
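The traversal itself can be sketched in isolation. The following is a simplified, hypothetical walker (WalkSketch is not part of XM): instead of styling documents, it only records the XML files it meets, which makes the order of operations easy to see. The demo builds a throwaway directory under the system's temporary folder and does not clean it up.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified walker: it only records which XML files it meets
class WalkSketch
{
   static void walk(File dir, String prefix, List<String> found)
   {
      File[] entries = dir.listFiles();
      if(entries == null)   // not a directory, or an I/O error
         return;
      List<File> subdirs = new ArrayList<File>();
      for(File entry : entries)
      {
         if(entry.isDirectory())
            subdirs.add(entry);
         else if(entry.getName().endsWith(".xml"))
            found.add(prefix + entry.getName());   // XM would style it here
      }
      // Files first, then recurse into the subdirectories
      for(File subdir : subdirs)
         walk(subdir, prefix + subdir.getName() + "/", found);
   }

   // Builds a throwaway example site and walks it
   static List<String> demo()
   {
      try
      {
         File root = Files.createTempDirectory("xm-sketch").toFile();
         File sub = new File(root, "freebies");
         if(!sub.mkdir())
            throw new IOException("cannot create " + sub);
         new File(root, "index.xml").createNewFile();
         new File(root, "logo.gif").createNewFile();
         new File(sub, "download.xml").createNewFile();
         List<String> found = new ArrayList<String>();
         walk(root, "", found);
         Collections.sort(found);   // listFiles() guarantees no order
         return found;
      }
      catch(IOException e)
      {
         throw new UncheckedIOException(e);
      }
   }

   public static void main(String[] args)
   {
      System.out.println(demo());
   }
}
```

Note the null check on listFiles(): the method returns null rather than an empty array when the directory cannot be read.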

Applying the style sheet is the responsibility of the style() method. For portability, DirectoryWalker uses TrAX (Transformation API for XML) to interface with the XSLT processor. So far, I have tested the code with Xalan (see Resources).
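For readers new to TrAX, here is a minimal, self-contained sketch of the calls involved: compiling a style sheet into a Templates object, reading its output method, and running a transformation. The style sheet and the page/title vocabulary are made up for the illustration, and the sketch works on in-memory strings rather than files. It uses whichever XSLT processor the Java runtime provides.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TraxSketch
{
   // Made-up style sheet and vocabulary, for illustration only
   static final String STYLESHEET =
      "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/page'>"
      + "<html><body><h1><xsl:value-of select='title'/></h1></body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

   // Compile once; a Templates object can produce many Transformers
   static Templates compile()
   {
      try
      {
         return TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(STYLESHEET)));
      }
      catch(TransformerException e)
      {
         throw new IllegalStateException(e);
      }
   }

   // Same idea as the DirectoryWalker constructor: derive the file
   // extension from the output method declared by xsl:output
   static String extensionOf(Templates templates)
   {
      return "." + templates.getOutputProperties()
                            .getProperty(OutputKeys.METHOD);
   }

   static String transform(Templates templates, String xml)
   {
      try
      {
         StringWriter result = new StringWriter();
         templates.newTransformer().transform(
            new StreamSource(new StringReader(xml)),
            new StreamResult(result));
         return result.toString();
      }
      catch(TransformerException e)
      {
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      Templates templates = compile();
      System.out.println(extensionOf(templates));
      System.out.println(
         transform(templates, "<page><title>Hello XM</title></page>"));
   }
}
```

Compiling the style sheet once and creating a fresh Transformer per document matches what Listing 3 does, and it avoids parsing the style sheet again for every file.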


Till next time ...

You can download and exercise the XM code already. Your entry point is the org.ananas.xm.Console class. It takes only two parameters: the source directory, where the XML files reside, and the target directory, which XM will create. Make sure your style sheet is called rules.xsl and that it resides in the current directory.

I grant you the current version does not do much and you can use it only to publish the simplest Web sites. The next column will make it more interesting since I plan to add a mechanism to give style sheets access to the file system.


Resources

  • You can download all the code for this project from ananas.org and subscribe to a mailing list for news and announcements on this project.

  • XM uses Xerces-J and Xalan respectively as XML parser and XSLT processor.

  • Webmasters with small sites are well served by HTML editors such as Corel XMetaL (formerly SoftQuad HoTMetaL PRO), Macromedia Dreamweaver, or Microsoft FrontPage.

  • Large sites should turn to high-end content management solutions such as OpenMarket (now part of FatWire), Vignette or eContent. Note that OpenMarket is an IBM partner and offers its content management solution on WebSphere.

  • Managing e-zines with JavaMail and XSLT (parts 1 and 2) is another developerWorks article on using XML and XSLT for publishing.

  • Cocoon is an open-source publishing framework; it relies extensively on dynamic code generation.

  • Find more XML resources on the developerWorks XML zone.
