Working XML: Wrapping up XM version 1

Managing a list of links and the table of contents

Benoit Marchal (bmarchal@pineapplesoft.com), Consultant, Pineapple Software
Benoît Marchal is a consultant and writer based in Namur, Belgium. He has just released the second edition of XML by Example. He is also the author of Applied XML Solutions and XML and the Enterprise. Details on his latest projects are at marchal.com. You can contact Benoît at bmarchal@pineapplesoft.com.

Summary:  In this month's column, developer and author Benoît Marchal adds final features to the first release of XM, a low-cost open-source content management solution based on XSLT (in Java). New features in this revision manage download pages and tables of contents via a directory reader that makes use of SAX and XMLFilter.


Date:  01 Oct 2001
Level:  Introductory
Also available in:   Japanese


For the last three months XM has slowly taken shape. My goal with XM was to develop a low-cost solution for Web publishing using XML and XSLT. Many webmasters shy away from XML and XSLT because they look too complex. With XM I'm hoping to make these technologies more approachable.

The current version boasts a handy mechanism that applies style sheets only to those files that have changed, and link management that maintains and validates the site links. The latest XM also supports multiple style sheets, and it is easy to pass them parameters. Even with this small set of features, XM has proved effective in limited testing for publishing actual Web sites. I have been maintaining three different Web sites with the help of XM, and I have found that it performs adequately.

Although I can think of many more improvements for XM, this month I add only one significant new feature. Next month I'll move another project to the foreground in the Working XML column, and I'll focus on that for the next few months. No, I am not bored with this project, but I need more feedback on what works and what needs to be improved before I can refine it. I plan to field test XM further, and I hope to collect feedback from readers (that's you) as well. I won't stand idle though, and I will launch a new project -- a SAX handler compiler -- in the next column.

This month's improvement to XM is the ability to compile file lists automatically. This will be handy for maintaining download pages and tables of contents.

On prototyping

Why go to another project now? There are many reasons. Of course, there's the schedule I had worked out with the editor when I proposed a column that would document the process of creating a series of open-source XML applications. More seriously, though, I need to collect feedback on XM.

I have approached XM essentially as user interface development. Although it's not a GUI, my original goal was to take an XSLT processor, which is not exactly a user-friendly tool, and make it more approachable to webmasters by wrapping it with XM. In essence, XM is one possible user interface over TrAX.

User interface development involves a lot of prototyping and trial and error. Nothing beats user feedback. For example, I originally thought that XM would support only one style sheet. While theoretically sound (it is possible to process many XML vocabularies with a single style sheet), experience showed that this approach is not as convenient as I had expected. So I adopted another strategy, with which I'm very pleased. Some readers have expressed surprise that I did not spend more time designing XM in advance; I hope this explanation clarifies why I plunged into the prototyping with this first project.

So far I have tested XM with two Web sites, ananas.org and another site that is still under development. The experience is useful, but there's only so much I can learn from those sites. I need more feedback to decide on future development. I'll use XM with other Web sites that Pineapplesoft, my company, is involved with, but I also want to give you a chance to try the software and report on your findings. I know you cannot test a moving target, so it makes sense to break for a few months and listen.

So please try XM with a Web site of your own. If you do not currently have a Web site, you can sign up for free space with a service such as Geocities and start a personal page or build a site for a local association.


Using XM

This section summarizes how to install and test XM (for more details on a given feature turn to Working XML: Using XSLT for content management, the column where it was introduced). Remember that if you encounter problems, you should report them on the ananas-discussion mailing list (see Resources).

You should first download the latest version of XM. As usual, you can access the latest version through the CVS repository. However, you may find that the packaged version, which includes Windows batch files and compiled classes, is more convenient. Download and unzip the packaged version. See Resources for all the links.

java org.ananas.xm.Console src publish


Directory reading

Last month I hinted at the directory-reading feature that I detail in this column. This is helpful for index pages -- those pages that consist mostly of links to files, such as download sections -- or other pages such as a table of contents or a glossary. As any webmaster knows, maintaining long lists of links by hand is particularly tedious.

For a download section, the webmaster must not only select and prepare the files but must also create and maintain one or more pages with links to the actual files. It's easy to forget to add a link when a new file is made available or to remove one when an older file is deleted. Likewise, when a document is spread across several pages, the webmaster has to create a table of contents with links to all the pages in the document. Again, maintaining a table of contents page is error prone, and it's not uncommon that sections are reported missing. A glossary causes essentially the same problems.

Clearly, human editors and webmasters are not so good at maintaining long lists of data. That boring work is best left to computers and XM's directory reading. Directory reading causes XM to create an XML document with the list of files and then use an XSLT style sheet to turn it into the proper Web page. The XML document might look like Listing 1. The main benefit is that XM reads the directory from the file system each time it runs, thereby guaranteeing that the link page is always up to date.

Incidentally, I find this an interesting middle ground between dynamic Web sites (using JSP or similar technologies) and purely static sites where the webmaster has to manually update every document. XM builds static sites but uses software to automate the creation and maintenance of the most boring pages.

DirectoryReader

The directory reading itself is the responsibility of the DirectoryReader class which, along with all the other listings in this column, is available online (see Resources). For increased efficiency, DirectoryReader does not write an XML document to the disk. Instead, it fires SAX events that describe the document.

This is a small optimization that saves creating and parsing a temporary file, without much special effort on my part. The more common setup would have the DirectoryReader save the XML document in a temporary file. The document is then fed to the XSLT processor, through a parser. This is illustrated in Figure 1.


Figure 1. DirectoryReader produces an XML document

However, when coding DirectoryReader, I would probably have methods to write start tags, end tags, and attributes. So it is not much more effort to call directly into a ContentHandler's startElement(), endElement(), startPrefixMapping(), and endPrefixMapping(). You saw how they work in the description of writing the LinkFilter class in Working XML: Processing instructions and parameters.

With this solution, DirectoryReader simulates the SAX parser. The file is never written to disk. Instead, it is fed straight into the XSLT processor. Because everything takes place in memory, as illustrated in Figure 2, it is marginally more efficient.


Figure 2. DirectoryReader bypasses the parser

From the XSLT processor standpoint, there are no differences between DirectoryReader and a regular SAX parser. From my point of view, the benefits are twofold: It is slightly more efficient and, perhaps more importantly, it is easier to merge the directory document within a real XML document with an XML filter, as you will see in a moment.

As a side note, this illustrates that XML is not only a syntax but also a standard data model. The stream of SAX events fully describes the XML document. Although you never write the file to disk and never bother about the XML syntax, the document is without doubt an XML document because it conforms to the data model.

Listing 2 is DirectoryReader.read(), which creates the document. It uses a standard Java File object to read the directory, and it creates attributes for every property in the File object. ContentHandlerExtractor, introduced in a previous column, was helpful in writing this method.

DirectoryReader is designed to either create a complete XML document or to create a document that is merged into another one. The embedded property controls this behavior, the only difference being whether DirectoryReader issues startDocument() and endDocument() calls.

DirectoryReader adds one attribute to those defined by File: isMarked. isMarked serves to identify the current file. Style sheets may use the isMarked attribute to filter out links to the current file.
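
The following is a minimal sketch of the idea, not the DirectoryReader code from the XM distribution. It walks a directory with java.io.File and fires SAX events at a ContentHandler, skipping startDocument() and endDocument() when it runs in embedded mode. The class name, the namespace URI, the element names, and the choice of attributes are all assumptions made for illustration. Only the isMarked attribute and the embedded switch come from the description above. The toXml() helper feeds the events to an identity TransformerHandler so that you can inspect the result as text.

```java
import java.io.File;
import java.io.StringWriter;
import java.util.Arrays;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

class DirectoryReaderSketch {
    // Hypothetical namespace URI, chosen for this sketch only.
    static final String NS = "http://example.org/sketch/directory";

    private final boolean embedded;

    DirectoryReaderSketch(boolean embedded) {
        this.embedded = embedded;
    }

    /** Fires SAX events describing the content of dir; marked may be null. */
    void read(File dir, File marked, ContentHandler handler) throws SAXException {
        // In embedded mode, the enclosing document owns these two calls.
        if (!embedded) handler.startDocument();
        handler.startPrefixMapping("d", NS);
        AttributesImpl dirAtts = new AttributesImpl();
        dirAtts.addAttribute("", "name", "name", "CDATA", dir.getName());
        handler.startElement(NS, "Directory", "d:Directory", dirAtts);
        File[] files = dir.listFiles();   // null if dir is not a directory
        if (files != null) {
            Arrays.sort(files);           // stable order across runs
            for (File f : files) {
                AttributesImpl atts = new AttributesImpl();
                atts.addAttribute("", "name", "name", "CDATA", f.getName());
                atts.addAttribute("", "isDirectory", "isDirectory", "CDATA",
                                  String.valueOf(f.isDirectory()));
                atts.addAttribute("", "length", "length", "CDATA",
                                  String.valueOf(f.length()));
                atts.addAttribute("", "isMarked", "isMarked", "CDATA",
                                  String.valueOf(f.equals(marked)));
                handler.startElement(NS, "File", "d:File", atts);
                handler.endElement(NS, "File", "d:File");
            }
        }
        handler.endElement(NS, "Directory", "d:Directory");
        handler.endPrefixMapping("d");
        if (!embedded) handler.endDocument();
    }

    /** Serializes the events as text through an identity TransformerHandler. */
    static String toXml(File dir, File marked) throws Exception {
        SAXTransformerFactory factory =
            (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler serializer = factory.newTransformerHandler();
        serializer.getTransformer()
                  .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.setResult(new StreamResult(out));
        new DirectoryReaderSketch(false).read(dir, marked, serializer);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml(new File("."), null));
    }
}
```

Because the consumer is a plain ContentHandler, the same read() method can feed a serializer, as here, or an XSLT processor without any intermediate file.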

WalkFilter

To call on DirectoryReader, the webmaster must create an XML document and insert a special xm:Directory element where he or she needs the file list. As you can see in Listing 3, the home page for ananas.org, an xm:Directory element replaces the list of projects. The list itself is built by the style sheet from the list of subdirectories.


Listing 3. The home page for ananas.org
<?xml version="1.0"?>
<?xm-xsl-param name="sponsor" value="ananas"?>
<article xmlns="http://www.psol.com/2001/docbook" 
         xmlns:xm="http://www.ananas.org/2001/XM/Walk">
<articleinfo>
 <title>ananas.org</title>
 <subtitle>open-source software for the 'Working XML' column</subtitle>
 <author><firstname>Benoît</firstname><surname>Marchal</surname>
<affiliation><orgname>Pineapplesoft sprl</orgname></affiliation></author>
 <copyright>
    <year>2001</year>
    <holder><ulink href="http://www.marchal.com/">Benoît</ulink></holder>
 </copyright>
</articleinfo>
<simpara><citetitle pubwork="journal">ananas.org</citetitle> is the 
companion Web site for the <citetitle pubwork="series">Working XML
</citetitle> column by <author><firstname>Benoît</firstname>
<surname>Marchal</surname></author>. <citetitle pubwork="series">
Working XML</citetitle> is published on <citetitle pubwork="journal">
<ulink href="http://www.ibm.com/developerWorks">developerWorks</ulink>
</citetitle>.</simpara>
<simpara>The source for the <citetitle pubwork="series">Working XML
</citetitle> column is being distributed through this site. Currently 
the following projects are available:</simpara>
<xm:Directory dir="." markSource="true"/>
</article>

WalkFilter intercepts xm:Directory elements and replaces them with a call to DirectoryReader. Like LinkFilter, WalkFilter is implemented as an XMLFilter. However, LinkFilter is designed to post-process HTML documents, whereas WalkFilter preprocesses XML files.

WalkFilter is simple. As Listing 4 illustrates, it tests for xm:Directory in startElement(). There's additional logic in WalkFilter to discard the content of the xm:Directory element. For obvious reasons, I do not want that content to go through the filter.

Note that the namespaces for WalkFilter and DirectoryReader are different even though I map both of them to the xm prefix. WalkFilter uses a more generic namespace (http://www.ananas.org/2001/XM/Walk) because in the future I expect to add similar readers for databases, directory services, and e-mail. The two namespaces are handy to distinguish between the two elements in style sheets.

Since the path in xm:Directory is relative to the document's directory, WalkFilter needs the path to the document. SAX offers a standard property mechanism that lets the caller set the directory. WalkFilter implements the setProperty() and getProperty() methods to that effect.
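
To make the mechanics concrete, here is a rough sketch of such a filter built on XMLFilterImpl. It is not the WalkFilter from the XM distribution. The class name, the property URI, and the listing placeholder element are assumptions for illustration; only the Walk namespace URI is taken from Listing 3. Where the real filter would call DirectoryReader, the sketch emits a single placeholder element. It then uses a depth counter to discard whatever the xm:Directory element contains.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

class WalkFilterSketch extends XMLFilterImpl {
    // Namespace URI as it appears in Listing 3.
    static final String WALK_NS = "http://www.ananas.org/2001/XM/Walk";
    // Hypothetical property URI, invented for this sketch.
    static final String SOURCE_FILE = "http://example.org/sketch/source-file";

    private Object sourceFile;
    private int skipDepth = 0;   // > 0 while inside an xm:Directory element

    WalkFilterSketch(XMLReader parent) {
        super(parent);
    }

    @Override
    public void setProperty(String name, Object value)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        if (SOURCE_FILE.equals(name)) sourceFile = value;
        else super.setProperty(name, value);
    }

    @Override
    public Object getProperty(String name)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        return SOURCE_FILE.equals(name) ? sourceFile : super.getProperty(name);
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (skipDepth > 0) { skipDepth++; return; }   // nested content: discard
        if (WALK_NS.equals(uri) && "Directory".equals(local)) {
            skipDepth = 1;
            // Placeholder for the directory reader call: emit one element
            // that records which directory was requested.
            AttributesImpl out = new AttributesImpl();
            out.addAttribute("", "dir", "dir", "CDATA", String.valueOf(atts.getValue("dir")));
            super.startElement("", "listing", "listing", out);
            super.endElement("", "listing", "listing");
            return;
        }
        super.startElement(uri, local, qName, atts);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (skipDepth > 0) { skipDepth--; return; }
        super.endElement(uri, local, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (skipDepth == 0) super.characters(ch, start, length);
    }

    /** Parses xml through the filter and returns the element names that survive. */
    static List<String> elementNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        WalkFilterSketch filter = new WalkFilterSketch(factory.newSAXParser().getXMLReader());
        List<String> names = new ArrayList<>();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                names.add(local);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return names;
    }
}
```

Unrecognized properties are passed on to the parent reader, so the filter stays transparent to callers that configure the underlying parser.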

StylingMover and Co.

The last bit is to use the WalkFilter from a mover. With TrAX, it's not too difficult to preprocess an XML document through a filter. One of the constructors for SAXSource takes an XMLReader (and, therefore, an XMLFilter too) as a parameter. I slightly modified StylingMover to use this constructor when an XMLFilter is available. As you can see in Listing 5, the changes are minimal.


Listing 5. StylingMover preprocesses XML files
if(xmlFilter != null)
{
   try
   {
      xmlFilter.setProperty(SOURCE_FILE,sourceFile);
   }
   // It's okay if the filter does not recognize the property;
   // it just means it's not one of our filters.
   catch(SAXNotRecognizedException e)
      { }
   source = new SAXSource(xmlFilter,JAXPHelper.toInputSource(sourceFile));
}
else
   source = new StreamSource(sourceFile);
transformer.transform(source,result);
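
Listing 5 is an excerpt, so here is a self-contained sketch of the same branch that you can run on its own. It is an illustration under assumptions, not code from XM: the class and method names are hypothetical, the filter is a trivial stand-in that upper-cases character data, and an identity transformer takes the place of a real style sheet. The point is only to show the SAXSource constructor accepting an XMLFilter where a plain parser would otherwise be used.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLFilter;
import org.xml.sax.helpers.XMLFilterImpl;

class FilteredSourceSketch {
    /** Trivial stand-in filter: upper-cases all character data. */
    static class UpperCaseFilter extends XMLFilterImpl {
        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            char[] upper = new String(ch, start, length)
                .toUpperCase(Locale.ROOT).toCharArray();
            super.characters(upper, 0, upper.length);
        }
    }

    /** Runs xml through an identity transform, preprocessing it if a filter is given. */
    static String transform(String xml, XMLFilter filter) throws Exception {
        Source source;
        if (filter != null) {
            // The filter needs a real parser as its parent.
            SAXParserFactory parsers = SAXParserFactory.newInstance();
            parsers.setNamespaceAware(true);
            filter.setParent(parsers.newSAXParser().getXMLReader());
            source = new SAXSource(filter, new InputSource(new StringReader(xml)));
        } else {
            source = new StreamSource(new StringReader(xml));
        }
        // newTransformer() without a style sheet yields an identity transform.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<p>hi</p>", new UpperCaseFilter()));
        System.out.println(transform("<p>hi</p>", null));
    }
}
```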

There is one serious issue, however. Because WalkFilter modifies the input document, it is no longer possible to rely on the date a file was last modified to decide whether it should be restyled or not. For example, even if Listing 3 does not change, the document should be restyled if the directory has changed.

From Java I cannot detect when a directory was last modified (please post to the list if you have a workaround). The only safe solution is to restyle the document whenever XM runs, thereby defeating the smart build. It might work for small sites but it would be problematic for larger sites. I compromised by using a different extension (I chose .xm) for files with xm:Directory tags.

Listing 6 is an excerpt from MoversSupervisor. As you can see, it registers StylingMover twice: once for regular XML files and once for XM-specific files. The latter preprocesses through a WalkFilter.
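
The compromise boils down to a small decision rule, sketched below. This is not the MoversSupervisor code. The class and method names are hypothetical, and the sketch assumes that the smart build compares the last-modified dates of the source file and the published file. Files with the .xm extension are always restyled, because their own timestamp says nothing about the directory they list.

```java
import java.io.File;

class RestyleDecisionSketch {
    /** Returns true when source should be run through the style sheet again. */
    static boolean needsRestyling(File source, File target) {
        // .xm files embed a directory listing, so they may be stale even
        // when the file itself has not changed: always restyle them.
        if (source.getName().endsWith(".xm")) {
            return true;
        }
        // Regular files: restyle only if the published copy is missing
        // or older than the source.
        return !target.exists() || source.lastModified() > target.lastModified();
    }

    public static void main(String[] args) {
        File source = new File(args[0]);
        File target = new File(args[1]);
        System.out.println(needsRestyling(source, target) ? "restyle" : "skip");
    }
}
```

The cost of the rule is bounded: only the handful of index pages are rebuilt on every run, while the bulk of the site still benefits from the smart build.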

ReferenceResolver

Listing 7 is an excerpt from the ananas.org style sheet. When applied to Listing 3, it creates the list of projects. The style sheet assumes that each project directory includes an index.xml file. It attempts to load the latter file and extracts its title and subtitle to create a hyperlink. This illustrates how to use directory reading for a table of contents.

There's one catch though. By default, XSLT processors load documents relative to the style sheet directory. Because XM places style sheets in a completely separate directory, the processor will never find the index.xml documents.

I have introduced a new URL format, x-source: (the x- indicates it's not a standard URL format), that resolves filenames relative to the current document instead of the style sheet. I have amended the ReferenceResolver class to recognize and process those x-source: URLs, as illustrated in Listing 8. You will recall that the processor uses ReferenceResolver to load documents and style sheets.
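
In TrAX terms, this kind of redirection is the job of a javax.xml.transform.URIResolver. The sketch below shows one way such a resolver could look. It is not the ReferenceResolver from XM. The class name is hypothetical, and it assumes that whatever follows the x-source: prefix is a path relative to the directory of the document being styled. For any other reference it returns null, which tells the processor to fall back on its default resolution.

```java
import java.io.File;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

class SourceRelativeResolverSketch implements URIResolver {
    static final String SCHEME = "x-source:";

    private final File currentDocument;

    SourceRelativeResolverSketch(File currentDocument) {
        this.currentDocument = currentDocument;
    }

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        if (href == null || !href.startsWith(SCHEME)) {
            return null;   // not ours: let the processor resolve it
        }
        // Assumption: the remainder is a path relative to the current document.
        String relative = href.substring(SCHEME.length());
        File directory = currentDocument.getAbsoluteFile().getParentFile();
        return new StreamSource(new File(directory, relative));
    }

    public static void main(String[] args) throws Exception {
        URIResolver resolver = new SourceRelativeResolverSketch(new File("src/index.xm"));
        System.out.println(resolver.resolve("x-source:xm/index.xml", null).getSystemId());
    }
}
```

A resolver like this is installed with Transformer.setURIResolver() before the transformation starts, so that calls to the XSLT document() function with an x-source: reference are routed through it.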


Your turn

This concludes the first release of XM. Throughout this journey, we have learned about interesting features in the SAX and TrAX APIs. As I have already stated, I encourage you to download the package and test it for yourself.


Resources

  • Participate in the discussion forum.

  • You can download the code for this project from ananas.org. Follow the links to the CVS repository on developerWorks as well as to the ananas-discussion mailing list and the packaged version of XM. I encourage you to join the list and contribute your thoughts to the project.

  • If you'd rather have a zipped file, it's available too.

  • XM uses Xalan and Xerces-J as its XSLT processor and XML parser, respectively. Xalan was originally developed by Lotus software from IBM, and Xerces was originally developed by IBM. IBM donated the code to the Apache Foundation.

  • If you need webspace to post the Web site you have built with XM, you can turn to the free Geocities. If you can afford to pay for hosting, Pair Network offers cheap and reliable hosting.

