Although Resource Description Framework (RDF) was designed by the W3C as a general metadata modeling facility, it offers many features that make it an ideal companion to XML data. In many emerging XML applications, the knowledge encapsulated in the application throughout its lifetime is stored in XML documents in a database or repository. The basis of RDF's strength as a knowledge-management tool is that it allows you to organize, interrelate, classify, and annotate this knowledge, thereby increasing the aggregate value of the stored data. RDF has a reputation for complexity that is belied by the simplicity of adding RDF support to XML-based applications. This article begins an exploration of the symbiosis between RDF and XML. I'll demonstrate how to use XSLT to generate RDF from XML.
Legacy isn't always legacy
As an example, let's take an issue tracker for the open development process of a technical specification. The specification is posted online and interested parties can read it, add issues related to the spec, comment on open issues, assign action items related to issues, and more.
XML is a great tool for putting together this issue tracker. Descriptions of issues and action items, and the related discussions, require flexible representation, but structure is important to maintaining the semantics of the data. In our example, the application has already been developed, and basic techniques are used for tasks such as sending action item reminders to users, supporting search and browsing, and so on. However, the developers have decided to start using RDF in the application in order to take advantage of the many existing tools and techniques that are available for RDF processing.
In choosing to use RDF, the developers don't want to redesign all the application data and logic. They would rather add on what they can and perhaps gradually migrate RDF processing closer to the core of the application. One of the tasks they face, therefore, is generating useful RDF from the XML data they have already accumulated.
An XML format example
Listing 1 is an example of the XML format for an individual issue in the tracker. It has a unique identifier and a reference element that indicates to what document or document portion the issue is relevant. The initial author of the issue is noted, which means that the user-management features of the system are in use. Anyone can contribute to the document, but registered users are specially noted and handled. The main description of the issue and the attached user comments appear inline, and there is a related action assigned to a user.
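As a rough illustration of the structure just described, the following Python sketch parses a made-up issue document with the standard library. The element and attribute names (`issue`, `reference`, `spec`, `author`, `comment`, `action`, `user`, `assignee`) are assumptions chosen to match the prose; they are not taken from Listing 1.

```python
import xml.etree.ElementTree as ET

# Hypothetical issue document. Element and attribute names are assumptions
# that mirror the prose description, not the article's actual format.
ISSUE_XML = """\
<issue id="i2001030423">
  <reference>
    <spec href="http://rdfinference.org/ril/ril-20010502"/>
  </reference>
  <author user="uogbuji"/>
  <description>Example description of the problem.</description>
  <comment>
    <author>An unregistered reader</author>
    <body>Example comment text.</body>
  </comment>
  <action assignee="uogbuji">Example action item.</action>
</issue>
"""

issue = ET.fromstring(ISSUE_XML)

# Unique identifier of the issue.
issue_id = issue.get("id")
# The document (or document portion) the issue is about.
spec_uri = issue.find("reference/spec").get("href")
# A registered user is noted by profile id rather than by free-form text.
registered_author = issue.find("author").get("user")
# Comments appear inline; here the commenter is given as free-form text.
comment_authors = [c.findtext("author") for c in issue.findall("comment")]
# The related action is assigned to a user.
action_assignee = issue.find("action").get("assignee")
```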
Sow data, reap metadata
There are several approaches to extracting RDF metadata from the XML files used in the issue tracker. Perhaps the most straightforward method is to write an XSLT transform that reads the file and emits an RDF/XML serialization of the metadata, as I'll demonstrate in the following section.
Because RDF is anchored on URIs (for better or for worse), you have to come up with some URI schemes for metadata nodes. Some things, such as the location of a spec against which an issue is authored, already have URIs. Other things may already have specialized XML representations; in our example, a user object is managed as a separate XML file. Still other things may be completely abstract, with no application machinery for them besides their metadata nodes. Examples of this last category are the RDF types that I suggest creating for resources. The URIs to use in the RDF are as follows:
- The address of the specification under critique. Example: http://rdfinference.org/ril/ril-20010502
- The address of the XML source of an issue. Example: http://rdfinference.org/ril/issue-tracker/issues/i2001030423
- The address of the XML source of a registered user's profile. Example: http://rdfinference.org/ril/issue-tracker/users/uogbuji
- RDF types for authors, issues, assignments, and so on. Example: http://rdfs.rdfinference.org/ril/issue-tracker#Author
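The URI schemes in this list can be sketched as a few small helper functions. This is a minimal Python sketch: the base URIs are taken from the examples above, while the function names and the assumption that each URI is simply a base plus a local identifier are illustrative.

```python
# Base URIs inferred from the example URIs listed above.
ISSUE_BASE = "http://rdfinference.org/ril/issue-tracker/issues/"
USER_BASE = "http://rdfinference.org/ril/issue-tracker/users/"
TYPE_BASE = "http://rdfs.rdfinference.org/ril/issue-tracker#"


def issue_uri(issue_id):
    """URI of the XML source of an issue (hypothetical helper)."""
    return ISSUE_BASE + issue_id


def user_uri(user_id):
    """URI of a registered user's profile (hypothetical helper)."""
    return USER_BASE + user_id


def type_uri(local_name):
    """URI of an abstract RDF type such as Author (hypothetical helper)."""
    return TYPE_BASE + local_name
```

Specifications need no helper because they already have an address of their own.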
One possible RDF serialization
Note that in some cases I use anonymous resources, such as comment and action resources. This is a modeling choice. For instance, if you wanted to have a centralized index of actions for task scheduling, it would probably make sense to use a URI for the abstract actions, rather than leaving them anonymous.
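The difference between an anonymous and a URI-identified action can be sketched in Python with the standard library. The property names (`action`, `body`), the use of the types namespace for properties, and the action URI in the usage lines are all assumptions made for illustration; only the RDF namespace and the issue URI come from known sources.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: properties live in the same namespace as the RDF types.
IT_NS = "http://rdfs.rdfinference.org/ril/issue-tracker#"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("it", IT_NS)


def qname(ns, local):
    """Build an ElementTree-style qualified name: {namespace}local."""
    return "{%s}%s" % (ns, local)


def describe_action(issue_uri, body, action_uri=None):
    """Return an rdf:RDF element linking an issue to one action.

    With action_uri=None the action is an anonymous resource (no rdf:about);
    otherwise it is identified by the given URI.
    """
    root = ET.Element(qname(RDF_NS, "RDF"))
    issue = ET.SubElement(root, qname(RDF_NS, "Description"),
                          {qname(RDF_NS, "about"): issue_uri})
    prop = ET.SubElement(issue, qname(IT_NS, "action"))  # hypothetical property
    action = ET.SubElement(prop, qname(RDF_NS, "Description"))
    if action_uri is not None:
        action.set(qname(RDF_NS, "about"), action_uri)
    ET.SubElement(action, qname(IT_NS, "body")).text = body  # hypothetical property
    return root


ISSUE = "http://rdfinference.org/ril/issue-tracker/issues/i2001030423"
anonymous = describe_action(ISSUE, "Example action item.")
# Hypothetical URI scheme for actions, as a centralized index might need.
named = describe_action(ISSUE, "Example action item.",
                        "http://rdfinference.org/ril/issue-tracker/actions/a1")
print(ET.tostring(anonymous, encoding="unicode"))
```

A centralized index of actions would need the second form, since other descriptions can only point at a resource that has a URI.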
The XSLT transform
Now that you know what the RDF looks like, you can construct a transform to convert issue descriptions into the appropriate RDF files. Listing 3 is such a transform in XSLT.
Some of the techniques you see in this listing I've already discussed in WSDL processing with XSLT, a previous developerWorks article that includes a section on converting Web Services Description Language (WSDL) to RDF. In that case, the goal was to make the resulting RDF serialization look as close to the original WSDL XML as possible. There are no such constraints in this case, so the transform is less esoteric. The various XML elements are simply visited in turn, and equivalent RDF descriptions are built bit by bit.
The most important top-level operation is the match on the issue element, which merely turns around and calls a named template to do the actual building of RDF descriptions for the corresponding issue. The reason for this indirection is that it provides flexibility for customizing and extending this transform. For example, as you'll see in my next Thinking XML column, you could use the named template in a separate transform which performs a batch conversion of issue documents to RDF.
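The same indirection can be sketched outside XSLT. In this Python analogy (not the article's transform), `build_issue_triples` plays the role of the named template and `transform_document` plays the role of the match on the issue element. The function names, the `id` attribute, the `#Issue` type, and the choice to return triples instead of RDF/XML are all assumptions.

```python
import xml.etree.ElementTree as ET

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ISSUE_BASE = "http://rdfinference.org/ril/issue-tracker/issues/"
# Assumption: an Issue type named by analogy with the Author example type.
ISSUE_TYPE = "http://rdfs.rdfinference.org/ril/issue-tracker#Issue"


def build_issue_triples(issue):
    """Counterpart of the named template: does the real work for one issue."""
    subject = ISSUE_BASE + issue.get("id")
    return [(subject, RDF_TYPE, ISSUE_TYPE)]


def transform_document(xml_text):
    """Counterpart of the match on the issue element: locate, then delegate."""
    root = ET.fromstring(xml_text)
    if root.tag != "issue":
        raise ValueError("expected an issue document")
    return build_issue_triples(root)


def transform_batch(xml_texts):
    """A separate entry point that reuses the same builder for many issues."""
    triples = []
    for text in xml_texts:
        triples.extend(build_issue_triples(ET.fromstring(text)))
    return triples


single = transform_document('<issue id="i2001030423"/>')
batch = transform_batch(['<issue id="i1"/>', '<issue id="i2"/>'])
```

Because the builder does not care how it was reached, a batch driver can call it directly without duplicating any of its logic.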
In the example, the issue tracker can track issues about a variety of resources besides the online specs themselves. (You could open up an issue on the issue tracker itself -- perhaps a bug report.) For this reason, the design also makes the handling of reference elements quite flexible. Using a separate XSLT mode (to ensure that reference elements are only resolved at the right point), a template checks the extensible content of the reference element. For now, the example application deals with the case that the reference is to a specification by simply creating an RDF description with a reference from the issue in question. As more types of extensible reference need to be processed, additional xsl:when clauses can be appended to handle those cases.
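The extensible dispatch on the content of the reference element can be pictured with a Python analogy, where each `if`/`elif` branch stands in for one xsl:when clause. The child element name `spec`, its `href` attribute, and the `reference` property are assumptions; this is a sketch of the idea, not the article's XSLT.

```python
import xml.etree.ElementTree as ET

# Assumption: properties live in the same namespace as the RDF types.
IT_NS = "http://rdfs.rdfinference.org/ril/issue-tracker#"


def resolve_reference(issue_uri, reference):
    """Turn the children of a reference element into triples.

    Only references to specifications are handled for now; other kinds of
    content are skipped until a branch is added for them.
    """
    triples = []
    for child in reference:
        if child.tag == "spec":
            # A reference from the issue to the specification's own URI.
            triples.append((issue_uri, IT_NS + "reference", child.get("href")))
        # Further elif branches would handle new kinds of reference.
    return triples


reference = ET.fromstring(
    '<reference>'
    '<spec href="http://rdfinference.org/ril/ril-20010502"/>'
    '<other/>'
    '</reference>'
)
triples = resolve_reference(
    "http://rdfinference.org/ril/issue-tracker/issues/i2001030423", reference
)
```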
You can also see the basic machinery used to deal with the fact that a user can be identified either by a registered profile or merely by free-form text.
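The two ways of identifying a user map naturally onto the two kinds of RDF object: a resource URI for a registered profile and a literal for free-form text. A minimal Python sketch, assuming a hypothetical `user` attribute that carries the profile id:

```python
import xml.etree.ElementTree as ET

USER_BASE = "http://rdfinference.org/ril/issue-tracker/users/"


def author_object(author):
    """Return ("resource", uri) for a registered user, else ("literal", text)."""
    user_id = author.get("user")  # hypothetical attribute naming a profile
    if user_id:
        return ("resource", USER_BASE + user_id)
    return ("literal", (author.text or "").strip())


registered = author_object(ET.fromstring('<author user="uogbuji"/>'))
free_form = author_object(ET.fromstring("<author> An unregistered reader </author>"))
```

Keeping the distinction explicit lets later processing follow profile URIs to richer metadata while still recording contributors who never registered.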
More of this to follow
In this column I have presented a simple example of the use of XSLT to extract RDF from XML instances. As more and more XML-based applications come into use, such techniques are useful in expanding applications with knowledge-management features.
The next installment will continue with the issue tracker example, demonstrating batch processing of the issue documents and some open-source tools useful for such processing.
The examples in this installment and in following installments are based on an actual project of putting together an issue tracker for the RDF Inference Language specification at rdfinference.org. Soon enough you will be able to see this practical work available on this public site. But until then, please feel free to experiment with the example code in this article, and send me any questions, comments, and ideas.
Sample code for column: rdfcode1.zip (6 KB)
- Dave Beckett's RDF Resource Guide is a comprehensive set of links to RDF-related articles, tools, and so on.
- The examples in this article were tested using 4Suite's XSLT processor.
- XML: the next big thing, an IBM research paper by Tom Halfhill, discusses the possibilities of RDF for powering next-generation search engines.
- Check out Thinking XML's previous columns.