Thinking XML: Basic XML and RDF techniques for knowledge management, Part 1

Generate RDF using XSLT

Columnist Uche Ogbuji begins his practical exploration of knowledge management with XML by illustrating techniques for populating Resource Description Framework (RDF) models with data from existing XML formats. As shown in the three code listings, RDF can be used as a companion to customized XML, not just as a canonical representation for certain types of data. This column, with code samples included, demonstrates how easy it can be to jump-start knowledge management with RDF even relatively late in the development game.

Uche Ogbuji (uche@ogbuji.net), CEO and principal consultant, Fourthought, Inc.

Uche Ogbuji (uche@ogbuji.net) is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management applications. Fourthought develops 4Suite, an open-source platform for XML middleware. Mr. Ogbuji is a computer engineer and writer born in Nigeria, living and working in Boulder, Colorado.



01 July 2001

Although Resource Description Framework (RDF) was designed by the W3C as a general metadata modeling facility, it offers many features that make it an ideal companion to XML data. In many emerging XML applications, the knowledge encapsulated in the application throughout its lifetime is stored in XML documents in a database or repository. The basis of RDF's strength as a knowledge-management tool is that it allows you to organize, interrelate, classify, and annotate this knowledge, thereby increasing the aggregate value of the stored data. RDF has a reputation for complexity that is belied by the simplicity of adding RDF support to XML-based applications. This article begins an exploration of the symbiosis between RDF and XML. I'll demonstrate how to use XSLT to generate RDF from XML.

Familiarity with RDF and XSLT is required. You may wish to read my introduction to RDF, which appeared earlier on developerWorks, and the other papers linked in Resources.

Legacy isn't always legacy

As an example, let's take an issue tracker for the open development process of a technical specification. The specification is posted online and interested parties can read it, add issues related to the spec, comment on open issues, assign action items related to issues, and more.

XML is a good fit for this issue tracker. Descriptions of issues and action items, and the related discussions, need a flexible representation, yet structure is important for preserving the semantics of the data. In our example, the application has already been developed, using basic techniques for tasks such as sending action-item reminders to users and supporting search and browsing. However, the developers have decided to start using RDF in the application in order to take advantage of the many existing tools and techniques for RDF processing.

In choosing to use RDF, the developers don't want to redesign all the application data and logic. They would rather add on what they can and perhaps gradually migrate RDF processing closer to the core of the application. One of the tasks they face, therefore, is generating useful RDF from the XML data they have already accumulated.

An XML format example

Listing 1 is an example of the XML format for an individual issue in the tracker. The issue has a unique identifier and a reference element that indicates which document, or which portion of a document, the issue concerns. The issue's initial author is noted, which shows that the system's user-management features are in use: anyone can contribute to the document, but registered users are specially noted and handled. The main description of the issue and the attached user comments appear inline, and a related action is assigned to a user.
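To make the parts of such a document concrete, here is a minimal Python sketch that reads a hypothetical issue document with the standard library's xml.etree.ElementTree and pulls out the pieces that later become RDF metadata. The element and attribute names (issue, reference/spec, author/@user, comment, action/assignee) are assumptions for illustration, not the actual format of Listing 1.

```python
import xml.etree.ElementTree as ET

# Hypothetical issue document: element and attribute names are
# assumptions for illustration, not the ones used in Listing 1.
SAMPLE_ISSUE = """\
<issue id="i2001030423">
  <reference>
    <spec href="http://rdfinference.org/ril/ril-20010502"/>
  </reference>
  <author user="uogbuji"/>
  <description>Clarify how rule priorities interact.</description>
  <comment>
    <author>A. Reader</author>
    <text>Section 3 is ambiguous.</text>
  </comment>
  <action>
    <assignee user="uogbuji"/>
    <text>Draft revised wording.</text>
  </action>
</issue>
"""


def summarize_issue(xml_text):
    """Collect the parts of an issue that later become RDF metadata."""
    issue = ET.fromstring(xml_text)
    spec = issue.find("reference/spec")
    author = issue.find("author")  # direct child only: the issue's own author
    author_name = None
    if author is not None:
        # A 'user' attribute marks a registered user; otherwise the
        # author is recorded as free-form text.
        author_name = author.get("user") or (author.text or "").strip()
    return {
        "id": issue.get("id"),
        "reference": spec.get("href") if spec is not None else None,
        "author": author_name,
        "comment_authors": [(c.findtext("author") or "").strip()
                            for c in issue.findall("comment")],
        "assignees": [a.get("user") for a in issue.findall("action/assignee")],
    }
```

Calling summarize_issue(SAMPLE_ISSUE) yields the issue ID, the spec URI it refers to, the registered author's user name, the comment authors, and the action assignees.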


Sow data, reap metadata

There are several approaches to extracting RDF metadata from the XML files used in the issue tracker. Perhaps the most straightforward method is to write an XSLT transform that reads the file and emits an RDF/XML serialization of the metadata, as I'll demonstrate in the following section.

Because RDF is anchored on URIs (for better or for worse), you have to come up with some URI schemes for metadata nodes. Some things, such as the location of a spec against which an issue is authored, already have URIs. Other things may already have specialized XML representations; in our example, a user object is managed as a separate XML file. Still other things may be completely abstract, with no application machinery for them besides their metadata nodes. Examples of this last category are the RDF types that I suggest creating for resources. The URIs to use in the RDF are as follows:

  • The address of the specification under critique. Example: http://rdfinference.org/ril/ril-20010502.
  • The address of the XML source of an issue. Example: http://rdfinference.org/ril/issue-tracker/issues/i2001030423.
  • The address of the XML source of a registered user's profile. Example: http://rdfinference.org/ril/issue-tracker/users/uogbuji.
  • RDF types for authors, issues, assignments, and so on. Example: http://rdfs.rdfinference.org/ril/issue-tracker#Author.
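The URI schemes in the list above can be captured in a few helper functions. This is a minimal Python sketch: the base URIs follow the examples given, while the helper names are hypothetical. The spec URI needs no helper, because the issue's reference already carries it.

```python
from urllib.parse import quote

# Base URIs taken from the examples in the list above; the helper
# function names are hypothetical.
ISSUE_BASE = "http://rdfinference.org/ril/issue-tracker/issues/"
USER_BASE = "http://rdfinference.org/ril/issue-tracker/users/"
TYPE_NS = "http://rdfs.rdfinference.org/ril/issue-tracker#"


def issue_uri(issue_id):
    """URI of an issue's XML source, built from its unique identifier."""
    return ISSUE_BASE + quote(issue_id, safe="")


def user_uri(username):
    """URI of a registered user's profile document."""
    return USER_BASE + quote(username, safe="")


def type_uri(name):
    """URI of an abstract RDF type such as Author or Issue."""
    return TYPE_NS + quote(name, safe="")
```

Percent-encoding each identifier with quote(..., safe="") keeps characters such as spaces or slashes from producing malformed or misleading URIs.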

One possible RDF serialization

Given the above, Listing 2 is a possible serialization of an RDF model that represents the metadata from Listing 1.

Note that in some cases I use anonymous resources, such as comment and action resources. This is a modeling choice. For instance, if you wanted to have a centralized index of actions for task scheduling, it would probably make sense to use a URI for the abstract actions, rather than leaving them anonymous.
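The modeling choice between anonymous resources and URIs can be illustrated with a minimal Python sketch that emits triples in N-Triples notation. Anonymous actions get blank-node labels that are meaningful only within one issue's RDF; "centralized" actions get minted URIs that a central task index could refer to. The actions/ path and the action and assignedTo property names are hypothetical.

```python
# Hypothetical paths and property names, modeled on the URI schemes
# described earlier in the article.
ISSUE_BASE = "http://rdfinference.org/ril/issue-tracker/issues/"
USER_BASE = "http://rdfinference.org/ril/issue-tracker/users/"
ACTION_BASE = "http://rdfinference.org/ril/issue-tracker/actions/"  # hypothetical
IT_NS = "http://rdfs.rdfinference.org/ril/issue-tracker#"


def action_node(issue_id, index, centralized):
    """Node that carries an action's metadata, in N-Triples notation.

    centralized=False gives an anonymous (blank) node, visible only
    within this issue's RDF; centralized=True mints a URI that other
    documents, such as a central action index, could refer to.
    """
    if centralized:
        return "<%s%s-%d>" % (ACTION_BASE, issue_id, index)
    return "_:%sa%d" % (issue_id, index)


def action_triples(issue_id, assignees, centralized=False):
    """Link an issue to one action per assignee."""
    triples = []
    issue = "<%s%s>" % (ISSUE_BASE, issue_id)
    for i, user in enumerate(assignees, start=1):
        node = action_node(issue_id, i, centralized)
        triples.append((issue, "<%saction>" % IT_NS, node))
        triples.append((node, "<%sassignedTo>" % IT_NS,
                        "<%s%s>" % (USER_BASE, user)))
    return triples
```

Switching between the two models changes only how the action node is named; the statements about each action stay the same.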

The XSLT transform

Now that you know what the RDF looks like, you can construct a transform to convert issue descriptions into the appropriate RDF files. Listing 3 is such a transform in XSLT.

I discussed some of the techniques in this listing in WSDL processing with XSLT, a previous developerWorks article that includes a section on converting Web Services Description Language (WSDL) to RDF. In that case, the goal was to make the resulting RDF serialization look as close to the original WSDL XML as possible. There are no such constraints here, so the transform is less esoteric: the various XML elements are simply visited in turn, and equivalent RDF descriptions are built bit by bit.
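The "visit each element, build RDF bit by bit" approach does not depend on XSLT. As a rough analogue (not Listing 3 itself), this Python sketch walks the children of a hypothetical issue element and builds an RDF/XML tree with xml.etree.ElementTree. Comments become anonymous resources, written as nested rdf:Description elements with no rdf:about. The it: vocabulary namespace and its property names are assumptions modeled on the type URIs listed earlier.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical vocabulary namespace, modeled on the type URIs listed earlier.
IT_NS = "http://rdfs.rdfinference.org/ril/issue-tracker#"
ISSUE_BASE = "http://rdfinference.org/ril/issue-tracker/issues/"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("it", IT_NS)


def rdf_term(name):
    return "{%s}%s" % (RDF_NS, name)


def it_term(name):
    return "{%s}%s" % (IT_NS, name)


def issue_to_rdf(issue):
    """Build an rdf:RDF tree for one (hypothetical) issue element."""
    root = ET.Element(rdf_term("RDF"))
    desc = ET.SubElement(root, rdf_term("Description"),
                         {rdf_term("about"): ISSUE_BASE + issue.get("id")})
    ET.SubElement(desc, rdf_term("type"), {rdf_term("resource"): IT_NS + "Issue"})
    for child in issue:  # visit each child element in turn
        if child.tag == "description":
            ET.SubElement(desc, it_term("description")).text = (child.text or "").strip()
        elif child.tag == "comment":
            # Anonymous resource: a nested rdf:Description with no rdf:about.
            prop = ET.SubElement(desc, it_term("comment"))
            node = ET.SubElement(prop, rdf_term("Description"))
            ET.SubElement(node, rdf_term("type"), {rdf_term("resource"): IT_NS + "Comment"})
            ET.SubElement(node, it_term("text")).text = (child.text or "").strip()
    return root


SAMPLE = """<issue id="i2001030423">
  <description>Clarify how rule priorities interact.</description>
  <comment>Section 3 is ambiguous.</comment>
</issue>"""

print(ET.tostring(issue_to_rdf(ET.fromstring(SAMPLE)), encoding="unicode"))
```

Supporting a new element type means adding one more branch to the loop, which mirrors adding one more template in the XSLT version.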

The most important top-level operation is the match on the issue element, which merely turns around and calls a named template to do the actual work of building the RDF descriptions for that issue. This indirection provides flexibility for customizing and extending the transform. For example, as you'll see in my next Thinking XML column, you could use the named template in a separate transform that performs a batch conversion of issue documents to RDF.
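The value of this indirection can be shown outside XSLT too. In this minimal Python sketch (an analogue, not the article's transform), build_issue_description stands in for the named template, transform_document for the match on the issue element, and transform_batch for a hypothetical batch conversion that reuses the same builder. All three function names are hypothetical, and the builder fills in only rdf:about for brevity.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ISSUE_BASE = "http://rdfinference.org/ril/issue-tracker/issues/"
ET.register_namespace("rdf", RDF_NS)


def build_issue_description(issue, rdf_root):
    """Stands in for the named template: add one issue's description.

    Only rdf:about is filled in here; a fuller version would also add
    the issue's properties.
    """
    return ET.SubElement(rdf_root, "{%s}Description" % RDF_NS,
                         {"{%s}about" % RDF_NS: ISSUE_BASE + issue.get("id")})


def transform_document(xml_text):
    """Stands in for the match on issue: one document in, one rdf:RDF out."""
    rdf_root = ET.Element("{%s}RDF" % RDF_NS)
    build_issue_description(ET.fromstring(xml_text), rdf_root)
    return rdf_root


def transform_batch(xml_texts):
    """Hypothetical batch conversion: many issues into one rdf:RDF."""
    rdf_root = ET.Element("{%s}RDF" % RDF_NS)
    for text in xml_texts:
        build_issue_description(ET.fromstring(text), rdf_root)
    return rdf_root
```

Because the per-issue logic lives in one place, the single-document and batch entry points cannot drift apart.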

In the example, the issue tracker can track issues about a variety of resources besides the online specs themselves. (You could open an issue about the issue tracker itself, such as a bug report.) For this reason, the design makes the handling of reference elements quite flexible. Using a separate XSLT mode (to ensure that reference elements are resolved only at the right point), a template examines the extensible content of the reference element. For now, the example handles only the case in which the reference points to a specification, and it does so simply by creating an RDF description that links the issue in question to that specification. As more types of extensible reference need to be processed, additional xsl:when clauses can be appended to handle them.
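A dispatch table is one way to get the same extensibility in Python. In this minimal sketch, which is an analogue and not the article's XSLT, each kind of element found inside a reference maps to a handler, and registering a new handler plays the part of appending another xsl:when clause. The spec element, its href attribute, and the it:references property are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
IT_NS = "http://rdfs.rdfinference.org/ril/issue-tracker#"  # hypothetical vocabulary


def handle_spec(ref, desc):
    """Reference to a specification: link the issue to the spec's URI."""
    ET.SubElement(desc, "{%s}references" % IT_NS,
                  {"{%s}resource" % RDF_NS: ref.get("href")})


# Handlers keyed by the element found inside <reference>; registering a
# new kind here plays the part of appending another xsl:when clause.
REFERENCE_HANDLERS = {"spec": handle_spec}


def apply_reference(reference, desc):
    """Resolve a reference element into properties on desc.

    It runs only when called explicitly while an issue's description is
    being built, much as a separate XSLT mode confines when the
    reference templates fire.
    """
    for ref in reference:
        handler = REFERENCE_HANDLERS.get(ref.tag)
        if handler is not None:
            handler(ref, desc)
        # Unknown kinds are skipped, like an xsl:choose with no xsl:otherwise.
```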

You can also see the basic machinery for handling the fact that a user can be identified either by a registered profile or merely by free-form text.
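A minimal Python sketch of that distinction, assuming a hypothetical user attribute that names a registered profile: a registered user becomes a resource pointing at the profile URI, while anyone else is recorded as a plain literal. The it:author property is likewise an assumption.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
IT_NS = "http://rdfs.rdfinference.org/ril/issue-tracker#"  # hypothetical vocabulary
USER_BASE = "http://rdfinference.org/ril/issue-tracker/users/"


def add_author(author, desc):
    """Registered users become resources; anyone else becomes a literal."""
    prop = ET.SubElement(desc, "{%s}author" % IT_NS)
    user = author.get("user")  # hypothetical attribute naming a profile
    if user:
        prop.set("{%s}resource" % RDF_NS, USER_BASE + user)
    else:
        prop.text = (author.text or "").strip()
    return prop
```

Modeling registered users as resources lets other RDF statements about the same profile connect to the issue, while free-form names stay simple literals.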


More of this to follow

In this column I have presented a simple example of using XSLT to extract RDF from XML instances. As more XML-based applications come into use, such techniques are useful for extending them with knowledge-management features.

The next installment will continue with the issue tracker example, demonstrating batch processing of the issue documents and some open-source tools useful for such processing.

The examples in this installment and in following installments are based on an actual project: putting together an issue tracker for the RDF Inference Language specification at rdfinference.org. Soon you will be able to see this work on that public site. Until then, please feel free to experiment with the example code in this article, and send me any questions, comments, and ideas.


Download

Description | Name | Size
Sample code for column | rdfcode1.zip | 6 KB

Resources

  • Dave Beckett's RDF Resource Guide is a comprehensive set of links to RDF-related articles, tools, and so on.
  • The examples in this article were tested using 4Suite's XSLT processor.
  • XML: the next big thing, an IBM research paper by Tom Halfhill, discusses the possibilities of RDF for powering next-generation search engines.
  • Check out Thinking XML's previous columns.


