Thinking XML: XML Topic Maps by the book

A first look at the other Semantic Web format

Topic Maps provide a system for organizing information, and XML Topic Maps bring this system to the world of XML. In this article, Uche Ogbuji examines XML Topic Maps, introducing the technology in the course of reviewing a key book on the topic.

Uche Ogbuji, Principal Consultant, Fourthought, Inc.

Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management. Fourthought develops 4Suite, an open-source platform for XML, RDF, and knowledge-management applications. Mr. Ogbuji is a computer engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can contact Mr. Ogbuji at uche@ogbuji.net.



30 July 2003

Semantic Web technologies work to formalize descriptions and classifications of concepts. They attempt to reduce the conceptual mismatches that make it so difficult to connect information systems. As you can imagine, many possible approaches and techniques might apply to such a subjective and ambitious undertaking. One important set of techniques is that of Topic Maps.

I have been remiss in not covering Topic Maps in this column to this point. The main problem has been my very incomplete understanding of them. Recently, I received the book XML Topic Maps: Creating and Using Topic Maps for the Web, edited by Jack Park and Sam Hunting. The book comprises chapters written by a host of experts from the Topic Maps world. It was clear to me that this book represented the best chance for me to improve my understanding of Topic Maps, and I decided to combine a review of the book with an introduction to the subject.

A very good place to start

Jack Park starts the book off with a very enthusiastic, but rather rambling and confusing, introduction. I'd hoped the introduction would be an occasion to lay out the basic concepts of Topic Maps in clear language for the uninitiated, and to neatly chart out the rest of the book for the reader in terms of those basic concepts. In Chapter 2, Michel Biezunski does provide a good overview of the Topic Maps paradigm, although it assumes background understanding of the general problems that motivate Topic Maps. Readers of this column should have that background, and will find this chapter a nice primer on the Zen of Topic Maps.

Topic Maps were originally developed by a group in the SGML community to formalize the building of indices and lexicons. The result of the original effort was ISO/IEC 13250, a standard that defines the complete model of Topic Maps. This standard predates XML, but the rapid development of XML and the Web motivated XML Topic Maps (XTM), which is based on the ISO/IEC 13250 model but defines an XML syntax and limits itself to addressing through URIs. In fact, XTM is defined as an XLink application, where the linking is given semantics peculiar to the Topic Maps model.

Topics are the basic building blocks in Topic Maps -- a topic is a computer representation of a concept. Concepts that are represented by topics are formalized as subjects. The formal distinction between the abstract subject and its representation as a topic is a bedrock consideration of Topic Maps. Topics are related to one another by associations. A topic may also have a set of locations from which it may be accessed in some particular form. These locations are called topic occurrences. A topic may have one name, no name, or more than one. Topic Maps also show their lexicographical roots by building into the core model two forms of variants on the base name: display name and sort key.

When I discuss XML with people, I am discussing an abstraction that can easily be a Topic Maps subject. If I wanted to formalize the concept, I could write a map that has a topic representing XML. One occurrence of this topic might be the HTML version of the XML specification available on the W3C's site. Another might be the download location for the PDF form of the spec. Associations from XML to related topics could connect it to the subjects SGML, XSLT, HTML, or Unicode. The base name of the topic might be "XML", but one might choose a display name "XML (Extensible Markup Language)" so that people using tools to browse the Topic Map wouldn't have to decipher the acronym themselves.
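To make the vocabulary concrete, here is a minimal Python sketch of the model just described, populated with the XML example. It is only an illustration: the class and field names (Topic, Occurrence, Association, display_name, sort_key), the "related-to" association type, and the example.org URLs are all assumptions made for this sketch, not names taken from the XTM specification or from any Topic Maps API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Occurrence:
    """A location from which a topic can be accessed in some form."""
    kind: str  # hypothetical label such as "html" or "pdf"
    href: str  # placeholder URL, not a real download location


@dataclass
class Topic:
    """Computer representation of a subject (an abstract concept)."""
    id: str
    base_names: List[str] = field(default_factory=list)  # none, one, or many
    display_name: Optional[str] = None  # variant meant for human readers
    sort_key: Optional[str] = None      # variant meant for ordering
    occurrences: List[Occurrence] = field(default_factory=list)


@dataclass
class Association:
    """A relationship between topics, referenced by their ids."""
    kind: str
    members: List[str]


# The topic standing for the subject "XML".
xml_topic = Topic(
    id="xml",
    base_names=["XML"],
    display_name="XML (Extensible Markup Language)",
    sort_key="xml",
    occurrences=[
        Occurrence("html", "http://example.org/xml-spec.html"),
        Occurrence("pdf", "http://example.org/xml-spec.pdf"),
    ],
)

# Related topics, each tied to the XML topic by one association.
related = [Topic(id=name.lower(), base_names=[name])
           for name in ("SGML", "XSLT", "HTML", "Unicode")]
associations = [Association("related-to", [xml_topic.id, t.id])
                for t in related]

print(xml_topic.display_name)
print([a.members for a in associations])
```

Note that the subject itself (the idea of XML) never appears in the code. Only its representation, the topic, does, which is exactly the distinction the model insists on.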

Another fundamental concept in Topic Maps is that of scope. A scope is a special topic that defines a grouping or boundary for related topics. Writers on IBM developerWorks could create a scope encompassing topics that represent concepts they cover in their articles. A scope acts as a namespace: Base names are expected to be unique in a scope, and two topics with the same base name and also in the same scope can be merged. For example, I might have created a topic representing "XML", and another developerWorks author might have done the same independently, not aware of my own efforts. Since we all maintain a unified scope, the two topics with base name "XML" can be merged. Topic Maps define quite detailed rules for merging, and for what happens to occurrences, associations, and the like.
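The name-based merging rule can be sketched in a few lines of Python. This is a deliberately simplified illustration: the dictionary layout, the scope labels, the example.org URLs, and the merge_by_name function are all hypothetical, and the actual merging rules cover more cases (associations and variants, for instance) than this sketch attempts.

```python
from collections import OrderedDict


def merge_by_name(topics):
    """Merge topics that share a base name within the same scope.

    Each topic is a dict with 'scope', 'base_name' and 'occurrences'
    keys (a layout invented for this sketch).
    """
    merged = OrderedDict()
    for topic in topics:
        key = (topic["scope"], topic["base_name"])
        if key not in merged:
            merged[key] = {
                "scope": topic["scope"],
                "base_name": topic["base_name"],
                "occurrences": [],
            }
        # Union the occurrences, keeping order and dropping duplicates.
        for occ in topic["occurrences"]:
            if occ not in merged[key]["occurrences"]:
                merged[key]["occurrences"].append(occ)
    return list(merged.values())


topics = [
    {"scope": "developerworks", "base_name": "XML",
     "occurrences": ["http://example.org/spec.html"]},
    {"scope": "developerworks", "base_name": "XML",
     "occurrences": ["http://example.org/spec.pdf"]},
    {"scope": "elsewhere", "base_name": "XML",
     "occurrences": ["http://example.org/other"]},
]

result = merge_by_name(topics)
print(len(result))  # 2: the developerWorks pair merges, the third stays apart
```

The two topics created independently in the shared scope collapse into one topic that carries both occurrences, while the topic with the same base name in a different scope is left untouched.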

Chapter 3 of the book, by Steven Newcomb, stands quite alone as an essay on the history, motivations, and culture of Topic Maps. In many ways it repeats the introduction, but in a rather more coherent fashion. Sam Hunting then takes over in Chapter 4 with a cut-and-dried survey of Topic Maps standardization efforts from ISO to the grass roots. The book continues in this fashion, weaving in and out of intermediate and advanced topics with some abruptness. For readers looking to get a good, practical introduction to Topic Maps, I recommend reading in the following sequence:

  • Chapter 2, "Introduction to the Topic Maps Paradigm" (Michel Biezunski)
  • Chapter 6, "How to Start Topic Mapping Right Away with the XTM Specification" (Sam Hunting)
  • Chapter 12, "Topic Maps and RDF" (Eric Freese) -- since readers of this column will likely be familiar with RDF
  • Chapter 10, "Open Source Topic Map Software" (Eric Freese, Kal Ahmed, Jack Park, Sam Hunting) -- if you are inclined towards Java technology, otherwise Chapter 9, "Creating and Maintaining Enterprise Web Sites with Topic Maps and XSLT" (Nikita Ogievetsky)
  • Chapter 8, "Topic Maps in the Life Sciences" (John Park and Nefer Park) -- for an overall example case
  • Chapter 5, "Topic Maps from Representation to Identity: Conversation, Names, and Published Subject Indicators" (Bernard Vatant)
  • Chapter 3, "A Perspective on the Quest for Global Knowledge Interchange" (Steven Newcomb)

As you read the book, and through further adventures, you will undoubtedly want to keep your thumb on the glossary, which comes right after the last chapter (this glossary is a very strong complement to another very handy glossary -- the one in section 1.3 of the XTM specification). The above chapters should get you well enough acquainted with Topic Maps -- and give you enough flavor of them in practice -- that you can venture into the other chapters, which I consider more advanced.


Topic maps in tags

Most of the practical work in Topic Maps builds on the universe of XML tools, including XSLT and Java APIs. The XML syntax is very clear, as you can see from Listing 1, a snippet based on examples in the XTM specification:

Listing 1: Snippets of a Topic Map of Shakespeare and his work
<!-- A topic representing the Elizabethan playwright
     William Shakespeare.  No occurrences because you cannot download
     a person -->

<topic id="shakespeare">
  <baseName>
    <baseNameString>William Shakespeare</baseNameString>
  </baseName>
</topic>

<!-- A topic representing the play "Hamlet" -->

<topic id="hamlet">
  <instanceOf><topicRef xlink:href="#play"/></instanceOf>
  <baseName>
    <baseNameString>Hamlet, Prince of Denmark</baseNameString>
  </baseName>

<!-- An occurrence given by Project Gutenberg's plain text download
     of Hamlet -->

  <occurrence>
    <instanceOf>
      <topicRef xlink:href="#plain-text-format"/>
    </instanceOf>
    <resourceRef
xlink:href="ftp://www.gutenberg.org/pub/gutenberg/etext97/1ws2610.txt"/>
  </occurrence>
</topic>

<!-- An association representing an authorship relationship -->

<topic id="written-by">
  <baseName>
    <baseNameString>written by</baseNameString>
  </baseName>
</topic>

<!-- Used here to associate Shakespeare and the play Hamlet -->

<association>
  <instanceOf><topicRef xlink:href="#written-by"/></instanceOf>
  <member>
    <roleSpec><topicRef xlink:href="#author"/></roleSpec>
    <topicRef xlink:href="#shakespeare"/>
  </member>
  <member>
    <roleSpec><topicRef xlink:href="#work"/></roleSpec>
    <topicRef xlink:href="#hamlet"/>
  </member>
</association>

This document is accessible to a plain XLink processor as well as specialized Topic Maps tools. It suffers from the usual XML verbosity, but it is very cleanly structured.
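Because XTM is plain XML, even a general-purpose parser can pull useful facts out of it. The following sketch uses Python's standard xml.etree.ElementTree module to read a cut-down version of Listing 1 and print the authorship association in words. A few assumptions are involved: the snippets are wrapped in a topicMap root element carrying the XTM 1.0 and XLink namespace declarations (which Listing 1 leaves out), the occurrence is dropped for brevity, and the describe helper is a name invented here rather than part of any Topic Maps toolkit.

```python
import xml.etree.ElementTree as ET

XTM = "http://www.topicmaps.org/xtm/1.0/"
XLINK = "http://www.w3.org/1999/xlink"
NS = {"xtm": XTM}
HREF = "{%s}href" % XLINK  # qualified name of the xlink:href attribute

DOC = """\
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="shakespeare">
    <baseName><baseNameString>William Shakespeare</baseNameString></baseName>
  </topic>
  <topic id="hamlet">
    <baseName><baseNameString>Hamlet, Prince of Denmark</baseNameString></baseName>
  </topic>
  <topic id="written-by">
    <baseName><baseNameString>written by</baseNameString></baseName>
  </topic>
  <association>
    <instanceOf><topicRef xlink:href="#written-by"/></instanceOf>
    <member>
      <roleSpec><topicRef xlink:href="#author"/></roleSpec>
      <topicRef xlink:href="#shakespeare"/>
    </member>
    <member>
      <roleSpec><topicRef xlink:href="#work"/></roleSpec>
      <topicRef xlink:href="#hamlet"/>
    </member>
  </association>
</topicMap>
"""

root = ET.fromstring(DOC)

# Map each topic id to its base name string.
names = {
    t.get("id"): t.findtext("xtm:baseName/xtm:baseNameString", namespaces=NS)
    for t in root.findall("xtm:topic", NS)
}


def describe(assoc):
    """Return (association type id, {role id: player's base name})."""
    kind = assoc.find("xtm:instanceOf/xtm:topicRef", NS).get(HREF).lstrip("#")
    players = {}
    for member in assoc.findall("xtm:member", NS):
        role = member.find("xtm:roleSpec/xtm:topicRef", NS).get(HREF).lstrip("#")
        # A direct-child topicRef names the topic playing the role.
        player = member.find("xtm:topicRef", NS).get(HREF).lstrip("#")
        players[role] = names.get(player, player)
    return kind, players


kind, players = describe(root.find("xtm:association", NS))
print("%s %s %s" % (players["work"], names[kind], players["author"]))
# prints: Hamlet, Prince of Denmark written by William Shakespeare
```

The sketch treats the xlink:href values as simple same-document fragment identifiers. A real Topic Maps processor would also resolve references into other documents and apply the merging rules before answering such a query.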


Wrap up

XML Topic Maps includes a lot of good material, but unfortunately the organization is rather wanting. The book reads like a combination of an introductory text and the proceedings of a Topic Maps conference. I have provided my impression of a good reading sequence for beginners. I think it would have been very helpful for the book to have been divided into two sections, the first one collecting the introductory chapters. Also, some chapters have to do with theory and design, some with presenting the Topic Maps culture and community, and some with programming techniques. Some progression through these areas would help readers find the topics of most immediate interest.

Topic Maps are a very interesting technology. They bring a high degree of rigor to Semantic Web efforts. This rigor does come at some cost, though, as the specifications define a dizzying variety of terms and nuances, such that the model can be very hard to conceptualize. A lot of recent discussion has been on ways to bridge Topic Maps and other related technologies such as RDF. This is very important work because perhaps RDF can gain from the rigor of Topic Maps, and Topic Maps can gain from the simplicity and straightforwardness of RDF. At least you can be certain that as such initiatives progress, I shall keep you abreast of the latest in this column.
