Semantic Web technologies work to formalize descriptions and classifications of concepts. They attempt to reduce the conceptual mismatches that make it so difficult to connect information systems. As you can imagine, many possible approaches and techniques might apply to such a subjective and ambitious undertaking. One important set of techniques is that of Topic Maps.
I have been remiss in not covering Topic Maps in this column to this point. The main problem has been my very incomplete understanding of them. Recently, I received the book XML Topic Maps: Creating and Using Topic Maps for the Web, edited by Jack Park and Sam Hunting. The book comprises chapters written by a host of experts from the Topic Maps world. It was clear to me that this book represented the best chance for me to improve my understanding of Topic Maps, and I decided to combine a review of the book with an introduction to the subject.
A very good place to start
Jack Park starts the book off with a very enthusiastic but rather divagating and confusing introduction. I had hoped the introduction would lay out the basic concepts of Topic Maps in clear language for the uninitiated, and neatly chart the rest of the book in terms of those concepts. In Chapter 2, Michel Biezunski does provide a good overview of the Topic Maps paradigm, although it assumes the reader already understands the general problems that motivate Topic Maps. Readers of this column should have that background, and will find this chapter a nice primer on the Zen of Topic Maps.
Topic Maps were originally developed by a group in the SGML community to formalize the building of indices and lexicons. The result of that effort was ISO/IEC 13250, a standard that defines the complete Topic Maps model. Work on this standard predates XML, but the rapid growth of XML and the Web motivated XML Topic Maps (XTM), which is based on the ISO/IEC 13250 model but defines an XML syntax and limits itself to addressing through URIs. In fact, XTM is defined as an XLink application, in which the links are given semantics specific to the Topic Maps model.
Topics are the basic building blocks in Topic Maps -- a topic is a computer representation of a concept. Concepts that are represented by topics are formalized as subjects. The formal distinction between the abstract subject and its representation as a topic is a bedrock consideration of Topic Maps. Topics are related to one another by associations. A topic may also have a set of locations from which it may be accessed in some particular form; these locations are called topic occurrences. A topic may have one name, no name, or more than one. Topic Maps also show their lexicographic roots by building two kinds of variants on the base name into the core model: the display name and the sort key.
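The core vocabulary above maps naturally onto a small data model. The following Python sketch is a simplified, in-memory illustration only: the class and field names (Topic, BaseName, Occurrence, Association, subject_indicator, players) are hypothetical and do not come from the XTM specification or from any Topic Maps library, and scope is left out entirely.

```python
# Hypothetical, simplified Topic Maps model for illustration only;
# names are not taken from XTM or from any Topic Maps API.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Occurrence:
    """A location of a resource relevant to a topic, in some particular form."""
    resource_uri: str
    occurrence_type: Optional[str] = None  # e.g. "plain-text-format"


@dataclass
class BaseName:
    """A name for a topic, with optional display-name and sort-key variants."""
    value: str
    display_name: Optional[str] = None
    sort_key: Optional[str] = None


@dataclass
class Topic:
    """The computer representation of a subject (the concept itself)."""
    topic_id: str
    # URI of a resource indicating which subject this topic stands for.
    subject_indicator: Optional[str] = None
    base_names: List[BaseName] = field(default_factory=list)  # zero or more
    occurrences: List[Occurrence] = field(default_factory=list)


@dataclass
class Association:
    """A typed relationship; each member topic plays a role."""
    association_type: str
    members: List[Tuple[str, str]] = field(default_factory=list)  # (role, topic id)


def players(assoc: Association, role: str) -> List[str]:
    """Return the ids of the topics playing the given role in an association."""
    return [topic_id for r, topic_id in assoc.members if r == role]


# Usage: a tiny Shakespeare/Hamlet example.
shakespeare = Topic(
    "shakespeare",
    base_names=[BaseName("William Shakespeare", sort_key="Shakespeare, William")],
)
hamlet = Topic("hamlet", base_names=[BaseName("Hamlet, Prince of Denmark")])
written_by = Association("written-by", [("author", "shakespeare"), ("work", "hamlet")])
print(players(written_by, "author"))  # ['shakespeare']
```

Keeping Topic separate from the subject it stands for (referenced only through subject_indicator) mirrors the subject/topic distinction, and a Topic with an empty base_names list is simply a nameless topic.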
When I discuss XML with people, I am discussing an abstraction that could easily be a Topic Maps subject. To formalize the concept, I could write a map with a topic representing XML. One occurrence of this topic might be the HTML version of the XML specification on the W3C's site; another might be the download location for the PDF version of the spec. Associations could relate the XML topic to topics representing the subjects SGML, XSLT, HTML, or Unicode. The base name of the topic might be "XML", but one might choose the display name "XML (Extensible Markup Language)" so that people browsing the topic map with tools wouldn't have to decipher the acronym themselves.
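As a rough sketch of how the XML topic just described might be written out in XTM syntax, the Python code below builds it with the standard library's xml.etree.ElementTree. Several details are assumptions: the XTM 1.0 namespace URI and the "display" published subject indicator should be checked against the specification, the PDF location uses a hypothetical example.org URL, and the association type "#related-to", the role "#related-subject", and the related topic ids are hypothetical and not defined in this map.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces: XTM 1.0 elements and XLink attributes.
XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("", XTM_NS)        # serialize XTM as the default namespace
ET.register_namespace("xlink", XLINK_NS)

# Assumed XTM 1.0 core subject indicator for the "display" variant parameter.
DISPLAY_PSI = "http://www.topicmaps.org/xtm/1.0/core.xtm#display"


def el(parent, name, **xlink_attrs):
    """Add an XTM child element, optionally with xlink:* attributes."""
    child = ET.SubElement(parent, f"{{{XTM_NS}}}{name}")
    for attr, value in xlink_attrs.items():
        child.set(f"{{{XLINK_NS}}}{attr}", value)
    return child


def xml_topic_map():
    root = ET.Element(f"{{{XTM_NS}}}topicMap")
    topic = el(root, "topic")
    topic.set("id", "xml")

    # Base name "XML" plus a display-name variant that spells out the acronym.
    base = el(topic, "baseName")
    el(base, "baseNameString").text = "XML"
    variant = el(base, "variant")
    el(el(variant, "parameters"), "subjectIndicatorRef", href=DISPLAY_PSI)
    el(el(variant, "variantName"), "resourceData").text = "XML (Extensible Markup Language)"

    # Two occurrences: the spec in HTML and a hypothetical PDF location.
    for uri in ("http://www.w3.org/TR/REC-xml", "http://example.org/REC-xml.pdf"):
        el(el(topic, "occurrence"), "resourceRef", href=uri)

    # One association per related subject (hypothetical type, role, and topic ids).
    for related in ("sgml", "xslt", "html", "unicode"):
        assoc = el(root, "association")
        el(el(assoc, "instanceOf"), "topicRef", href="#related-to")
        for player in ("xml", related):
            member = el(assoc, "member")
            el(el(member, "roleSpec"), "topicRef", href="#related-subject")
            el(member, "topicRef", href=f"#{player}")
    return root


print(ET.tostring(xml_topic_map(), encoding="unicode"))
```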
Another fundamental concept in Topic Maps is scope. A scope is defined by one or more special topics that together mark out a grouping or boundary for related topics. Writers on IBM developerWorks could create a scope encompassing topics that represent concepts they cover in their articles. A scope acts as a namespace: base names are expected to be unique within a scope, and two topics with the same base name in the same scope can be merged. For example, I might have created a topic representing "XML", and another developerWorks author might have done the same independently, unaware of my efforts. Since we share a common scope, the two topics with the base name "XML" can be merged. Topic Maps define quite detailed rules for merging, including what happens to occurrences, associations, and the like.
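The merge rule described above can be sketched in a few lines of Python under heavy simplifying assumptions: each topic has exactly one base name and one scope label, and merging only unions occurrences. The real XTM merging rules also cover subject identity, multiple and scoped names, and associations, so treat this as an illustration rather than an implementation. The class ScopedTopic, the function merge_by_name, the topic ids, scope labels, and example.org URLs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ScopedTopic:
    # Hypothetical simplification: one base name and one scope label per topic.
    topic_id: str
    base_name: str
    scope: str
    occurrences: Set[str] = field(default_factory=set)


def merge_by_name(topics: List[ScopedTopic]) -> Tuple[List[ScopedTopic], Dict[str, str]]:
    """Merge topics that share a base name within the same scope.

    Returns the merged topics (the first-seen topic keeps its id and the
    occurrences are unioned) and a map from every original topic id to the
    surviving id, which a fuller version would use to re-point associations.
    """
    survivors: Dict[Tuple[str, str], ScopedTopic] = {}
    redirects: Dict[str, str] = {}
    for topic in topics:
        key = (topic.scope, topic.base_name)
        kept = survivors.get(key)
        if kept is None:
            # First topic with this name in this scope: copy it so inputs stay untouched.
            kept = ScopedTopic(topic.topic_id, topic.base_name, topic.scope, set(topic.occurrences))
            survivors[key] = kept
        else:
            kept.occurrences |= topic.occurrences
        redirects[topic.topic_id] = kept.topic_id
    return list(survivors.values()), redirects


# Usage: two authors independently create an "XML" topic in a shared scope.
mine = ScopedTopic("uo-xml", "XML", "developerworks", {"http://example.org/xml-spec.html"})
theirs = ScopedTopic("other-xml", "XML", "developerworks", {"http://example.org/xml-spec.pdf"})
elsewhere = ScopedTopic("misc-xml", "XML", "some-other-scope")
merged, redirects = merge_by_name([mine, theirs, elsewhere])
print(len(merged))  # 2: the developerworks topics merge; the other scope stays separate
```

Returning the redirect map rather than silently dropping the absorbed ids reflects the point that merging must also decide what happens to associations that pointed at the merged topics.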
Chapter 3 of the book, by Steven Newcomb, stands rather alone as an essay on the history, motivations, and culture of Topic Maps. In many ways it repeats the introduction, but in a more coherent fashion. Sam Hunting then takes over in Chapter 4 with a cut-and-dried survey of Topic Maps standardization efforts, from ISO to the grass roots. The book continues in this fashion, weaving in and out of intermediate and advanced topics somewhat abruptly. For readers looking for a good, practical introduction to Topic Maps, I recommend reading in the following sequence:
- Chapter 2, "Introduction to the Topic Maps Paradigm" (Michel Biezunski)
- Chapter 6, "How to Start Topic Mapping Right Away with the XTM Specification" (Sam Hunting)
- Chapter 12, "Topic Maps and RDF" (Eric Freese) -- since readers of this column will likely be familiar with RDF
- Chapter 10, "Open Source Topic Map Software" (Eric Freese, Kal Ahmed, Jack Park, Sam Hunting) -- if you are inclined towards Java technology; otherwise, Chapter 9, "Creating and Maintaining Enterprise Web Sites with Topic Maps and XSLT" (Nikita Ogievetsky)
- Chapter 8, "Topic Maps in the Life Sciences" (John Park and Nefer Park) -- for an overall example case
- Chapter 5, "Topic Maps from Representation to Identity: Conversation, Names, and Published Subject Indicators" (Bernard Vatant)
- Chapter 3, "A Perspective on the Quest for Global Knowledge Interchange" (Steven Newcomb)
As you read the book, and through further adventures, you will undoubtedly want to keep your thumb on the glossary, which comes right after the last chapter (this glossary is a very strong complement to another very handy glossary -- the one in section 1.3 of the XTM specification). The above chapters should get you well enough acquainted with Topic Maps -- and give you enough flavor of them in practice -- that you can venture into the other chapters, which I consider more advanced.
Topic maps in tags
Most of the practical work in Topic Maps builds on the universe of XML tools, including XSLT and Java APIs. The XML syntax is very clear, as you can see from Listing 1, a snippet based on examples in the XTM specification:
Listing 1: Snippets of a Topic Map of Shakespeare and his work
<!-- A topic representing the Elizabethan playwright William Shakespeare.
     No occurrences, because you cannot download a person -->
<topic id="shakespeare">
  <baseName>
    <baseNameString>William Shakespeare</baseNameString>
  </baseName>
</topic>
<!-- A topic representing the play "Hamlet" -->
<topic id="hamlet">
  <instanceOf><topicRef xlink:href="#play"/></instanceOf>
  <baseName>
    <baseNameString>Hamlet, Prince of Denmark</baseNameString>
  </baseName>
  <!-- An occurrence given by Project Gutenberg's plain-text download of Hamlet -->
  <occurrence>
    <instanceOf>
      <topicRef xlink:href="#plain-text-format"/>
    </instanceOf>
    <resourceRef xlink:href="ftp://www.gutenberg.org/pub/gutenberg/etext97/1ws2610.txt"/>
  </occurrence>
</topic>
<!-- A topic representing the authorship relationship, used as an association type -->
<topic id="written-by">
  <baseName>
    <baseNameString>written by</baseNameString>
  </baseName>
</topic>
<!-- A "written-by" association linking Shakespeare and the play Hamlet -->
<association>
  <instanceOf><topicRef xlink:href="#written-by"/></instanceOf>
  <member>
    <roleSpec><topicRef xlink:href="#author"/></roleSpec>
    <topicRef xlink:href="#shakespeare"/>
  </member>
  <member>
    <roleSpec><topicRef xlink:href="#work"/></roleSpec>
    <topicRef xlink:href="#hamlet"/>
  </member>
</association>
This document is accessible to a plain XLink processor as well as specialized Topic Maps tools. It suffers from the usual XML verbosity, but it is very cleanly structured.
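The claim that one document serves both a plain XLink processor and a Topic Maps tool can be illustrated with a short Python sketch using only the standard library's xml.etree.ElementTree. It assumes Listing 1 is wrapped in a topicMap root element declaring the XTM 1.0 and XLink namespaces (the snippet itself omits them), and it drops the listing's comments. The first pass collects every xlink:href, as a generic XLink processor might; the second reads the association as typed, role-based relationships. The helper name_of is hypothetical.

```python
import xml.etree.ElementTree as ET

XTM = "{http://www.topicmaps.org/xtm/1.0/}"   # assumed XTM 1.0 namespace
HREF = "{http://www.w3.org/1999/xlink}href"

# Listing 1 without its comments, wrapped in an assumed topicMap root.
LISTING = """\
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="shakespeare">
    <baseName><baseNameString>William Shakespeare</baseNameString></baseName>
  </topic>
  <topic id="hamlet">
    <instanceOf><topicRef xlink:href="#play"/></instanceOf>
    <baseName><baseNameString>Hamlet, Prince of Denmark</baseNameString></baseName>
    <occurrence>
      <instanceOf><topicRef xlink:href="#plain-text-format"/></instanceOf>
      <resourceRef xlink:href="ftp://www.gutenberg.org/pub/gutenberg/etext97/1ws2610.txt"/>
    </occurrence>
  </topic>
  <topic id="written-by">
    <baseName><baseNameString>written by</baseNameString></baseName>
  </topic>
  <association>
    <instanceOf><topicRef xlink:href="#written-by"/></instanceOf>
    <member>
      <roleSpec><topicRef xlink:href="#author"/></roleSpec>
      <topicRef xlink:href="#shakespeare"/>
    </member>
    <member>
      <roleSpec><topicRef xlink:href="#work"/></roleSpec>
      <topicRef xlink:href="#hamlet"/>
    </member>
  </association>
</topicMap>
"""

root = ET.fromstring(LISTING)

# Pass 1: what a plain XLink processor sees, just a list of link targets.
links = [e.get(HREF) for e in root.iter() if e.get(HREF) is not None]


def name_of(ref):
    """Resolve a local '#id' reference to its topic's base name, if defined."""
    topic = root.find(f"{XTM}topic[@id='{ref[1:]}']")
    if topic is None:
        return ref  # e.g. #author and #work are not defined in this snippet
    return topic.find(f"{XTM}baseName/{XTM}baseNameString").text


# Pass 2: what a Topic Maps tool sees, typed relationships with roles.
relations = []
for assoc in root.iter(f"{XTM}association"):
    assoc_type = name_of(assoc.find(f"{XTM}instanceOf/{XTM}topicRef").get(HREF))
    for member in assoc.findall(f"{XTM}member"):
        role = name_of(member.find(f"{XTM}roleSpec/{XTM}topicRef").get(HREF))
        player = name_of(member.find(f"{XTM}topicRef").get(HREF))
        relations.append((assoc_type, role, player))

print(links)
for assoc_type, role, player in relations:
    print(f"{assoc_type}: {role} = {player}")
```

The second pass prints "written by: #author = William Shakespeare" and "written by: #work = Hamlet, Prince of Denmark"; the role topics stay as raw references only because the snippet does not define them.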
XML Topic Maps includes a lot of good material, but unfortunately its organization is rather wanting. The book reads like a combination of an introductory text and the proceedings of a Topic Maps conference. I have provided my impression of a good reading sequence for beginners. I think it would have been very helpful to divide the book into two parts, the first collecting the introductory chapters. Also, some chapters deal with theory and design, some present the Topic Maps culture and community, and some cover programming techniques. A clearer progression through these areas would help readers find the topics of most immediate interest to them.
Topic Maps are a very interesting technology. They bring a high degree of rigor to Semantic Web efforts. This rigor does come at a cost, though: the specifications define a dizzying variety of terms and nuances, so the model can be very hard to conceptualize. Much recent discussion has focused on ways to bridge Topic Maps and related technologies such as RDF. This is very important work, because RDF may gain from the rigor of Topic Maps, and Topic Maps may gain from the simplicity and straightforwardness of RDF. In any case, you can be certain that as such initiatives progress, I shall keep you abreast of the latest in this column.
- Read the subject book, XML Topic Maps: Creating and Using Topic Maps for the Web, edited by Jack Park and Sam Hunting (Addison-Wesley, 2002) and featuring contributions by just about all the leading lights in the Topic Maps community.
- Visit TopicMaps.org for a large collection of introductions and other resources related to XTM, including the official specification.
- For discussion of the more general Topic Maps model, see Topicmaps.net. Vendors in the Topic Maps space have also created a community site, topicmap.com.
- Read Lars Marius Garshol's great introduction to the topic in his article "What Are Topic Maps?," and Steve Pepper's solid follow-up in "The TAO of Topic Maps."
- Learn more about XLink: Fabio Arciniegas' "What is XLink?" is out of date but still provides one of the best gentle introductions to the technology. While you're at it, take a look at Kevin Williams' XML for Data column on XLink here on developerWorks (July 2001).
- Find more XML resources on the developerWorks XML zone, including previous installments of the Thinking XML column.