Formalizing DITA topics as subjects
In a topic-oriented architecture such as DITA, content is authored in small, independent units that are assembled to provide help systems, books, courses, and other deliverables. Each unit of information answers a single question for a specific purpose. That is, each topic has specific, independent subject matter -- the very reason that these units of information are called topics. For instance, one topic might describe the format of a user definition file on a Web site while another topic explains the principles of Web site security and a third topic lays out the procedure for setting up Web site logins.
Because each topic has a specific meaning, DITA topics are tailor-made for semantic processing. However, current semantic processors can't read the text of a topic to find out what it means. What's missing is a formal declaration of the topic's subject matter that a semantic processor can understand -- like the address on an envelope that allows mail sorters to route the contents to the appropriate destination.
Simple Knowledge Organization System (SKOS) provides a standard for indicating the subject matter of content. SKOS lets you define the subjects for a particular subject matter area (organizing these subjects as a taxonomy if desired) and then classify each piece of content to indicate its subject. For instance, using SKOS, you could define configuration and security as subjects, and classify the three example topics that relate to those subjects so that users could browse the subjects to find the content regardless of whether the words "configuration" or "security" actually appear in the text.
SKOS is expressed with Resource Description Framework (RDF), the fundamental language of the Semantic Web. However, SKOS provides a higher-level language that's designed for readable content. SKOS has benefited from broad perspectives, including those of experts in OWL/RDF, TopicMaps, ontology, and library science. In the spectrum of standards, SKOS contributes by bridging the gap between traditional indexing and formal ontologies for the Semantic Web.
Thus, DITA has a natural fit with SKOS in solutions where DITA topics are classified with subjects that are expressed in SKOS for runtime processing.
Note: SKOS uses the concept label for these formal subjects. DITA, however, uses the concept label for a unit of content that is conceptual in nature. To avoid confusion, this article (and the accompanying specialization) uses the subject label for the same thing as the SKOS concept, and uses the concept label in the DITA sense.
The typical approach for taxonomies is to maintain the formal subject definition separate from the content. However, by using a DITA specialization you can maintain the subject definition as authored content. Content creators realize several benefits by managing subject definitions as content:
- Formal subjects are often defined by glossary topics or other topics that already exist within the published information set. The TopicMaps community has long recognized such authoritative definitional resources under the name published subject indicators. For instance, the documentation for an application server product is likely to define important subjects within the subject area such as authentication, Web server, and so on.
Even if you don't include the subject definitions in your published information, you can use your standard content tools for your subject definitions. For instance, you can author the subject definitions with your XML editor, and archive and version the subject definitions along with your content in your content management or version control system. You can also use existing formatting processes to produce catalogs of subject definitions for use by authors -- that is, you don't have to implement a separate authoring and processing system for subject definitions.
- Subject classification is as much a part of the information architecture of your content as the navigational organization. So rather than trying to bolt on semantic precision after the fact, information architects can encourage better content by providing a formal definition of the subjects to be covered and thus guide the creation of content.
Where an existing information architecture is especially crisp, the information architect may find that some subject classifications merely formalize the existing organization of or relationships between topics.
- RDF is optimized for processing rather than authoring. In particular, RDF expresses information as a set of records rather than taking advantage of XML for more understandable tree and table structures. RDF enthusiasts sometimes suggest that sophisticated tools make it unnecessary to see the format of source files; however, HTML has demonstrated the value of an understandable file format even after such tools exist.
In particular, a taxonomy of subjects has a natural tree structure that can have a straightforward representation as a specialized DITA map. Variant serializations of RDF models are legion, and a DITA map that's specialized for subject definition and classification could be regarded as an alternative serialization of a SKOS model.
Parts of a subject classification
As specified by SKOS and implemented in the DITA specialization, a subject classification has the following parts:
Table 1. Parts of a subject classification
| Part | Identifies | DITA implementation |
|---|---|---|
| Subject definition | The meaning of a formal subject | A DITA topic that defines a default label for the subject and explains what the subject covers. |
| Subject scheme | The relationships between subjects | A DITA map that organizes the subjects into hierarchies. For instance, the Task subject might contain both the Installing subject and the Configuring subject. The map can also express associative relationships for subjects that are related in other ways. |
| Content classification | The subject matter of resources | Another DITA map that expresses relationships between topics that define formal subjects and content topics that treat some aspect of the subject. The same map can define the navigation relationships and related links for the content topics in standard DITA practice. |
In addition, DITA provides the content topics that are classified and other information about those topics. For instance, a DITA map can organize the content topics to provide a navigation hierarchy or define related links for those topics.
Figure 1 illustrates the parts of the subject classification:
Figure 1. Parts of a subject classification

Figure 1 shows:
- The formal subjects as teal circles
- The content topics as blue circles
- The subject relationships as yellow arrows
- The classification relationships as blue arrows
- The topic navigation relationships such as navigation hierarchies and related links as magenta arrows
From a publishing perspective, subject classification resembles both indexing and glossaries. This concept isn't new -- the recognition that glossaries and indexing make semantic assertions contributed to the development of TopicMaps. Where an index term can have multiple meanings, each subject has the same precision as a single meaning from a glossary. It is this precision that makes it possible to manage content based on its meaning.
Defining subjects
To define a subject, you create a DITA topic (typically a concept topic) to identify an aspect of the subject matter of your content (see Figure 2).
Figure 2. Defined subjects

The DITA topic specifies the subject with a specialized section element that includes the following kinds of information:
- Default labels, including synonyms and denotative images
- Notes on the definition and on the scope of coverage for the subject
Listing 1 shows an example of the definition for the Configuring subject:
Listing 1. Definition for the Configuring subject
<concept id="configuring">
  <title>Configuring</title>
  <shortdesc>You configure components to set up or refine your solution.</shortdesc>
  <conbody>
    <p>You don't have to get the best configuration the first time....</p>
    <subjectDetail>
      <subjectLabels>
        <altLabel>Setting up</altLabel>
      </subjectLabels>
      <scopeNote>Administrative tasks performed after installation...</scopeNote>
    </subjectDetail>
  </conbody>
</concept>
This specialized section can be used in any topic type that allows section elements. In particular, to formalize an existing glossary, concept, or reference topic that authoritatively defines a subject, you can add the specialized section. Because the meaning of a formal topic should never vary based on its use, these fields should be part of the topic.
Organizing subjects
As part of a scheme, specialized DITA map elements specify the relationships that define a thesaurus or taxonomy hierarchy (see Figure 3).
Figure 3. Hierarchical and associative relationships between subjects

The hierarchy can include subject heads, whose meaning is equivalent to the union of subjects contained within the subject head. Specialized topic groups and relationship tables specify related-to relationships that cut across the subject hierarchy.
Listing 2 is an example of a subject scheme that identifies Installing and Configuring as Task subjects and Resource utilization and Security as Concerns:
Listing 2. Example of a subject scheme
<subjectScheme title="Sampletaxonomy" id="taxonomyScheme">
  <subjectdef href="task.dita" navtitle="Task">
    <subjectdef href="installing.dita" navtitle="Installing"/>
    <subjectdef href="configuring.dita" navtitle="Configuring"/>
    ...
  <subjectdef href="concern.dita" navtitle="Concern">
    <subjectdef href="utilization.dita" navtitle="Resource utilization"/>
    <subjectdef href="security.dita" navtitle="Security"/>
    ...
You can have multiple schemes for the same subjects. For instance, different audiences might be interested in a different subset of the taxonomy.
This approach of imposing alternative organizational structures on subjects fits well with the standard use of DITA maps for separating context from content, allowing different organizations to be imposed on the same content. That is, the scheme can be considered a special kind of context for subject definition topics.
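As a rough illustration of how a processor might read such a scheme, the following Python sketch walks the nested subjectdef elements and records each subject's label and broader subject. It is not part of the specialization or the toolkit. The closing tags in the sample scheme are an assumption, because Listing 2 elides them, and the `read_scheme` function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, fully closed version of the scheme in Listing 2. The closing
# tags are an assumption: the original listing elides the end of each group.
SCHEME_XML = """
<subjectScheme title="Sample taxonomy" id="taxonomyScheme">
  <subjectdef href="task.dita" navtitle="Task">
    <subjectdef href="installing.dita" navtitle="Installing"/>
    <subjectdef href="configuring.dita" navtitle="Configuring"/>
  </subjectdef>
  <subjectdef href="concern.dita" navtitle="Concern">
    <subjectdef href="utilization.dita" navtitle="Resource utilization"/>
    <subjectdef href="security.dita" navtitle="Security"/>
  </subjectdef>
</subjectScheme>
"""


def read_scheme(xml_text):
    """Return (labels, broader): href -> navtitle and href -> parent href."""
    labels, broader = {}, {}

    def walk(element, parent_href):
        # Only direct subjectdef children belong to this level of the hierarchy.
        for child in element.findall("subjectdef"):
            href = child.get("href")
            labels[href] = child.get("navtitle")
            if parent_href is not None:
                broader[href] = parent_href
            walk(child, href)

    walk(ET.fromstring(xml_text), None)
    return labels, broader


labels, broader = read_scheme(SCHEME_XML)
for href, parent in broader.items():
    print(f"{labels[href]} has broader subject {labels[parent]}")
```

Because the hierarchy lives in the map rather than in the subject topics, a second scheme could nest the same subjectdef references differently and the same function would report a different set of broader relationships.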
Schemes can use non-DITA subject definitions (such as publicly-defined SKOS, OWL, or TopicMaps subjects). You cite the public identifier of the subject with the subjectdef element and identify the subject definition format with the format attribute. This allows you to incorporate publicly-defined subjects into your schemes, or to integrate a formal ontology maintained by your organization with concepts that are specific to your content.
Finally, if you're not organizing subjects into a hierarchical taxonomy or otherwise expressing relationships between subjects, you don't have to create a scheme. You can still classify content topics without a scheme. For instance, you might take this approach to support a controlled index or a controlled tagsonomy in which topics are classified with tags, but each tag must be defined by creating a subject topic with a precise definition.
Classifying content
To classify content, another map specialization associates formal subjects with topics (see Figure 4).
Figure 4. Content topics classified by subjects

Inside the topicref element that references the classified content (and that contains the references to any subordinate content), you nest a topicsubject element to specify the subjects of the content. You can identify a primary subject with the href attribute of the topicsubject element, which also contains subjectref elements for the secondary subjects. If no subject is primary, the topicsubject element should be a container without the href attribute.
Listing 3 is an example of a content classification:
Listing 3. Content classification
<topicref href="websecure.dita">
  <topicsubject>
    <subjectref href="webserver.dita"/>
    <subjectref href="security.dita"/>
  </topicsubject>
  <topicref href="https_protocol.dita"/>
  ... other subordinate content topics ...
</topicref>
<topicref href="loginsetup.dita" collection-type="sequence">
  <topicsubject>
    <subjectref href="configuring.dita"/>
    <subjectref href="webserver.dita"/>
    <subjectref href="security.dita"/>
  </topicsubject>
  <topicref href="editinguserdef.dita"/>
  ... other subordinate content topics ...
</topicref>
In the same way that subject schemes can cite public non-DITA subjects, you can classify DITA content with SKOS, OWL, or TopicMaps subjects by citing the public URI identifiers with the subjectref element and setting the format attribute. You can also classify non-DITA content such as HTML and PDF files by referring to the files with the topicref element, but identifying the content format with the format attribute.
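To make the classification concrete, here is a minimal Python sketch that reads the map from Listing 3 and builds an index from each subject to the content classified under it. Several things here are assumptions rather than part of the specialization: the `map` wrapper element (added so the fragment parses as one document), the `index_by_subject` function name, and the reading that a topicsubject classifies both its parent topicref and the topicrefs nested inside it. The elided subordinate topics are simply left out.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Listing 3 wrapped in a hypothetical <map> root so it parses as one document.
CLASSIFICATION_XML = """
<map>
  <topicref href="websecure.dita">
    <topicsubject>
      <subjectref href="webserver.dita"/>
      <subjectref href="security.dita"/>
    </topicsubject>
    <topicref href="https_protocol.dita"/>
  </topicref>
  <topicref href="loginsetup.dita" collection-type="sequence">
    <topicsubject>
      <subjectref href="configuring.dita"/>
      <subjectref href="webserver.dita"/>
      <subjectref href="security.dita"/>
    </topicsubject>
    <topicref href="editinguserdef.dita"/>
  </topicref>
</map>
"""


def index_by_subject(xml_text):
    """Map each subject href to the set of content topics classified by it."""
    index = defaultdict(set)
    for topicref in ET.fromstring(xml_text).iter("topicref"):
        topicsubject = topicref.find("topicsubject")
        if topicsubject is None:
            continue  # this topicref carries no classification of its own
        subjects = [ref.get("href") for ref in topicsubject.findall("subjectref")]
        primary = topicsubject.get("href")  # optional primary subject
        if primary:
            subjects.insert(0, primary)
        # Assumption: the classification covers the topicref and its descendants.
        # Element.iter() yields the element itself as well as nested topicrefs.
        topics = [nested.get("href") for nested in topicref.iter("topicref")]
        for subject in subjects:
            index[subject].update(topics)
    return index


index = index_by_subject(CLASSIFICATION_XML)
for subject in sorted(index):
    print(subject, "->", sorted(index[subject]))
```

Note that the lookup never inspects the text of the topics, so a topic is found under security.dita whether or not the word "security" appears in it.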
Because subjects are defined by special topics, you can include the subject definition in the content and use it for classification. For instance, the subject topic for Security can both classify content about security and describe security within the Web site or help system content. Figure 5 illustrates this scenario:
Figure 5. A subject topic that is also a classified content topic

The central circle represents a conceptual topic (such as Security), which:
- Has a broader relationship to a subject (perhaps System Concerns) within a scheme
- Is classified by two other subjects (perhaps the Background type and the Novice User role)
- Contributes to the classification of one topic (such as Web Security)
- Occupies the second position in a navigation sequence (perhaps under a Glossary heading)
Associating and processing the pieces
Because the classification map is distinct from the scheme map, you can apply multiple schemes to the same classification without requiring changes to the classification. To combine the scheme and classification maps for a deliverable, a higher-level map can refer to both maps using a DITA map reference (see Figure 6).
Figure 6. Assembling a deliverable from the scheme and content maps

You might process a single map to generate both an HTML representation of the content and a SKOS representation of the subjects and classification.
Generating SKOS for processing
After you define the formal subjects, organize them into hierarchical schemes, and classify your content, you can run a transform to convert the DITA subject classification into SKOS for processing by SKOS tools. To experiment, install the DITA Open Toolkit (see Resources) and download the demonstration that accompanies this article, x-dita10_thesaurus.zip. This demonstration updates the demo directory with a thesaurus subdirectory, which has a readme.html file with links to the reference documentation and instructions for building the sample content.
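To give a feel for the kind of output such a conversion targets, the following Python sketch writes the subjects from Listing 2 as SKOS concepts in RDF/XML. It is a hand-rolled illustration, not the transform shipped with the demonstration. The `http://example.org/subjects/` base URI, the `SUBJECTS` table, and the `to_skos` function are hypothetical; skos:Concept, skos:prefLabel, and skos:broader are standard SKOS terms.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/subjects/"  # hypothetical base URI for the subjects

# Use readable prefixes instead of generated ones when serializing.
ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", SKOS)

# (identifier, preferred label, broader identifier or None), copied by hand
# from the hierarchy shown in Listing 2.
SUBJECTS = [
    ("task", "Task", None),
    ("installing", "Installing", "task"),
    ("configuring", "Configuring", "task"),
    ("concern", "Concern", None),
    ("utilization", "Resource utilization", "concern"),
    ("security", "Security", "concern"),
]


def to_skos(subjects):
    """Build an rdf:RDF element holding one skos:Concept per subject."""
    root = ET.Element(f"{{{RDF}}}RDF")
    for subject_id, label, broader_id in subjects:
        concept = ET.SubElement(
            root, f"{{{SKOS}}}Concept", {f"{{{RDF}}}about": BASE + subject_id}
        )
        ET.SubElement(concept, f"{{{SKOS}}}prefLabel").text = label
        if broader_id is not None:
            ET.SubElement(
                concept,
                f"{{{SKOS}}}broader",
                {f"{{{RDF}}}resource": BASE + broader_id},
            )
    return root


skos_xml = ET.tostring(to_skos(SUBJECTS), encoding="unicode")
print(skos_xml)
```

A fuller conversion would also carry over the alternative labels and scope notes from the subject definition topics, and the classification relationships from the content map.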
In a typical approach, SKOS tools display the thesaurus or taxonomy to the user for browsing. By picking one or more subjects, the user navigates to the relevant classified content. For instance, a user might pick the Security and Web Server subjects to find the example topics. You can see an example of one such interface (many are possible) at the Semantic Web Environmental Directory (SWED) site (see Resources). This framework uses SKOS for its subject classification. More sophisticated uses of the subject classification are also possible.
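The browsing behavior described above amounts to intersecting the sets of content classified by each subject that the user picks. The following Python sketch shows the idea with a hand-written index based on the top-level topics in Listing 3. The `CLASSIFIED` table and the `browse` function are hypothetical and do not represent how SWED or any particular SKOS tool is implemented.

```python
# Hypothetical index from subject label to classified topics, written by hand
# from the two top-level topics in Listing 3.
CLASSIFIED = {
    "Configuring": {"loginsetup.dita"},
    "Web Server": {"websecure.dita", "loginsetup.dita"},
    "Security": {"websecure.dita", "loginsetup.dita"},
}


def browse(index, *picked):
    """Return the topics that are classified by every picked subject."""
    if not picked:
        return set()
    # An unknown subject classifies nothing, so it empties the result.
    topic_sets = [index.get(subject, set()) for subject in picked]
    return set.intersection(*topic_sets)


print(sorted(browse(CLASSIFIED, "Security", "Web Server")))
print(sorted(browse(CLASSIFIED, "Security", "Configuring")))
```

Picking an additional subject can only narrow the result, which is what lets a user start broad and refine by subject rather than by search terms.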
Summary
This article has only addressed the basics of classifying your content. The DITA specialization provides additional capabilities, including associative relationships between subjects and tabular classification. As noted above, however, you need separate runtime tools to operate on the classification. In addition, the art of defining subjects and organizing them into taxonomies has its own strategies and best practices.
Also, users of this DITA specialization should bear in mind that SKOS itself hasn't yet progressed to an approved standard. As a result, the SKOS-compatible DITA specialization could undergo some evolution.
However, the DITA specialization already demonstrates the value of maintaining a taxonomy and classification as part of your content. By taking this approach, content creators can use the technologies of the Semantic Web to process their content based on what it means.
Download
| Description | Name | Size |
|---|---|---|
| Download for thesaurus specialization | x-dita10_thesaurus.zip | 259 KB |
Resources
Learn
- Darwin Information Typing Architecture (DITA XML): Find out more about DITA at the OASIS Cover Pages.
- "Design patterns for information architecture with DITA map domains" (developerWorks, September 2005): Learn how you can specialize a DITA map as part of an information architecture.
- "Introducing SKOS": Get a quick overview of the value of SKOS.
- Simple Knowledge Organisation System (SKOS): Explore SKOS in depth at the W3C site.
- "The future of the Web is Semantic" (developerWorks, October 2005): Find out more about the spectrum of RDF standards that lay the foundations for the Semantic Web.
- "The TAO of Topic Maps": Get the TopicMaps insights about associations between the subjects of content, including the implications of indexes and glossaries.
- "Building a Metadata-Based Website": Find out how a taxonomy can support discovery and retrieval of document content.
- "Developing and Creatively Leveraging Hierarchical Metadata and Taxonomy": Learn about considerations for integrating a taxonomy into an Information Architecture.
- Semantic Web Environmental Directory (SWED): Interact with an example of a subject classification.
Get products and technologies
- DITA Open Toolkit: Download the Open Source toolkit for processing DITA.
Discuss
- dita-users: Discuss DITA with other adopters.
Robert D. Anderson is the chief architect for the DITA Open Toolkit, and is also a member of the OASIS DITA Technical Committee. He has worked on IBM's internal Information Development Workbench since 1999, supporting both XML (DITA) and SGML (IBMIDDoc).
Colin Bird is an Information Architect in the User Technologies Department at IBM Hursley. Prior to that, Colin worked at the IBM UK Scientific Centre, developing image manipulation and visualization applications. This work led to several information retrieval projects, particularly involving content-based image retrieval, culminating in a year's secondment to the Intelligence, Agents, Multimedia Group at Southampton University. Colin currently holds a Visiting Senior Research Fellowship there. His time at Southampton engendered a strong interest in the capture of metadata for subsequent use in both the principled retrieval and the adaptive delivery of information.