I'll begin with one caveat: The Dublin Core Metadata Initiative does not really have anything to do with XML. The most widespread use of DCMI is, indeed, probably within namespace-enhanced XML documents; but nothing about metadata generally -- or this collection of elements specifically -- requires that the underlying data be encoded as XML. Instead, DCMI is a generic framework for describing a broadly useful collection of information about documents of all sorts. The individual documents that are characterized using DCMI might be encoded in XML or in most any other electronic or physical format, and their subject matter can be pretty much any endeavor of human creation.
DCMI is a vocabulary for talking about documents, with (relatively) well-defined semantics for the meaning and usage of its terms. The terms included in DCMI are divided into a minimal set of base elements and an optional collection of refinements to these base elements.
Much of the benefit of DCMI comes simply from standardizing the way metadata terms are spelled, and the format of the values these terms will take. For example, you might identify a work by near-synonyms like "author", "artist", "originator", "maker" or "creator"; DCMI standardizes the name of this role with the last term, "creator", in order to provide a consistent method of comparing documents that may share authorship. Naturally, the names of persons and organizations who might be creators can be pretty much anything; in comparing creators to each other, an application of DCMI might wish to further standardize the format of names (for instance, "Lastname, Firstname") beyond what the DCMI recommendation provides.
In addition to standardizing metadata terms, DCMI provides recommendations for choosing values, either by enumeration or specification of patterns. For example, the term "date" is a rather obvious choice of metadata term, but dates come in multiple formats. DCMI recommends that dates be given in the ISO 8601 subset specified in the W3C Date and Time Formats note (see Resources). In other cases, such as "coverage" -- which is defined as "the extent or scope of the content of the resource" -- the DCMI recommends using names from the (large, but finite) enumeration in the Thesaurus of Geographic Names (see Resources).
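As a rough illustration of what a value pattern buys you, here is a minimal Python sketch that checks whether a string has the lexical shape described in the W3C Date and Time Formats note. The function name and the regular expression are my own illustration, not anything DCMI publishes, and the check is purely lexical: it does not verify that a given day exists in a given month.

```python
import re

# Granularities from the W3C "Date and Time Formats" note (a profile of
# ISO 8601): YYYY, YYYY-MM, YYYY-MM-DD, and a complete date plus a time
# that carries a time zone designator.
W3CDTF = re.compile(
    r"^\d{4}"                                # year
    r"(?:-(?:0[1-9]|1[0-2])"                 # optional month
    r"(?:-(?:0[1-9]|[12]\d|3[01])"           # optional day
    r"(?:T(?:[01]\d|2[0-3]):[0-5]\d"         # optional hours and minutes
    r"(?::[0-5]\d(?:\.\d+)?)?"               # optional seconds and fraction
    r"(?:Z|[+-](?:[01]\d|2[0-3]):[0-5]\d)"   # zone designator, required with a time
    r")?)?)?$"
)

def looks_like_w3cdtf(value):
    """Return True if value matches the lexical pattern (no calendar checks)."""
    return W3CDTF.match(value) is not None
```

A value such as 2004-06-14 or 1998-11 passes, while a locale-specific spelling such as 06/14/2004 does not.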
For an example of the concrete use of DCMI metadata, look at the document "DCMI Metadata Terms" (see Resources), a presumably well-thought-out instantiation of the DCMI's own principles. Incidentally, notice that DCMI vocabulary terms are not case-sensitive, since they will often be used in case-insensitive contexts such as HTML (pre-XHTML, that is).
The "DCMI Metadata Terms" document encodes its metadata in several distinct ways, at least in the HTML version. This redundancy is useful in that it shows off each of the three most important encoding styles you are likely to find in the use of DCMI:
- Plain text
- Meta tags in HTML
- Metadata in RDF
The first style might be called the plain text encoding of the document metadata. In the online version, the following information is placed in an HTML table and given a distinctive background color, but it would be little affected if it were printed in a book or binder (or as formatted below). In particular, a non-electronic resource that uses DCMI necessarily uses something similar to:
DCMI Metadata Terms
Creator: DCMI Usage Board
Identifier: http://dublincore.org/documents/2004/06/14/dcmi-terms/
Date Issued: 2004-06-14
Latest Version: http://dublincore.org/documents/dcmi-terms/
Replaces: http://dublincore.org/documents/2003/11/19/dcmi-terms/
Translations: http://dublincore.org/resources/translations/
Document Status: This is a DCMI Recommendation.
Description: This document is an up-to-date specification of all metadata terms maintained by the Dublin Core Metadata Initiative, including elements, element refinements, encoding schemes, and vocabulary terms (the DCMI Type Vocabulary).
Date Valid: 2004-06-14
Each field name (the label before the colon) is a piece of metadata that might be attached to the document. I do not reproduce the entire document here, but notice that the Identifier field is a URI, which lets you locate the document it describes.
Several of the metadata fields given in the plain text header -- Creator, Identifier, and Description -- belong to DCMI's set of 15 basic elements. Other fields -- Replaces, Date Issued, and Date Valid -- are element refinements, which generally means that they inherit from base elements (though not literally in the sense of OOP-style inheritance). The remaining fields -- Latest Version, Translations, and Document Status -- do not seem to belong to DCMI, but are rather custom additions for this application; a different application that is not aware of these fields would typically just ignore them.
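The three-way split between base elements, refinements, and custom fields can be sketched in a few lines of Python. The function name classify_fields and the REFINEMENTS table are hypothetical, and the table covers only the labels used in the header above. The sketch also assumes one "Label: value" pair per line, so a long value wrapped across several lines would need extra handling.

```python
# The 15 base elements of the Dublin Core element set.
BASE_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Refinements that appear in the header above, keyed by their plain text
# label and mapped to the base element each one refines.
REFINEMENTS = {
    "replaces": "relation",
    "date issued": "date",
    "date valid": "date",
}

def classify_fields(header):
    """Map each 'Label: value' line to base element, refinement, or custom."""
    kinds = {}
    for line in header.splitlines():
        label, colon, _value = line.partition(":")
        if not colon:
            continue  # a line with no label, such as the bare title
        name = label.strip()
        key = name.lower()
        if key in BASE_ELEMENTS:
            kinds[name] = "base element"
        elif key in REFINEMENTS:
            kinds[name] = "refinement of " + REFINEMENTS[key]
        else:
            kinds[name] = "custom"
    return kinds

HEADER = """\
DCMI Metadata Terms
Creator: DCMI Usage Board
Identifier: http://dublincore.org/documents/2004/06/14/dcmi-terms/
Date Issued: 2004-06-14
Latest Version: http://dublincore.org/documents/dcmi-terms/
Replaces: http://dublincore.org/documents/2003/11/19/dcmi-terms/
Document Status: This is a DCMI Recommendation.
"""

if __name__ == "__main__":
    for name, kind in classify_fields(HEADER).items():
        print(name, "->", kind)
```

An application that knows nothing about the custom fields can simply skip every entry marked "custom".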
Plain text encodes DCMI metadata by typographic means somewhat specific to the work in question. In fact, many non-electronic works cannot really encode metadata directly. For example, musical works or paintings do not contain front matter or title pages where you might list these elements. Even written works that do not permit you to create new editions do not allow direct attachment of such plain text. Obviously, in cases like these the metadata has to exist in some attached or wrapping document. This could be literally on the wrapper of a work -- for example, shrink wrapping around an historical book edition, or in the packaging of a shipped painting.
Metadata attachment is a bit easier with electronic formats. Specifically, HTML has a somewhat kludgey element that can live in its <head>: the <meta> element. The HTML version of DCMI Metadata Terms encodes several base DCMI elements in just this manner. Listing 1 shows the whole <head> element:
Listing 1. Head of HTML version of DCMI Metadata Terms
<head>
  <title>DCMI Metadata Terms</title>
  <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
  <meta name="DC.title" content="Dublin Core Metadata Terms" />
  <meta name="DC.description" content="This document is an up-to-date specification of all metadata terms maintained by the Dublin Core Metadata Initiative, including elements, element refinements, encoding schemes, and vocabulary terms (the DCMI Type Vocabulary)." />
  <meta name="DC.publisher" content="Dublin Core Metadata Initiative" />
  <meta content="text/html; charset=iso-8859-1" http-equiv="Content-Type" />
  <link href="index.shtml.rdf" rel="meta" />
  <link type="text/css" href="/css/default.css" rel="stylesheet" />
</head>
Note a few things in Listing 1. The regular <title> of an HTML
document is already a kind of metadata, but it's fairly impoverished since
it lacks additional accompanying terms. The HTML header gives a
<link> to schema.DC as a convention
for explicitly indicating the
use of DCMI terms in other <meta> tags. Of course, the HTML spec
itself, and most HTML processing applications (such as Web browsers), lack
any special knowledge of what to do with any of this -- but they should
ignore and preserve it gracefully.
The terms DC.title, DC.description, and DC.publisher are basic
elements from DCMI, and are pseudo-namespace qualified. The
publisher element was not given in the plain text version (but perhaps it
should have been). title was not explicitly labeled as a field,
but all of the DCMI documentation includes that field as the first thing
in a document, and in an <h1> tag; it is reasonable to treat that as
indicating the field title, despite the fact that it's marked differently than
other fields.
Like many HTML documents, DCMI Metadata Terms includes a non-DCMI
Content-Type metadata tag. Not all metadata is DCMI, so DCMI is
intended to play well with other external metadata tagging.
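To make the HTML convention concrete, here is a small sketch that pulls the DC.* terms out of a page head using Python's standard html.parser module. The class name DCMetaParser and the shortened sample head are my own, and the sketch assumes the prefix is literally "DC", as it is in Listing 1. A more careful tool would read the prefix from the schema link rather than hard-coding it.

```python
from html.parser import HTMLParser

class DCMetaParser(HTMLParser):
    """Collect <meta name="DC.xxx"> terms and the schema.DC link."""

    def __init__(self):
        super().__init__()
        self.schema = None   # href of <link rel="schema.DC">, if present
        self.terms = {}      # lowercased term name -> content

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also arrive here.
        attrs = dict(attrs)
        name = attrs.get("name") or ""
        rel = attrs.get("rel") or ""
        if tag == "meta" and name.lower().startswith("dc."):
            self.terms[name[3:].lower()] = attrs.get("content") or ""
        elif tag == "link" and rel.lower() == "schema.dc":
            self.schema = attrs.get("href")

SAMPLE_HEAD = """
<head>
  <title>DCMI Metadata Terms</title>
  <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
  <meta name="DC.title" content="Dublin Core Metadata Terms" />
  <meta name="DC.publisher" content="Dublin Core Metadata Initiative" />
  <meta content="text/html; charset=iso-8859-1" http-equiv="Content-Type" />
</head>
"""

parser = DCMetaParser()
parser.feed(SAMPLE_HEAD)

if __name__ == "__main__":
    print(parser.schema)
    print(parser.terms)
```

The Content-Type tag has no name attribute beginning with the DC prefix, so the parser passes over it, which is the same graceful ignoring that browsers apply to the DCMI tags.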
I haven't yet mentioned another element in the HTML document -- well, two
elements. The stylesheet link is an external
resource for the HTML that I do not need to comment on here, though
it might also be considered a kind of metadata -- one concerning best
presentation of the document. The more interesting external resource
is the <link> to index.shtml.rdf.
Look at Listing 2:
Listing 2. RDF resource linked to by DCMI Metadata Terms
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description
      rdf:about="http://dublincore.org/documents/dcmi-terms/">
    <dc:title>Dublin Core Metadata Terms</dc:title>
    <dc:description>This document is an up-to-date specification of all
      metadata terms maintained by the Dublin Core Metadata Initiative,
      including elements, element refinements, encoding schemes, and
      vocabulary terms (the DCMI Type Vocabulary).</dc:description>
    <dc:publisher>Dublin Core Metadata Initiative</dc:publisher>
  </rdf:Description>
</rdf:RDF>
Probably the most common place you will find DCMI terms is embedded in RDF.
XML namespace support makes such embedding quite elegant:
DCMI terms can live in the dc: namespace; RDF itself in rdf: (or as the default namespace). This leaves open the option of
embedding more vocabularies, such as xhtml:.
The dc: namespace is not the only one recommended by DCMI, however.
The basic 15 elements of DCMI are normally given a dc:
namespace, but supplemental terms and refinements are generally
placed in the dcterms: namespace.
The placement of refinements in the dcterms: namespace is a revision of earlier recommendations that ancestor terms be qualified with the dc: namespace.
For example, the
RDF file for DCMI Metadata Terms might currently be enhanced with the element:
<dcterms:issued>2004-06-14</dcterms:issued>
Of course, the <rdf:RDF> root element would need the additional
namespace specification xmlns:dcterms="http://purl.org/dc/terms/" to
make this work.
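Reading such a file back is straightforward with a namespace-aware XML parser. The sketch below uses Python's standard xml.etree.ElementTree on a shortened copy of Listing 2 with the dcterms:issued element added. The function name dcmi_statements is hypothetical, and this is plain XML processing rather than a real RDF parser, so it handles only the simple rdf:Description shape shown in this article.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Namespace URI -> conventional prefix used when reporting terms.
PREFIXES = {
    "http://purl.org/dc/elements/1.1/": "dc",
    "http://purl.org/dc/terms/": "dcterms",
}

SAMPLE_RDF = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://dublincore.org/documents/dcmi-terms/">
    <dc:title>Dublin Core Metadata Terms</dc:title>
    <dc:publisher>Dublin Core Metadata Initiative</dc:publisher>
    <dcterms:issued>2004-06-14</dcterms:issued>
  </rdf:Description>
</rdf:RDF>
"""

def dcmi_statements(rdf_xml):
    """Return {resource URI: {"prefix:term": value}} for DCMI child elements."""
    root = ET.fromstring(rdf_xml)
    resources = {}
    for desc in root.iter("{%s}Description" % RDF):
        about = desc.get("{%s}about" % RDF)
        terms = resources.setdefault(about, {})
        for child in desc:
            # ElementTree spells qualified names as "{namespace-uri}local".
            uri, brace, local = child.tag[1:].partition("}")
            prefix = PREFIXES.get(uri) if brace else None
            if prefix:
                terms["%s:%s" % (prefix, local)] = (child.text or "").strip()
    return resources

if __name__ == "__main__":
    print(dcmi_statements(SAMPLE_RDF))
```

Because the lookup is keyed on namespace URIs rather than on the prefixes in the file, a document that binds the same URIs to different prefixes is handled the same way.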
You might come across an older RDF file that has a qualified-ancestor element similar to:
<dc:date.issued>2004-06-14</dc:date.issued>
I'm not certain why this usage was changed; at first glance, the older
usage appears more descriptive to me. But I have not followed the
discussion that went into this decision, and I assume there were good reasons for it.
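If you need to bring such older files in line with the current recommendation, a mechanical rename is one option. The sketch below is only that, a sketch: the function name modernize is hypothetical, and it assumes that the part after the dot in an old-style name is also the refinement's name in the dcterms: namespace. That holds for date.issued, but I have not checked it for every refinement, so treat the mapping as an assumption.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

def modernize(root):
    """Rename dc:base.refinement elements to dcterms:refinement in place.

    Returns the number of elements renamed.
    """
    old_prefix = "{%s}" % DC
    renamed = 0
    for elem in root.iter():
        if isinstance(elem.tag, str) and elem.tag.startswith(old_prefix):
            local = elem.tag[len(old_prefix):]
            _base, dot, refinement = local.partition(".")
            if dot and refinement:
                # Assumption: the refinement keeps its name under dcterms.
                elem.tag = "{%s}%s" % (DCTERMS, refinement)
                renamed += 1
    return renamed

OLD_STYLE = """\
<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Dublin Core Metadata Terms</dc:title>
  <dc:date.issued>2004-06-14</dc:date.issued>
</rdf:Description>
"""

old_root = ET.fromstring(OLD_STYLE)
renamed_count = modernize(old_root)

if __name__ == "__main__":
    ET.register_namespace("dc", DC)
    ET.register_namespace("dcterms", DCTERMS)
    print(ET.tostring(old_root, encoding="unicode"))
```

Unqualified base elements such as dc:title are left alone, since only the refinements moved to the new namespace.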
Incidentally, you can also find a dcmitype: namespace at http://purl.org/dc/dcmitype/.
In this installment, I showed you how DCMI is used within XML in relation to RDF specifically. But DCMI is particularly well suited for embedding within XML generally. For all their tricks and difficulties (some of which have been pointed out by my colleague Uche Ogbuji -- see Resources), namespaces are a genuinely elegant means of combining XML vocabularies.
One significant advantage to embedding DCMI in XML -- rather than in HTML, as plain text, or in various wrappers of works -- is that DCMI metadata can annotate specific elements, not just whole documents.
For example, in some earlier installments, I took a look at DocBook/XML and used it to mark up a chapter of my doctoral dissertation. I might want to go back and annotate this document with metadata specific to its production. Many of the features apply to the document as a whole -- for example, I created the whole work. But other features might be specific to different sections -- for example, these sections were created on different dates, and they might replace different component articles when assembled.
As a quick example of section context, let me present a highly stripped down, but DCMI-annotated version of my DocBook chapter:
Listing 3. DCMI-annotated DocBook/XML dissertation chapter
<?xml version="1.0"?>
<chapter xmlns="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/" >
  <dc:creator>David Mertz</dc:creator>
  <dc:identifier>http://gnosis.cx/dW/mertz/chap5.xml</dc:identifier>
  <dc:title>Hegemony, and Other Passing Fads</dc:title>
  <title>Hegemony, and Other Passing Fads</title>
  <sect1>
    <title>Forgotten AIDS Myths</title>
    <dc:title>Forgotten AIDS Myths</dc:title>
    <dc:date>1998-11</dc:date>
    <dcterms:replaces>
      http://gnosis.cx/dW/mertz/sex_wars.html</dcterms:replaces>
  </sect1>
  <sect1>
    <title>Day-Care Devil Worshipers</title>
    <dc:title>Day-Care Devil Worshipers</dc:title>
    <dc:date>1998-08</dc:date>
    <sect2><title>Remembering Events</title></sect2>
    <sect2><title>Forgetting Everything</title></sect2>
    <sect2><title>Motives, Right and Left</title></sect2>
    <sect2><title>Flashpoints</title></sect2>
    <sect2><title>Obtaining Outsidelessness</title></sect2>
    <sect2><title>Remembrance of Ideologies Past</title></sect2>
  </sect1>
  <sect1>
    <title>Tsars and Jihads</title>
    <dc:date>1997-10</dc:date>
    <dc:title>Tsars and Jihads</dc:title>
  </sect1>
</chapter>
I only left in section headings, but you can see how DCMI terms can usefully annotate each section element as specific to that subdocument. Obviously, you could add terms other than the minimal examples I use here.
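An application that consumes such a document can read the annotations at whatever level they were attached. The following sketch, again with Python's standard xml.etree.ElementTree, reports the DCMI terms that are direct children of the chapter and of each sect1. The function names and the trimmed sample are my own, and the DocBook namespace URI is simply the one used in Listing 3.

```python
import xml.etree.ElementTree as ET

DOCBOOK = "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

SAMPLE = """\
<chapter xmlns="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <dc:creator>David Mertz</dc:creator>
  <title>Hegemony, and Other Passing Fads</title>
  <sect1>
    <title>Forgotten AIDS Myths</title>
    <dc:date>1998-11</dc:date>
    <dcterms:replaces>
      http://gnosis.cx/dW/mertz/sex_wars.html</dcterms:replaces>
  </sect1>
  <sect1>
    <title>Tsars and Jihads</title>
    <dc:date>1997-10</dc:date>
  </sect1>
</chapter>
"""

def dc_children(elem):
    """Collect DCMI terms that are direct children of elem."""
    meta = {}
    for child in elem:
        uri, brace, local = child.tag[1:].partition("}")
        if brace and uri in (DC, DCTERMS):
            meta[local] = (child.text or "").strip()
    return meta

def annotations(chapter):
    """Return (element name, metadata) for the chapter and each sect1."""
    rows = [("chapter", dc_children(chapter))]
    for sect in chapter.iter("{%s}sect1" % DOCBOOK):
        rows.append(("sect1", dc_children(sect)))
    return rows

rows = annotations(ET.fromstring(SAMPLE))

if __name__ == "__main__":
    for name, meta in rows:
        print(name, meta)
```

The sketch deliberately does not copy chapter-level terms down into the sections. Whether a section should inherit, say, the chapter's creator is a policy decision for the consuming application.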
- Get started with the Dublin Core Metadata Initiative by visiting their
homepage. You can read not only about documented recommendations,
but also about case studies, upcoming conferences, and how to
participate in the initiative's consensus process.
- Dig deeper into the initiative at the DCMI FAQ, which provides useful guidance for understanding the anticipated scope and usage of DCMI.
- Read the World Wide Web Consortium's Date and Time Formats page, which focuses primarily on the ISO 8601 standard.
- See The Thesaurus of Geographic Names (TGN).
- Start studying the DCMI vocabulary at the best place -- the document "DCMI Metadata Terms."
- Avoid the pitfalls of XML namespaces -- read Uche Ogbuji's developerWorks article,
"Use XML namespaces with care" (April 2004).
- Read David Mertz's DocBook/XML markup of a chapter of his doctoral dissertation. David has also written about DocBook in previous installments of XML Matters:
- "Getting started with the DocBook XML dialect" (October 2000)
- "Getting comfortable with the DocBook XML dialect" (October 2000)
- "Transforming DocBook documents using XSLT" (November 2000)
- Find hundreds more XML resources on the
developerWorks XML technology zone.
- Find all previous installments of David's XML Matters column on the column summary page.

To David Mertz, all the world is a stage, and his career is devoted to providing marginal staging instructions. David may be reached at mertz@gnosis.cx; his life pored over at http://gnosis.cx/dW/. Suggestions and recommendations on this, past, or future columns are welcomed. Check out David's book Text Processing in Python.