XML Matters: Describe XML content with the Dublin Core Metadata Initiative

Reuse metadata in broader XML vocabularies

David Mertz (mertz@gnosis.cx), Metaphilosopher, Gnosis Software, Inc.
David Mertz
To David Mertz, all the world is a stage, and his career is devoted to providing marginal staging instructions. David may be reached at mertz@gnosis.cx; his life pored over at http://gnosis.cx/dW/. Suggestions and recommendations on this, past, or future columns are welcomed. Check out David's book Text Processing in Python.

Summary:  The Dublin Core Metadata Initiative (DCMI) is a standardized vocabulary for handling information about documents. In general, the DCMI vocabulary defines a hierarchy of terms that describe the purpose, context, and origin of a document (rather than describing the document itself). David shows you how DCMI provides a set of metadata primitives that you can reuse (through namespaces) in broader XML vocabularies, such as RSS variants. Various standards, including those from ISO and NISO, have adopted parts of DCMI.

Date:  06 Aug 2004
Level:  Intermediate

I'll begin with one caveat: The Dublin Core Metadata Initiative does not really have anything to do with XML. The most widespread use of DCMI is, indeed, probably within namespace-enhanced XML documents; but nothing about metadata generally -- or this collection of elements specifically -- requires that the underlying data be encoded as XML. Instead, DCMI is a generic framework for describing a broadly useful collection of information about documents of all sorts. The individual documents that are characterized using DCMI might be encoded in XML or in most any other electronic or physical format, and their subject matter can be pretty much any endeavor of human creation.

DCMI is a vocabulary for talking about documents, with (relatively) well-defined semantics for the meaning and usage of its terms. The terms included in DCMI are divided into a minimal set of base elements and an optional collection of refinements to these base elements.

Much of the benefit of DCMI comes simply from standardizing the way metadata terms are spelled, and the format of the values these terms will take. For example, you might identify a work by near-synonyms like "author", "artist", "originator", "maker" or "creator"; DCMI standardizes the name of this role with the last term, "creator", in order to provide a consistent method of comparing documents that may share authorship. Naturally, the names of persons and organizations who might be creators can be pretty much anything; in comparing creators to each other, an application of DCMI might wish to further standardize the format of names (for instance, "Lastname, Firstname") beyond what the DCMI recommendation provides.
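
As a minimal sketch of that kind of standardization, the Python below maps the near-synonyms listed above onto "creator" and rewrites personal names as "Lastname, Firstname". The synonym table and both function names are hypothetical illustrations, and the name format is an application-level convention rather than anything DCMI defines.

```python
# Hypothetical synonym table; DCMI itself only defines the term "creator".
CREATOR_SYNONYMS = {"author", "artist", "originator", "maker", "creator"}


def normalize_term(name: str) -> str:
    """Map near-synonyms for the authorship role onto the DCMI term."""
    key = name.strip().lower()
    return "creator" if key in CREATOR_SYNONYMS else key


def normalize_person(name: str) -> str:
    """Rewrite 'Firstname Lastname' as 'Lastname, Firstname'.

    Names that already contain a comma, and single-word names, are
    returned unchanged (apart from collapsing runs of whitespace).
    """
    name = " ".join(name.split())
    if "," in name or " " not in name:
        return name
    first, last = name.rsplit(" ", 1)
    return f"{last}, {first}"


record = {"Author": "David Mertz"}
normalized = {normalize_term(k): normalize_person(v) for k, v in record.items()}
print(normalized)  # {'creator': 'Mertz, David'}
```

The name rule is deliberately naive: it treats the last word as the family name, so it would mangle an organizational creator such as "DCMI Usage Board". A real application would need to distinguish persons from organizations first.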

In addition to standardizing metadata terms, DCMI provides recommendations for choosing values, either by enumeration or by specification of patterns. For example, "date" is a rather obvious choice of metadata term, but dates come in multiple formats. DCMI recommends that dates be given in the ISO 8601 profile specified in the W3C Date and Time Formats note (see Resources), which yields values of the form YYYY-MM-DD. In other cases, such as "coverage" -- which is defined as "the extent or scope of the content of the resource" -- DCMI recommends using names from a controlled vocabulary, such as the (large, but finite) enumeration in the Getty Thesaurus of Geographic Names (see Resources).
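
To make the pattern-based recommendation concrete, here is a rough Python check for the date shapes allowed by the W3C Date and Time Formats note (year; year and month; complete date; complete date plus a time with a mandatory time zone designator). The regular expression and the name is_w3cdtf are my own sketch, not something DCMI or the W3C publishes.

```python
import re

# Sketch of the W3C Date and Time Formats profile of ISO 8601:
#   YYYY | YYYY-MM | YYYY-MM-DD | YYYY-MM-DDThh:mm[:ss[.s]]TZD
# where TZD is "Z" or an offset such as +01:00.
W3CDTF = re.compile(
    r"\d{4}"
    r"(?:-(?:0[1-9]|1[0-2])"
    r"(?:-(?:0[1-9]|[12]\d|3[01])"
    r"(?:T(?:[01]\d|2[0-3]):[0-5]\d"
    r"(?::[0-5]\d(?:\.\d+)?)?"
    r"(?:Z|[+-](?:[01]\d|2[0-3]):[0-5]\d)"
    r")?)?)?"
)


def is_w3cdtf(value: str) -> bool:
    """Return True if value has one of the shapes the note allows."""
    return W3CDTF.fullmatch(value) is not None


for candidate in ("2004-06-14", "1998-11", "06/14/2004", "2004-06-14T10:30Z"):
    print(candidate, is_w3cdtf(candidate))
```

The check is purely syntactic: it does not know how many days each month has, so a value like 2004-02-31 still passes.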

Describing documents

For an example of the concrete use of DCMI metadata, look at the document "DCMI Metadata Terms" (see Resources), a presumably well-thought-out instantiation of the DCMI's own principles. Incidentally, notice that DCMI vocabulary terms are not case-sensitive, since they will often be used in case-insensitive contexts such as HTML (pre-XHTML, that is).

The "DCMI Metadata Terms" document encodes its metadata in several distinct ways, at least in the HTML version. This redundancy is useful in that it shows off each of the three most important encoding styles you are likely to find in the use of DCMI:

  • Plain text
  • Meta tags in HTML
  • Metadata in RDF

Plain text

The first style might be called the plain text encoding of the document metadata. In the online version, the following information is placed in an HTML table and given a distinctive background color, but it would be little affected if it were printed in a book or binder (or as formatted below). In particular, a non-electronic resource that uses DCMI necessarily uses something similar to:

DCMI Metadata Terms
Creator: DCMI Usage Board
Identifier: http://dublincore.org/documents/2004/06/14/dcmi-terms/
Date Issued: 2004-06-14
Latest Version: http://dublincore.org/documents/dcmi-terms/
Replaces: http://dublincore.org/documents/2003/11/19/dcmi-terms/
Translations: http://dublincore.org/resources/translations/
Document Status: This is a DCMI Recommendation.
Description: This document is an up-to-date specification of all metadata terms maintained by the Dublin Core Metadata Initiative, including elements, element refinements, encoding schemes, and vocabulary terms (the DCMI Type Vocabulary).
Date Valid: 2004-06-14

Each of the field names above (italicized in the original) labels a piece of metadata attached to the document. Even though I do not reproduce the entire document here, notice that the Identifier field is, where applicable, a URI that lets you locate the connected document.

Several of the metadata fields given in the plain text header -- Creator, Identifier, and Description -- belong to DCMI's set of 15 basic elements. Other fields -- Replaces, Date Issued, and Date Valid -- are element refinements: each narrows the meaning of a base element (Replaces refines "relation"; Date Issued and Date Valid refine "date"), so these elements loosely inherit from base elements, though this is not literally OOP-style inheritance. The remaining fields (Latest Version, Translations, and Document Status) do not seem to belong to DCMI, but are rather custom additions for this application; a different application that is not aware of these fields would typically just ignore them.
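
The three-way split described above can be sketched in a few lines of Python. The set of 15 base element names is DCMI's; the REFINEMENTS table (keyed by the human-readable labels used in this particular header) and the function names are assumptions of mine, and the header is abbreviated from the one shown earlier.

```python
# The 15 base elements of the Dublin Core element set.
BASE_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Hypothetical mapping from the labels in this header to the base
# element each refinement narrows.
REFINEMENTS = {"replaces": "relation", "date issued": "date", "date valid": "date"}

HEADER = """\
Creator: DCMI Usage Board
Identifier: http://dublincore.org/documents/2004/06/14/dcmi-terms/
Date Issued: 2004-06-14
Latest Version: http://dublincore.org/documents/dcmi-terms/
Replaces: http://dublincore.org/documents/2003/11/19/dcmi-terms/
Document Status: This is a DCMI Recommendation.
Date Valid: 2004-06-14
"""


def parse_header(text: str) -> dict:
    """Split 'Label: value' lines into a dict, skipping other lines."""
    fields = {}
    for line in text.splitlines():
        label, sep, value = line.partition(": ")
        if sep:
            fields[label.strip()] = value.strip()
    return fields


def classify(label: str) -> str:
    """Return 'base', 'refinement', or 'custom' for a field label."""
    key = label.strip().lower()
    if key in BASE_ELEMENTS:
        return "base"
    if key in REFINEMENTS:
        return "refinement"
    return "custom"


kinds = {label: classify(label) for label in parse_header(HEADER)}
for label, kind in kinds.items():
    print(f"{label}: {kind}")
```

An application that only understands DCMI could keep the "base" and "refinement" fields and silently skip the "custom" ones, which is the graceful-ignore behavior described above.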

Meta tags in HTML

Plain text encodes DCMI metadata by typographic means somewhat specific to the work in question. In fact, many non-electronic works cannot really encode metadata directly. For example, musical works or paintings do not contain front matter or title pages where you might list these elements. Even a written work cannot have such plain text attached directly unless you are able to create a new edition of it. Obviously, in cases like these the metadata has to exist in some attached or wrapping document. This could be literally the wrapper of a work -- for example, the shrink wrapping around an historical book edition, or the packaging of a shipped painting.

Metadata attachment is a bit easier with electronic formats. Specifically, HTML has a bit of a kludged tag that can live in its <head> element: the <meta> element. The HTML version of DCMI Metadata Terms encodes several base DCMI elements in just this manner. Listing 1 shows the whole <head> element:


Listing 1. Head of HTML version of DCMI Metadata Terms
<head>
<title>DCMI Metadata Terms</title>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<meta name="DC.title" content="Dublin Core Metadata Terms" />
<meta name="DC.description" content="This document is an up-to-date
  specification of all metadata terms maintained by the Dublin Core
  Metadata Initiative, including elements, element refinements,
  encoding schemes, and vocabulary terms (the DCMI Type Vocabulary)." />
<meta name="DC.publisher" content="Dublin Core Metadata Initiative" />
<meta content="text/html; charset=iso-8859-1" http-equiv="Content-Type" />
<link href="index.shtml.rdf" rel="meta" />
<link type="text/css" href="/css/default.css" rel="stylesheet" />
</head>

Note a few things in Listing 1. The regular <title> of an HTML document is already a kind of metadata, but it's fairly impoverished since it lacks additional accompanying terms. The HTML header gives a <link> to schema.DC as a convention for explicitly indicating the use of DCMI terms in other <meta> tags. Of course, the HTML spec itself, and most HTML processing applications (such as Web browsers), lack any special knowledge of what to do with any of this -- but they should ignore and preserve it gracefully.

The terms DC.title, DC.description, and DC.publisher are basic elements from DCMI, and are pseudo-namespace qualified. The publisher element was not given in the plain text version (but perhaps it should have been). The title was not explicitly labeled as a field there either, but all of the DCMI documentation places the title first in a document, in an <h1> tag; it is reasonable to treat that as indicating the title field, despite the fact that it's marked differently than other fields.

Like many HTML documents, DCMI Metadata Terms includes a non-DCMI Content-Type metadata tag. Not all metadata is DCMI, so DCMI is intended to play well with other external metadata tagging.
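
For readers who want to pull these values back out of a page, the following is a small sketch using Python's standard html.parser module. The class name DCMetaParser is hypothetical, and the head is abbreviated from Listing 1. Only <meta> names with a DC. prefix are collected, so the Content-Type tag is ignored.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect schema.* links and DC.* meta tags from an HTML document."""

    def __init__(self):
        super().__init__()
        self.schemas = {}  # prefix -> schema URI, from <link rel="schema.X">
        self.terms = {}    # DCMI term -> content, from <meta name="DC.x">

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also arrive here.
        attr = dict(attrs)
        rel = attr.get("rel") or ""
        name = attr.get("name") or ""
        if tag == "link" and rel.lower().startswith("schema."):
            self.schemas[rel.split(".", 1)[1].lower()] = attr.get("href")
        elif tag == "meta" and "." in name:
            prefix, term = name.split(".", 1)
            if prefix.lower() == "dc":
                self.terms[term.lower()] = attr.get("content")


HEAD = """<head>
<title>DCMI Metadata Terms</title>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<meta name="DC.title" content="Dublin Core Metadata Terms" />
<meta name="DC.publisher" content="Dublin Core Metadata Initiative" />
<meta content="text/html; charset=iso-8859-1" http-equiv="Content-Type" />
</head>"""

parser = DCMetaParser()
parser.feed(HEAD)
parser.close()
print(parser.schemas)
print(parser.terms)
```

Prefixes and term names are lowercased on the way in, on the assumption that HTML usage is case-insensitive as noted earlier.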

Metadata in RDF

I haven't yet mentioned another element in the HTML document -- well, two elements. The stylesheet link is an external resource for the HTML that I do not need to comment on here, though it might also be considered a kind of metadata -- one concerning best presentation of the document. The more interesting external resource is the <link> to index.shtml.rdf. Look at Listing 2:


Listing 2. RDF resource linked to by DCMI Metadata Terms
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description
     rdf:about="http://dublincore.org/documents/dcmi-terms/">
<dc:title>Dublin Core Metadata Terms</dc:title>
<dc:description>This document is an up-to-date specification of all
  metadata terms maintained by the Dublin Core Metadata Initiative,
  including elements, element refinements, encoding schemes, and
  vocabulary terms (the DCMI Type Vocabulary).</dc:description>
<dc:publisher>Dublin Core Metadata Initiative</dc:publisher>
</rdf:Description>
</rdf:RDF>

Probably the most common place you will find DCMI terms is embedded in RDF. XML namespace support makes such embedding quite elegant: DCMI terms can live in the dc: namespace; RDF itself in rdf: (or as the default namespace). This leaves open the option of embedding more vocabularies, such as xhtml:.
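
As an illustration of how cleanly the namespaces separate, here is a sketch that reads the dc: terms out of an RDF document like Listing 2 with Python's standard xml.etree.ElementTree. The description text is shortened, and the records dictionary is simply my choice of output shape.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

RDF_DOC = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description
     rdf:about="http://dublincore.org/documents/dcmi-terms/">
<dc:title>Dublin Core Metadata Terms</dc:title>
<dc:description>This document is an up-to-date specification of all
  metadata terms maintained by the Dublin Core Metadata
  Initiative.</dc:description>
<dc:publisher>Dublin Core Metadata Initiative</dc:publisher>
</rdf:Description>
</rdf:RDF>"""

DC_TAG_PREFIX = "{%s}" % NS["dc"]
ABOUT = "{%s}about" % NS["rdf"]

root = ET.fromstring(RDF_DOC)
records = {}
for desc in root.findall("rdf:Description", NS):
    terms = {}
    for child in desc:
        # ElementTree expands prefixes, so dc:title becomes "{uri}title".
        if child.tag.startswith(DC_TAG_PREFIX):
            local = child.tag[len(DC_TAG_PREFIX):]
            # Collapse the line breaks inside wrapped values.
            terms[local] = " ".join((child.text or "").split())
    records[desc.get(ABOUT)] = terms

for about, terms in records.items():
    print(about, terms)
```

Because matching is done on the namespace URI rather than the prefix, the same code works if a document binds the Dublin Core namespace to some prefix other than dc:.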

The dc: namespace is not the only one recommended by DCMI, however. The basic 15 elements of DCMI are normally given a dc: namespace, but supplemental terms and refinements are generally placed in the dcterms: namespace. Placing refinements in the dcterms: namespace revises an earlier recommendation, under which a refinement was written in the dc: namespace and qualified with the name of its ancestor term. For example, the RDF file for DCMI Metadata Terms might currently be enhanced with the element:

<dcterms:issued>2004-06-14</dcterms:issued>

Of course, the <rdf:RDF> root element would need the additional namespace specification xmlns:dcterms="http://purl.org/dc/terms/" to make this work.

You might come across an older RDF file that has a qualified-ancestor element similar to:

<dc:date.issued>2004-06-14</dc:date.issued>

I'm not certain why this usage was changed; at first glance, the older usage appears more descriptive to me. But I have not followed the discussion that went into this decision, and I assume there were good reasons for it. Incidentally, you can also find a dcmitype: namespace at http://purl.org/dc/dcmitype/.
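
If you need to handle both styles, one possible approach is to rewrite the older form on the way in. The sketch below renames any dc: element whose local name contains a dot to the corresponding dcterms: element. The function name modernize is hypothetical, and the rule that the part after the dot is always the dcterms: local name is an assumption that holds for date.issued but should be checked for other terms.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

LEGACY = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="http://dublincore.org/documents/dcmi-terms/">
<dc:title>Dublin Core Metadata Terms</dc:title>
<dc:date.issued>2004-06-14</dc:date.issued>
</rdf:Description>
</rdf:RDF>"""


def modernize(root: ET.Element) -> int:
    """Rename dc:<ancestor>.<refinement> to dcterms:<refinement> in place.

    Returns the number of elements renamed. Assumes the text after the
    dot is the dcterms: local name, which is not guaranteed in general.
    """
    dc_prefix = "{%s}" % DC
    renamed = 0
    for elem in root.iter():
        if isinstance(elem.tag, str) and elem.tag.startswith(dc_prefix):
            local = elem.tag[len(dc_prefix):]
            if "." in local:
                refinement = local.split(".", 1)[1]
                elem.tag = "{%s}%s" % (DCTERMS, refinement)
                renamed += 1
    return renamed


root = ET.fromstring(LEGACY)
count = modernize(root)
issued = root.find(".//{%s}issued" % DCTERMS)
print(count, issued.text)
```

Unqualified elements such as dc:title are left alone, since the base elements remain in the dc: namespace.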


Conclusion: General XML usage

In this installment, I showed you how DCMI is used within XML in relation to RDF specifically. But DCMI is particularly well suited for embedding within XML generally. For all their tricks and difficulties (some of which have been pointed out by my colleague Uche Ogbuji -- see Resources), namespaces are a genuinely elegant means of combining XML vocabularies.

One significant advantage to embedding DCMI in XML -- rather than in HTML, as plain text, or in various wrappers of works -- is that DCMI metadata can annotate specific elements, not just whole documents.

For example, in some earlier installments, I took a look at DocBook/XML and used it to mark up a chapter of my doctoral dissertation. I might want to go back and annotate this document with metadata specific to its production. Many of the features apply to the document as a whole -- for example, I created the whole work. But other features might be specific to different sections -- for example, these sections were created on different dates, and they might replace different component articles when assembled.

As a quick example of section context, let me present a highly stripped down, but DCMI-annotated version of my DocBook chapter:


Listing 3. DCMI-annotated DocBook/XML dissertation chapter
<?xml version="1.0"?>
<chapter xmlns="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/" >
  <dc:creator>David Mertz</dc:creator>
  <dc:identifier>http://gnosis.cx/dW/mertz/chap5.xml</dc:identifier>
  <dc:title>Hegemony, and Other Passing Fads</dc:title>
  <title>Hegemony, and Other Passing Fads</title>
  <sect1>
    <title>Forgotten AIDS Myths</title>
    <dc:title>Forgotten AIDS Myths</dc:title>
    <dc:date>1998-11</dc:date>
    <dcterms:replaces>
        http://gnosis.cx/dW/mertz/sex_wars.html</dcterms:replaces>
  </sect1>
  <sect1>
    <title>Day-Care Devil Worshipers</title>
    <dc:title>Day-Care Devil Worshipers</dc:title>
    <dc:date>1998-08</dc:date>
    <sect2><title>Remembering Events</title></sect2>
    <sect2><title>Forgetting Everything</title></sect2>
    <sect2><title>Motives, Right and Left</title></sect2>
    <sect2><title>Flashpoints</title></sect2>
    <sect2><title>Obtaining Outsidelessness</title></sect2>
    <sect2><title>Remembrance of Ideologies Past</title></sect2>
  </sect1>
  <sect1>
    <title>Tsars and Jihads</title>
    <dc:date>1997-10</dc:date>
    <dc:title>Tsars and Jihads</dc:title>
  </sect1>
</chapter>

I only left in section headings, but you can see how DCMI terms can usefully annotate each section element as specific to that subdocument. Obviously, you could add terms other than the minimal examples I use here.
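
To show what an application might do with such annotations, the sketch below reads a cut-down version of Listing 3 and builds a metadata record per section. Letting each section inherit the chapter-level terms (such as dc:creator) and then override them with its own is a design choice of mine, not something DCMI prescribes; the function name own_metadata is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "db": "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

CHAPTER = """<?xml version="1.0"?>
<chapter xmlns="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/" >
  <dc:creator>David Mertz</dc:creator>
  <title>Hegemony, and Other Passing Fads</title>
  <sect1>
    <title>Forgotten AIDS Myths</title>
    <dc:date>1998-11</dc:date>
    <dcterms:replaces>
        http://gnosis.cx/dW/mertz/sex_wars.html</dcterms:replaces>
  </sect1>
  <sect1>
    <title>Tsars and Jihads</title>
    <dc:date>1997-10</dc:date>
  </sect1>
</chapter>"""


def own_metadata(elem: ET.Element) -> dict:
    """Collect the dc: and dcterms: terms that are direct children of elem."""
    meta = {}
    for child in elem:
        for prefix in ("dc", "dcterms"):
            brace = "{%s}" % NS[prefix]
            if child.tag.startswith(brace):
                key = "%s:%s" % (prefix, child.tag[len(brace):])
                meta[key] = (child.text or "").strip()
    return meta


root = ET.fromstring(CHAPTER)
chapter_meta = own_metadata(root)

sections = {}
for sect in root.findall("db:sect1", NS):
    title = sect.findtext("db:title", namespaces=NS)
    # Section-level terms override anything inherited from the chapter.
    sections[title] = {**chapter_meta, **own_metadata(sect)}

for title, meta in sections.items():
    print(title, meta)
```

Only direct children are examined, so metadata placed on a nested element would need its own pass; that keeps each term scoped to the element it annotates.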


Resources
