Thinking XML: Shedding light on PRISM

A standard metadata vocabulary for publishing

Uche Ogbuji (uche@ogbuji.net), Principal Consultant, Fourthought, Inc.
Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management. Fourthought develops 4Suite, an open source platform for XML, RDF, and knowledge-management applications. Mr. Ogbuji is a Computer Engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can contact Mr. Ogbuji at uche@ogbuji.net.

Summary:  PRISM is a standard for metadata related to publishing. It allows the formal description of content and related resources by providing standardized properties, controlled vocabularies, and extensibility mechanisms that enable users to define their own controlled vocabularies. In this column, Uche Ogbuji introduces PRISM by example.

Date:  01 Oct 2002
Level:  Introductory


The various industries related to publishing were among the earliest to support XML and to explore its value in practice. This is not surprising, as the publishing industry has long been a stalwart of SGML, the parent of XML. The Information and Content Exchange protocol, or ICE, emerged in 1998 as one of the earliest major industry standards to use XML. ICE is a protocol for directing the electronic distribution of content to partners who present that content on the Internet. XML is also well suited to another important requirement in the publishing industry: content metadata management. ICE provides the mechanism for exchanging content, but even the ICE specification admits that there needs to be a formal means for describing that content.

To meet this need, the publishing industry has developed Publishing Requirements for Industry Standard Metadata (PRISM), an XML metadata standard for directing the processing of content. PRISM covers a wide variety of content, from catalogs to books, and a wide variety of media, from various forms of electronic publishing to various forms of print. PRISM is being developed by a working group of IDEAlliance (formerly known as GCA), a consortium of publishers involved with electronic technological infrastructure. PRISM members include technology vendors such as Adobe, and magazine publishers such as Time Inc. and McGraw-Hill.

In this article, I introduce PRISM, focusing on the current draft of the PRISM 1.2 specification. Readers should be familiar with XML and RDF.

Building from the basics

PRISM, at its most basic, is defined as an RDF/XML document that uses the Dublin Core vocabulary. As an example, Listing 1 is a valid PRISM document that describes the previous installment of this column.


Listing 1: Thinking XML column 12 described formally in rudimentary PRISM
<?xml version="1.0" encoding="UTF-8"?> 
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xml:lang="en"
>
 <rdf:Description
    rdf:about="http://www.ibm.com/developerworks/xml/library/x-think12.html">
  <dc:description>
    A discussion of the broader context and relevance of XML/RDF techniques.
  </dc:description> 
  <dc:title>Basic XML and RDF techniques for knowledge management, Part 7
  </dc:title>
  <dc:publisher>IBM developerWorks</dc:publisher>
  <dc:creator>Uche Ogbuji</dc:creator>
  <dc:subject>XML</dc:subject>
  <dc:subject>RDF</dc:subject>
  <dc:format>text/html</dc:format>
 </rdf:Description>
</rdf:RDF>
 

This correspondence with plain RDF and the increasingly popular Dublin Core element set means that PRISM is well aligned with the large body of RDF tools and techniques that are in place. As far as Dublin Core properties are concerned, PRISM allows flexibility in the values of the properties: You can use ad hoc plain text values as I do above; you can use plain text values from a controlled vocabulary, such as ISBN numbers; or you can use URIs. For example, I express the dc:publisher property above as:

   <dc:publisher>IBM developerWorks</dc:publisher> 

I could express it using the International Standard Serial Number (ISSN) for IBM developerWorks (actually, this is a made-up ISSN):

   <dc:publisher>1234-5678</dc:publisher> 

I could also use the ISSN key title for IBM developerWorks. The key title is a special name that is assigned along with the ISSN. The key title is generally a variant of the general name of the publication, which is modified to make it globally unique. The ISSN key title and number are vocabularies controlled by the ISSN International Centre. Using them in the metadata field removes any possible ambiguity associated with using the common name of the publisher.

As a third option, I could use the IBM developerWorks URL:

   <dc:publisher rdf:resource="http://www.ibm.com/developerworks"/> 

Notice the different syntactic form of the RDF, which specifies that the property value is another resource and not just a plain text string. This latter option is also a controlled vocabulary, but rather than control being established by a single body, it is established by virtue of IBM's ownership of the domain name used in the URL, as well as the Internet addresses of the machines mapped to this domain name.
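The three forms of value discussed above can be told apart mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the function names and the result labels are hypothetical and are not part of PRISM. It also applies the standard ISSN check-digit rule, which shows one practical benefit of a controlled vocabulary: a made-up number such as 1234-5678 can be flagged automatically.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
ISSN_SHAPE = re.compile(r"\d{4}-\d{3}[\dX]")


def issn_check_digit_ok(issn):
    """Verify the ISSN check digit (weights 8 down to 2 over the first seven digits)."""
    digits = issn.replace("-", "")
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return digits[7] == ("X" if check == 10 else str(check))


def classify_value(elem):
    """Label a property element as 'resource', 'issn', 'malformed-issn', or 'text'.

    These labels are illustrative only; PRISM does not define them.
    """
    if elem.get("{%s}resource" % RDF_NS) is not None:
        return "resource"
    text = (elem.text or "").strip()
    if ISSN_SHAPE.fullmatch(text):
        return "issn" if issn_check_digit_ok(text) else "malformed-issn"
    return "text"


def parse_fragment(fragment):
    """Parse a single property element by wrapping it in namespace declarations."""
    wrapper = '<r xmlns:rdf="%s" xmlns:dc="%s">%s</r>' % (RDF_NS, DC_NS, fragment)
    return ET.fromstring(wrapper)[0]


for fragment in (
    "<dc:publisher>IBM developerWorks</dc:publisher>",
    "<dc:publisher>1234-5678</dc:publisher>",
    '<dc:publisher rdf:resource="http://www.ibm.com/developerworks"/>',
):
    print(classify_value(parse_fragment(fragment)))
```

The sketch looks only at the syntax of the value. Whether a well-formed ISSN or URI actually identifies the intended publisher is still a matter for whoever controls the vocabulary.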

One important point is that PRISM is based on the RDF/XML serialization rather than the abstract model. In the last installment of this column, I strongly recommended that users of RDF focus on the abstract model rather than the XML serialization. I can understand why PRISM goes against this advice: it has to address content providers and thus tell them concretely which XML elements to put together in metadata communications. PRISM aims to establish strong interoperability at the syntax level. To underscore this, PRISM is formally defined in terms of a DTD, which is also probably a natural consequence of PRISM's publishing origins.

One downside of the focus on syntax is that the RDF/XML serialization cannot express every RDF model. For example, if an organization were to identify content using a URI form that cannot be broken into an XML prefix/local name combination, then it's hard to see how it could use PRISM to describe such content. On the positive side, PRISM takes advantage of the flexibility of the RDF/XML specification in being able to appear either in stand-alone documents, or embedded within the content. As usual, the rdf:RDF wrapper element provides an encapsulation of the metadata.
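To make the serialization limit concrete: RDF/XML writes a URI as an element name only if the URI can be split into a namespace part and a trailing local part that is a legal XML name. The following is a rough sketch of such a split in Python. The function name is hypothetical, the example.org URIs are invented for illustration, and the regular expression is an ASCII-only approximation of the XML NCName rule (the real rule also admits many non-ASCII characters).

```python
import re

# ASCII-only approximation of an XML NCName at the end of a string:
# a letter or underscore, followed by letters, digits, '_', '.', or '-'.
_NCNAME_SUFFIX = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*$")


def split_uri(uri):
    """Return (namespace, local_name), or None when no usable split exists."""
    match = _NCNAME_SUFFIX.search(uri)
    if match is None or match.start() == 0:
        return None
    return uri[:match.start()], match.group(0)


print(split_uri("http://purl.org/dc/elements/1.1/title"))
print(split_uri("http://example.org/ids/12345"))
```

The first URI splits cleanly into the Dublin Core namespace and the local name title. The second ends in digits only, so no trailing local name can be found and the sketch returns None.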


A bit of added color

PRISM also defines a set of properties that extend the basic descriptions allowed by Dublin Core. These properties support:

  • Description of general characteristics
  • Provenance of content
  • Important dates and times related to the publishing
  • Subjects and topics of the content
  • Relationships between resources
  • Rights and permissions that govern use of the content

All of these properties are based on the core PRISM namespace, http://prismstandard.org/namespaces/1.2/basic/, which is formally defined in section II 4.4 of the spec. One warning: examples in the draft PRISM spec itself are inconsistent in their definition of the PRISM namespace. Some use the normative namespace, but some unaccountably use variations such as http://prismstandard.org/namespaces/basic/ and even http://prismstandard.org/namespaces/basic/1.2/. I assume these are just typos.
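Because a mistyped namespace silently turns a PRISM property into an unrelated one, it can be worth checking documents for near-miss namespace URIs. Here is a minimal sketch, assuming Python's standard xml.etree.ElementTree module. The function name and the sample document are hypothetical, and the check covers element names only, not attributes.

```python
import xml.etree.ElementTree as ET

PRISM_NS = "http://prismstandard.org/namespaces/1.2/basic/"
PRISM_ROOT = "http://prismstandard.org/namespaces/"

# Hypothetical sample that uses one of the variant namespaces mentioned above.
TYPO_DOC = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:prism="http://prismstandard.org/namespaces/basic/1.2/">
 <rdf:Description rdf:about="http://example.org/some-article">
  <prism:category>column</prism:category>
 </rdf:Description>
</rdf:RDF>"""


def suspicious_namespaces(xml_text, expected=(PRISM_NS,)):
    """List element namespaces that look like PRISM but are not in 'expected'."""
    found = set()
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag.startswith("{"):
            namespace = elem.tag[1:].split("}", 1)[0]
            if namespace.startswith(PRISM_ROOT) and namespace not in expected:
                found.add(namespace)
    return sorted(found)


print(suspicious_namespaces(TYPO_DOC))
```

A document that also uses other PRISM namespaces, such as the controlled vocabulary namespace shown later in this article, would pass them in through the expected parameter.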

In the following list, I select some of the more interesting PRISM properties from the 50 or so defined in the specification.

  • prism:category: The nature or genre of the content. PRISM provides a recommended controlled vocabulary for this which includes terms such as advertisement, cartoon, column, and recipe.
  • prism:creationTime and prism:modificationTime: Pertinent dates in the life cycle of the content.
  • prism:event: An event referred to in or described by the content.
  • prism:location: A place referred to in or described by the content.
  • prism:person and prism:organization: A person or organization referred to in or described by the content.
  • prism:isPartOf: A resource of which the one being described is a physical or logical part. prism:hasPart is the inverse relationship.
  • prism:isFormatOf: A resource of which the one being described is a variant in another format. prism:hasFormat is the inverse relationship. This could, for example, relate printed and electronic formats of a resource.
  • prism:isReferencedBy: A resource that references the one being described (for instance, through citation). prism:references is the inverse relationship.
  • prism:isTranslationOf: A resource of which the one being described is a language translation. prism:hasTranslation is the inverse relationship.
  • prism:copyright: A copyright notice for the content.
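The inverse pairs in this list mean that one statement implies another with subject and object swapped. The sketch below illustrates this with plain Python tuples of the form (subject, property, object). The invert function and the example.org URIs are hypothetical; only the property names come from the list above.

```python
# Inverse pairs taken from the property list above.
INVERSES = {
    "isPartOf": "hasPart",
    "isFormatOf": "hasFormat",
    "isReferencedBy": "references",
    "isTranslationOf": "hasTranslation",
}
# Make the table symmetric so lookups work in both directions.
INVERSES.update({inverse: name for name, inverse in list(INVERSES.items())})


def invert(statement):
    """Return the statement implied by swapping subject and object."""
    subject, prop, obj = statement
    return (obj, INVERSES[prop], subject)


# Hypothetical URIs: a column installment that is part of a series.
statement = ("http://example.org/column/12", "isPartOf", "http://example.org/column/")
print(invert(statement))
```

Software that stores only one direction of each pair can use a table like this to answer queries phrased in the other direction.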

Listing 2 is an example using these PRISM core elements to expand on Listing 1.


Listing 2: Thinking XML column 12 described formally in PRISM using core elements
<?xml version="1.0" encoding="UTF-8"?> 
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/"
  xml:lang="en"
>
 <rdf:Description
    rdf:about="http://www.ibm.com/developerworks/xml/library/x-think12.html">
  <dc:description>
    A discussion of the broader context and relevance of XML/RDF techniques.
  </dc:description>
  <dc:title>Basic XML and RDF techniques for knowledge management, Part 7
  </dc:title>
  <dc:publisher>IBM developerWorks</dc:publisher>
  <dc:creator>Uche Ogbuji</dc:creator>
  <dc:subject>XML</dc:subject>
  <dc:subject>RDF</dc:subject>
  <prism:category>column</prism:category>
  <prism:organization>OMG</prism:organization>
  <dc:format>text/html</dc:format>
 </rdf:Description>
</rdf:RDF>

The additions are the xmlns:prism declaration and the prism:category and prism:organization elements. I declared the PRISM namespace and then added statements from this namespace.
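Because PRISM properties live in their own namespace, software can pick them out of a description without knowing each property in advance. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module and a trimmed copy of the description above. The function name is hypothetical, and the sketch handles only literal-valued properties written as child elements of rdf:Description.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PRISM_NS = "http://prismstandard.org/namespaces/1.2/basic/"

# Trimmed copy of the PRISM description of column 12.
SAMPLE = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">
 <rdf:Description
    rdf:about="http://www.ibm.com/developerworks/xml/library/x-think12.html">
  <dc:creator>Uche Ogbuji</dc:creator>
  <prism:category>column</prism:category>
  <prism:organization>OMG</prism:organization>
 </rdf:Description>
</rdf:RDF>"""


def prism_properties(xml_text):
    """Map each described resource URI to its PRISM property values."""
    prefix = "{%s}" % PRISM_NS
    result = {}
    root = ET.fromstring(xml_text)
    for description in root.findall("{%s}Description" % RDF_NS):
        about = description.get("{%s}about" % RDF_NS)
        properties = result.setdefault(about, {})
        for child in description:
            if child.tag.startswith(prefix):
                name = child.tag[len(prefix):]
                properties.setdefault(name, []).append((child.text or "").strip())
    return result


print(prism_properties(SAMPLE))
```

The Dublin Core statements are skipped because their element names fall outside the PRISM namespace. A full RDF parser would be the better tool for anything beyond this simple element form.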


Custom controlled vocabularies

An important provision of PRISM is a formal way for others to define their own controlled vocabularies. In this way, PRISM provides a mechanism for extensibility that goes beyond the basic extensibility of XML and RDF. If you look at my description in Listings 1 and 2, you will notice my use of the dc:subject property with the simple values XML and RDF. But this could be ambiguous because these values do not come from a controlled vocabulary. For instance, someone coming from the waste management industry might misunderstand RDF, which is also a common abbreviation for "refuse-derived fuel" in that industry. What I really mean to say here is that the content in question is about a particular pair of W3C specifications. But PRISM does not define a controlled vocabulary of industry specifications. I shall instead use PRISM's facilities to define my own such vocabulary, in Listing 3.


Listing 3: An example controlled vocabulary of formal specifications
  <?xml version="1.0" encoding="UTF-8"?>
  <rdf:RDF
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:pcv="http://prismstandard.org/namespaces/1.2/pcv/"
     xmlns:u="http://uche.ogbuji.net/eg/pcv/specs/schema/"
     xml:lang="en" >
  <pcv:Descriptor rdf:about="http://www.w3.org/TR/REC-xml">
     <pcv:label>XML 1.0 Recommendation</pcv:label>
     <u:owner rdf:resource="http://w3.org"/>
  </pcv:Descriptor>
  <pcv:Descriptor rdf:about="http://www.w3.org/TR/REC-rdf-syntax/">
     <pcv:label>RDF Model and Syntax 1.0</pcv:label>
     <u:owner rdf:resource="http://w3.org"/>
  </pcv:Descriptor>
  </rdf:RDF> 

Here I define two resources of type pcv:Descriptor, using the URLs of the specifications themselves as the IDs. I use pcv:label, which is a subproperty of rdfs:label, to set a human-readable description of the resource suitable for use in PRISM-aware software. And finally, I take advantage of the general extensibility of RDF itself to create my own specialized property, u:owner, tying the specification to the organization that owns it. I can now use this controlled vocabulary to make a more refined statement than my original dc:subject. Listing 4 is an excerpt from Listing 1 that shows the modified subject statements.


Listing 4: Updated dc:subject to use controlled vocabulary
  <dc:subject>
    <pcv:Descriptor rdf:about="http://www.w3.org/TR/REC-xml"/>
  </dc:subject>
  <dc:subject>
    <pcv:Descriptor rdf:about="http://www.w3.org/TR/REC-rdf-syntax/"/>
  </dc:subject>

If I use this form, I rely on the software processing the PRISM to find the controlled vocabulary document in Listing 3 to determine such useful things as the labels of the descriptor resources. Because that document might not always be available, PRISM lets me take advantage of RDF's syntax rules to repeat such properties in line, as in Listing 5.


Listing 5: Updated dc:subject to use the controlled vocabulary in Listing 3, repeating the label property
   
  <dc:subject>
    <pcv:Descriptor rdf:about="http://www.w3.org/TR/REC-xml">
      <pcv:label>XML 1.0 Recommendation</pcv:label>
    </pcv:Descriptor>
  </dc:subject>
  <dc:subject>
    <pcv:Descriptor rdf:about="http://www.w3.org/TR/REC-rdf-syntax/">
      <pcv:label>RDF Model and Syntax 1.0</pcv:label>
    </pcv:Descriptor>
  </dc:subject>
 

The danger of this approach is that the in-line labels could get out of sync with the values in the external controlled vocabulary document.
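One way to guard against this drift is to compare the in-line labels with the vocabulary document whenever both are available. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module. The function names are hypothetical, and the description sample deliberately carries an outdated label ("XML 1.0") that I invented for the demonstration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PCV_NS = "http://prismstandard.org/namespaces/1.2/pcv/"
DESCRIPTOR = "{%s}Descriptor" % PCV_NS
LABEL = "{%s}label" % PCV_NS
ABOUT = "{%s}about" % RDF_NS

NS_DECLS = (
    'xmlns:rdf="%s" xmlns:pcv="%s" xmlns:dc="http://purl.org/dc/elements/1.1/"'
    % (RDF_NS, PCV_NS)
)

# The controlled vocabulary, reduced to its labels.
VOCABULARY = """<rdf:RDF %s>
  <pcv:Descriptor rdf:about="http://www.w3.org/TR/REC-xml">
    <pcv:label>XML 1.0 Recommendation</pcv:label>
  </pcv:Descriptor>
  <pcv:Descriptor rdf:about="http://www.w3.org/TR/REC-rdf-syntax/">
    <pcv:label>RDF Model and Syntax 1.0</pcv:label>
  </pcv:Descriptor>
</rdf:RDF>""" % NS_DECLS

# A description with in-line labels; the first label is deliberately stale.
DESCRIPTION = """<rdf:RDF %s>
  <rdf:Description rdf:about="http://www.ibm.com/developerworks/xml/library/x-think12.html">
    <dc:subject>
      <pcv:Descriptor rdf:about="http://www.w3.org/TR/REC-xml">
        <pcv:label>XML 1.0</pcv:label>
      </pcv:Descriptor>
    </dc:subject>
    <dc:subject>
      <pcv:Descriptor rdf:about="http://www.w3.org/TR/REC-rdf-syntax/">
        <pcv:label>RDF Model and Syntax 1.0</pcv:label>
      </pcv:Descriptor>
    </dc:subject>
  </rdf:Description>
</rdf:RDF>""" % NS_DECLS


def labels_by_uri(xml_text):
    """Collect the pcv:label of every pcv:Descriptor, keyed by its rdf:about URI."""
    labels = {}
    for descriptor in ET.fromstring(xml_text).iter(DESCRIPTOR):
        label = descriptor.find(LABEL)
        if label is not None:
            # Normalize whitespace so layout differences are not reported.
            labels[descriptor.get(ABOUT)] = " ".join((label.text or "").split())
    return labels


def stale_labels(vocabulary_xml, description_xml):
    """Return {uri: (inline_label, vocabulary_label)} for labels that differ."""
    official = labels_by_uri(vocabulary_xml)
    stale = {}
    for uri, inline in labels_by_uri(description_xml).items():
        if uri in official and official[uri] != inline:
            stale[uri] = (inline, official[uri])
    return stale


print(stale_labels(VOCABULARY, DESCRIPTION))
```

The same labels_by_uri helper also covers the case where labels are not repeated in line: a processor can load the vocabulary once and look up the label for each descriptor URI it meets.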


The PRISM outdoors

PRISM has been in development for a while, and has matured rather well. The PRISM working group has seen steady growth in its membership, and cites a growing body of success stories of PRISM in production. I have used PRISM in projects not directly related to publishing because of how it rounds out some of the basic Dublin Core properties. It is surprisingly useful in database projects, especially for integrating data sets from traditional databases into XML document systems.


