The various industries related to publishing were among the earliest to support XML and to explore its value in practice. This is not surprising, as the publishing industry has been a stalwart of SGML, the parent of XML. The Information and Content Exchange protocol, or ICE, emerged in 1998 as one of the earliest major industry standards to use XML. ICE is a protocol for directing the distribution of content electronically to various partners presenting the content on the Internet. XML is well-suited to another important requirement in the publishing industry: content metadata management. ICE provides the mechanism for exchanging content, but even the ICE specification admits that there needs to be a formal means for describing that content.
To meet this need, the publishing industry has developed Publishing Requirements for Industry Standard Metadata (PRISM), an XML metadata standard for directing the processing of content. PRISM covers a wide variety of content, from catalogs to books, and a wide variety of media, from various forms of electronic publishing to various forms of print. PRISM is being developed by a working group of IDEAlliance (formerly known as GCA), a consortium of publishers involved with electronic technological infrastructure. PRISM members include technology vendors such as Adobe, and magazine publishers such as Time Inc. and McGraw-Hill.
In this article, I introduce PRISM, focusing on the current draft of the PRISM 1.2 specification. Readers should be familiar with XML and RDF.
PRISM, at its most basic, is defined as an RDF/XML document that uses the Dublin Core vocabulary. As an example, Listing 1 is a valid PRISM document that describes the previous installment of this column.
Listing 1: Thinking XML column 12 described formally in rudimentary PRISM
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xml:lang="en"
>
  <rdf:Description rdf:about="http://www.ibm.com/developerworks/xml/library/x-think12.html">
    <dc:description>
      A discussion of the broader context and relevance of XML/RDF techniques.
    </dc:description>
    <dc:title>Basic XML and RDF techniques for knowledge management, Part 7</dc:title>
    <dc:publisher>IBM developerWorks</dc:publisher>
    <dc:creator>Uche Ogbuji</dc:creator>
    <dc:subject>XML</dc:subject>
    <dc:subject>RDF</dc:subject>
    <dc:format>text/html</dc:format>
  </rdf:Description>
</rdf:RDF>
This correspondence with plain RDF and the increasingly popular Dublin Core element set means that PRISM is well aligned with the large body of RDF tools and techniques that are in place. As far as Dublin Core properties are concerned, PRISM allows flexibility in the values of the properties: You can use ad hoc plain text values as I do above; you can use plain text values from a controlled vocabulary, such as ISBNs; or you can use URIs. For example, I express the dc:publisher property above as:
<dc:publisher>IBM developerWorks</dc:publisher>
I could express it using the International Standard Serial Number (ISSN) for IBM developerWorks (actually, this is a made-up ISSN):
I could also use the ISSN key title for IBM developerWorks. The key title is a special name that is assigned along with the ISSN. The key title is generally a variant of the general name of the publication, which is modified to make it globally unique. The ISSN key title and number are vocabularies controlled by the ISSN International Centre. Using them in the metadata field removes any possible ambiguity associated with using the common name of the publisher.
As a third option, I could use the IBM developerWorks URL:
Notice the different syntactic form of the RDF, which specifies that the property value is another resource and not just a plain text string. This latter option is also a controlled vocabulary, but rather than control being established by a single body, it is established by virtue of IBM's ownership of the domain name used in the URL, as well as the Internet addresses of the machines mapped to this domain name.
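The alternatives just described can be sketched in one place. This is an illustration only: the plain-text form is copied from Listing 1, the ISSN shown in the comment is a placeholder rather than a real assignment, and the publisher URL is an assumption derived from the developerWorks address used in Listing 1.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
>
  <rdf:Description rdf:about="http://www.ibm.com/developerworks/xml/library/x-think12.html">
    <!-- Form 1, ad hoc plain text (as in Listing 1):
         <dc:publisher>IBM developerWorks</dc:publisher> -->
    <!-- Form 2, plain text from a controlled vocabulary
         (0000-0000 is a placeholder, not a real ISSN):
         <dc:publisher>0000-0000</dc:publisher> -->
    <!-- Form 3, a resource reference (URL assumed for illustration) -->
    <dc:publisher rdf:resource="http://www.ibm.com/developerworks/"/>
  </rdf:Description>
</rdf:RDF>
```

Only the third form uses the rdf:resource attribute, so an RDF processor treats its value as a resource rather than as a literal string.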
One important point is that PRISM is based on the RDF/XML serialization rather than the abstract model. In the last installment of this column, I strongly recommended that users of RDF focus on the abstract model rather than the XML serialization. I can understand PRISM's contradiction of this because it has to address content providers and thus tell them concretely which XML elements to put together in metadata communications. PRISM aims to establish strong interoperability at the syntax level. To underscore this, PRISM is formally defined in terms of a DTD, which is also probably a natural consequence of PRISM's publishing origins.
One downside of the focus on syntax is that the RDF/XML serialization cannot express every RDF model. For example, if an organization were to identify content using a URI form that cannot be broken into an XML prefix/local name combination, then it's hard to see how it could use PRISM to describe such content. On the positive side, PRISM takes advantage of the flexibility of the RDF/XML specification in being able to appear either in stand-alone documents, or embedded within the content. As usual, the rdf:RDF wrapper element provides an encapsulation of the metadata.
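As a rough sketch of the embedded case, the document below places an rdf:RDF block inside a piece of content. The article and body elements and their namespace are hypothetical, invented here for illustration; PRISM does not define them, and the specification should be consulted for how a given content format is expected to carry embedded metadata.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- "article", "body", and their namespace are hypothetical, for illustration only -->
<article xmlns="http://example.org/ns/article">
  <!-- Embedded metadata: rdf:RDF encapsulates the PRISM description -->
  <rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
  >
    <rdf:Description rdf:about="http://www.ibm.com/developerworks/xml/library/x-think12.html">
      <dc:title>Basic XML and RDF techniques for knowledge management, Part 7</dc:title>
      <dc:creator>Uche Ogbuji</dc:creator>
    </rdf:Description>
  </rdf:RDF>
  <body>The content of the article goes here.</body>
</article>
```

Because the metadata sits entirely inside rdf:RDF, a processor can extract that element and treat it exactly as it would a stand-alone PRISM document.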
PRISM also defines a set of properties that extend the basic descriptions allowed by Dublin Core. These properties support:
- Description of general characteristics
- Provenance of content
- Important dates and times related to the publishing
- Subjects and topics of the content
- Relationships between resources
- Rights and permissions that govern use of the content
All of these properties are based on the core PRISM namespace, http://prismstandard.org/namespaces/1.2/basic/, which is formally defined in section II 4.4 of the spec. One warning: examples in the draft PRISM spec itself are inconsistent in their definition of the PRISM namespace. Some use the normative namespace, but some unaccountably use variations such as http://prismstandard.org/namespaces/basic/ and even http://prismstandard.org/namespaces/basic/1.2/. I assume these are just typos.
In the following list, I select some of the more interesting PRISM properties from the 50 or so defined in the specification.
- prism:category: The nature or genre of the content. PRISM provides a recommended controlled vocabulary for this which includes terms such as advertisement, cartoon, column, and recipe.
- prism:creationTime and prism:modificationTime: Pertinent dates in the life cycle of the content.
- prism:event: An event referred to in or described by the content.
- prism:location: A place referred to in or described by the content.
- prism:person and prism:organization: A person or organization referred to in or described by the content.
- prism:isPartOf: A resource of which the one being described is a physical or logical part. prism:hasPart is the inverse relationship.
- prism:isFormatOf: A resource of which the one being described is an alternative format. prism:hasFormat is the inverse relationship. This could, for example, relate printed and electronic formats of a resource.
- prism:isReferencedBy: A resource that references the one being described (for instance, through citation). prism:references is the inverse relationship.
- prism:isTranslationOf: A resource of which the one being described is a language translation. prism:hasTranslation is the inverse relationship.
- prism:copyright: A copyright notice for the content.
Listing 2: Thinking XML column 12 described formally in PRISM using core elements
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/"
  xml:lang="en"
>
  <rdf:Description rdf:about="http://www.ibm.com/developerworks/xml/library/x-think12.html">
    <dc:description>
      A discussion of the broader context and relevance of XML/RDF techniques.
    </dc:description>
    <dc:title>Basic XML and RDF techniques for knowledge management, Part 7</dc:title>
    <dc:publisher>IBM developerWorks</dc:publisher>
    <dc:creator>Uche Ogbuji</dc:creator>
    <dc:subject>XML</dc:subject>
    <dc:subject>RDF</dc:subject>
    <prism:category>column</prism:category>
    <prism:organization>OMG</prism:organization>
    <dc:format>text/html</dc:format>
  </rdf:Description>
</rdf:RDF>
The additions are the xmlns:prism namespace declaration and the prism:category and prism:organization elements: I declared the PRISM namespace and then added statements from it.
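To make the direction of the relationship properties concrete, here is a minimal sketch that describes the same column. It assumes that relationship values are given as resource references using rdf:resource; both target URLs are assumptions made for illustration, and the example.org address is entirely hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/"
>
  <rdf:Description rdf:about="http://www.ibm.com/developerworks/xml/library/x-think12.html">
    <dc:title>Basic XML and RDF techniques for knowledge management, Part 7</dc:title>
    <!-- The described column is part of a larger resource (URL assumed) -->
    <prism:isPartOf rdf:resource="http://www.ibm.com/developerworks/xml/"/>
    <!-- A hypothetical translation of the described column -->
    <prism:hasTranslation rdf:resource="http://example.org/translations/x-think12.fr.html"/>
  </rdf:Description>
</rdf:RDF>
```

Read each statement from the described resource outward: the column is part of the larger resource, and the column has a translation at the second address. The inverse statements (prism:hasPart, prism:isTranslationOf) would appear in descriptions of those other resources.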
An important provision of PRISM is a formal way for others to define their own controlled vocabularies. In this way, PRISM provides a mechanism for extensibility that goes beyond the basic extensibility of XML and RDF. If you look at my description in Listings 1 and 2, you will notice my use of the dc:subject property with the simple values XML and RDF. But this could be ambiguous because these do not come from a controlled vocabulary. For instance, someone coming from the mining industry might misunderstand RDF, which is also a common abbreviation for "refuse-derived fuel" in that industry. What I really mean to say here is that the content in question is about a particular pair of W3C specifications. But PRISM does not define a controlled vocabulary of industry specifications. I shall instead use PRISM's facilities to define my own such vocabulary, in Listing 3.
Listing 3: An example controlled vocabulary of formal specifications
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:pcv="http://prismstandard.org/namespaces/1.2/pcv/"
  xmlns:u="http://uche.ogbuji.net/eg/pcv/specs/schema/"
  xml:lang="en"
>
  <pcv:Descriptor rdf:about="http://www.w3.org/TR/REC-xml">
    <pcv:label>XML 1.0 Recommendation</pcv:label>
    <u:owner rdf:resource="http://w3.org"/>
  </pcv:Descriptor>
  <pcv:Descriptor rdf:about="http://www.w3.org/TR/REC-rdf-syntax/">
    <pcv:label>RDF Model and Syntax 1.0</pcv:label>
    <u:owner rdf:resource="http://w3.org"/>
  </pcv:Descriptor>
</rdf:RDF>
Here I define two resources of type pcv:Descriptor, using the URLs of the specifications themselves as the IDs. I use pcv:label, which is a subproperty of rdfs:label, to set a human-readable description of the resource suitable for use in PRISM-aware software. And finally, I take advantage of the general extensibility of RDF itself to create my own specialized property, u:owner, tying the specification to the organization that owns it. I can now use this controlled vocabulary to make a more refined statement than my original dc:subject. Listing 4 is an excerpt from Listing 1 that shows the modified subject statements.
Listing 4: Updated dc:subject to use controlled vocabulary
<dc:subject>
  <pcv:Descriptor rdf:about="http://www.w3.org/TR/REC-xml"/>
</dc:subject>
<dc:subject>
  <pcv:Descriptor rdf:about="http://www.w3.org/TR/REC-rdf-syntax/"/>
</dc:subject>
If I use this form, I rely on the software processing the PRISM to find the controlled vocabulary document in Listing 3 in order to determine such useful things as the labels of the descriptor resources. Because that document might not always be available, I can take advantage of RDF's syntax rules to repeat such properties in line, as in Listing 5.
Listing 5: Updated dc:subject to use controlled vocabulary in Listing 3, repeating label property
<dc:subject>
  <pcv:Descriptor rdf:about="http://www.w3.org/TR/REC-xml">
    <pcv:label>XML 1.0 Recommendation</pcv:label>
  </pcv:Descriptor>
</dc:subject>
<dc:subject>
  <pcv:Descriptor rdf:about="http://www.w3.org/TR/REC-rdf-syntax/">
    <pcv:label>RDF Model and Syntax 1.0</pcv:label>
  </pcv:Descriptor>
</dc:subject>
The danger of this approach is that the in-line labels could get out of sync with the values in the external controlled vocabulary document.
PRISM has been in development for a while, and has matured rather well. The PRISM working group has seen steady growth in its membership, and cites a growing body of success stories of PRISM in production. I have used PRISM in projects not directly related to publishing because of how it rounds out some of the basic Dublin Core properties. It is surprisingly useful in database projects, especially for integrating data sets from traditional databases into XML document systems.
- Download the latest PRISM spec and find more information at the PRISM home page.
- PRISM depends heavily on the Dublin Core Metadata Initiative. Become familiar with the Dublin Core element set.
- Learn all about the International Standard Serial Number at the ISSN Home page.
- Check out Thinking XML's previous columns.
Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management. Fourthought develops 4Suite, an open source platform for XML, RDF, and knowledge-management applications. Mr. Ogbuji is a Computer Engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can contact Mr. Ogbuji at firstname.lastname@example.org.