In my Thinking XML column, I frequently focus on how various industries are working toward semantic transparency: a shared understanding of the meaning of at least the framework of what is communicated in an XML document. Industries do so either by creating complete document formats along with the semantics of all the elements, attributes, and content, or by defining terms and concepts discretely and individually, independently of the documents in which they would appear. I call these approaches top-down and bottom-up, respectively, and very active communities provide useful material on each.
If you develop XML schemas, the usual sage advice is to be sure that you aren't carelessly duplicating existing work. But even if you're truly in new territory, or have good reason not to simply reuse existing languages, lean as much as possible on existing initiatives toward semantic transparency. This holds whether you are developing an XML format for private use or as a shared or public resource. You can borrow naming conventions and perhaps even schema snippets from existing vocabularies, but a less common technique for building on the work of others is to incorporate what I call semantic links into your own schemata: special links to existing standards that define the meaning of the syntactic constructs in your schema. This provides a particularly rich form of data dictionary for XML schemata. In this tip, I show how to work such links into your schemata.
Imagine that you are a developer working on information systems to manage your organization's Web services. Your first task is to put together some information for detailing the proposed Web services in the context of the budget requests that you must make to be sure the products get off the ground. You'll use XML so that you can easily generate reports and views on the information, and so that you can mix together information from several domains effortlessly. Listing 1 is a snippet from a RELAX NG schema that you might construct for this purpose.
Listing 1. Portion of RELAX NG schema for budget information for Web services development
<element name='service'>
  <owl:sameClassAs resource="http://www.daml.org/services/owl-s/1.0/Service.owl#Service"/>
  <attribute name='id'/>
  <element name='synopsis'>
    <owl:samePropertyAs resource='http://www.w3.org/2000/01/rdf-schema#comment'/>
  </element>
  <element name='budget-request'>
    <attribute name='currency'/>
  </element>
  <element name='justification'/>
</element>
This snippet does not include the namespace declarations, but the owl prefix is bound to the namespace for OWL Web Ontology Language, http://www.w3.org/2002/07/owl#. OWL is the W3C standard for ontologies, which are documents that provide enough information to share the meanings of a group of concepts. As such, OWL is an excellent way to express semantic links in schemata. Each OWL element is an annotation that expresses a link from the containing RELAX NG definition.
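For reference, here is a minimal sketch of how the fragment in Listing 1 might look with its namespace declarations in place. The RELAX NG and OWL namespace URIs are the standard ones; the choice of service as the root pattern and the added text patterns are assumptions made only so that the fragment stands alone as a schema.

```xml
<!-- Sketch only: the root-level placement and the <text/> patterns are
     assumptions added for completeness; they are not part of Listing 1. -->
<element name="service"
         xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <!-- Semantic link: this element represents an OWL-S Service -->
  <owl:sameClassAs resource="http://www.daml.org/services/owl-s/1.0/Service.owl#Service"/>
  <attribute name="id"/>
  <element name="synopsis">
    <!-- Semantic link: this element plays the role of rdfs:comment -->
    <owl:samePropertyAs resource="http://www.w3.org/2000/01/rdf-schema#comment"/>
    <text/>
  </element>
  <element name="budget-request">
    <attribute name="currency"/>
    <text/>
  </element>
  <element name="justification">
    <text/>
  </element>
</element>
```

Because the OWL elements live in a namespace other than the RELAX NG one, a validator treats them as annotations and ignores them when checking instance documents.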
The OWL expression owl:sameClassAs is used to declare that an information item in the XML schema represents some class of thing. In the example, it's important to be clear that what you mean by the element type named service is a Web service of the SOAP or REST or similar variety, so you anchor it to the concept definition from OWL-S, which is a standard ontology of Web services and the service-oriented architecture (SOA). An element sometimes has more of the feel of an attribute or property of another element rather than a class in its own right. Here, the owl:samePropertyAs expression is used to identify the synopsis element as equivalent to a comment in RDF Schema (RDFS), which is prosaically defined as "used to provide a human-readable description of a resource." In this case you're formally asserting that the synopsis is a human-readable description of the service.
RELAX NG makes it easy to add such annotations because any element in a foreign namespace can be placed anywhere in a definition. If you're using W3C XML Schema (WXS), things are more complicated: You have to place such foreign elements within xsd:appinfo, which must in turn be placed in xsd:annotation.
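For comparison, here is a minimal sketch of how the same semantic link for the synopsis element might be expressed in WXS. The xsd:string type and the bare schema wrapper are assumptions for illustration; only the nesting of the foreign element inside xsd:appinfo within an xsd:annotation is the point.

```xml
<!-- Sketch only: the element type and schema wrapper are assumptions. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:owl="http://www.w3.org/2002/07/owl#">
  <xsd:element name="synopsis" type="xsd:string">
    <xsd:annotation>
      <xsd:appinfo>
        <!-- The foreign OWL element must sit inside xsd:appinfo -->
        <owl:samePropertyAs resource="http://www.w3.org/2000/01/rdf-schema#comment"/>
      </xsd:appinfo>
    </xsd:annotation>
  </xsd:element>
</xsd:schema>
```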
Because you've taken so much care in organizing and presenting your Web services budget requests, you've won approval, and now it's time for implementation. One of the newly-funded services is a calendar and appointment service, and you now need to write the WSDL for it. Of course, you want to be sure that the schema snippets in the WSDL contain semantic links, but you also want to try to sprinkle semantic links into other parts of the description as well. Listing 2 is such a snippet of WSDL, conforming to WSDL 1.1.
Listing 2. Portion of WSDL that includes a semantic link for a message part
<wsdl:message name="get-upcoming-appointments">
  <wsdl:part name="requested-duration" element="schema:duration">
    <wsdl:documentation>
      <owl:sameClassAs resource="http://www.w3.org/2002/12/cal/ical#duration"/>
    </wsdl:documentation>
  </wsdl:part>
</wsdl:message>
In defining a message that requests the upcoming appointments from now through a given duration, you establish precisely what you mean by duration when defining that parameter in the request. You do this by referencing the W3C's suggested expression of the iCalendar standard (RFC 2445) as a formal ontology. You place this link in a wsdl:documentation element, which is not ideal since this element is usually reserved for human-readable documentation. You have to do this because of constraints in the WSDL specification, which, like WXS, restricts the places where foreign elements (called "extensibility elements") can be placed. The model for extensibility is still being developed for WSDL 2.0. I hope the working group decides to loosen unnecessary restrictions before they're done (never mind for now the fact that it seems the idea of a message part is being overhauled for 2.0).
By adding semantic links, you have taken simple terms used for XML and Web services schemata (service, synopsis, duration), and placed them in a specific context. This makes it much easier to automatically infer their meaning when deployed within systems, and increases the information value of the corresponding XML documents. As an example, using some such semantic link, you can easily determine that the synopsis element has the same meaning as an element named description in another vocabulary. Professional DBAs emphasize the importance of data dictionaries to frame the terms they use in their schemata. XML developers should be no less diligent.
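To make the synopsis and description example concrete, here is a sketch of what the other vocabulary might look like. The catalog-entry and description names are hypothetical and do not come from any published schema; the only thing that matters is that the semantic link points at the same URI used for synopsis in Listing 1.

```xml
<!-- Hypothetical second vocabulary: "catalog-entry" and "description" are
     illustrative names, not taken from any published schema. -->
<element name="catalog-entry"
         xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <element name="description">
    <!-- Same target URI as the link on the synopsis element in Listing 1 -->
    <owl:samePropertyAs resource="http://www.w3.org/2000/01/rdf-schema#comment"/>
    <text/>
  </element>
</element>
```

A tool that reads both schemata could compare the resource values of the two links and, finding them identical, treat synopsis and description as carrying the same meaning even though their names differ.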
On a more general note, the value of supporting semantic links is also a good reason to be sure that materials in schemata, dictionaries, ontologies, and such can be referenced using simple URLs. If you are working in an initiative for semantic transparency, please be sure that one of your goals is easy access through simple linking.
- Learn about OWL Web Ontology Language at the Web Ontology Working Group home page. OWL is founded upon RDF Schema, which can be used as a general vocabulary for descriptions of resources.
- If you use Web services, check out OWL-S, which is a very rigorous ontology of Web services and service-oriented architecture (SOA).
- Check out RDF Calendar, the W3C's extensive effort to create an ontology and framework for schedule and calendar information based on iCalendar (RFC 2445).
- See Uche Ogbuji's Thinking XML columns, which discuss issues of semantic transparency and knowledge management using XML. In particular, see "Semantic anchors for XML" (developerWorks, October 2003), in which he explores initiatives that provide suitable anchors for semantic links.
Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management. Fourthought develops 4Suite, an open source platform for XML, RDF, and knowledge-management applications. Mr. Ogbuji is also a lead developer of the Versa RDF query language. He is a computer engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can contact Mr. Ogbuji at email@example.com.