In this series, I have demonstrated general techniques for interoperability between XML and RDF facilities in applications, focusing on how RDF interchange can add semantic features, or simply greater flexibility in information modeling, to XML applications. In wrapping up the series, I will review the various techniques I have introduced, and give examples of and guidelines for applying them to other application development tasks.
The original RDF 1.0 specification bundled both the abstract model of RDF and an XML syntax for interoperable representation and interchange. I, among others, have regarded this combination as a mistake. In addition to the specification design issues, this bundling led many people to believe that RDF is strictly the syntax in the specification, and that to take advantage of RDF you have to, for example, make up XML files in that syntax using rdf:RDF as the root element in all documents. This is completely wrong, and an important enough point for me to restate:
In order to use and gain advantage from RDF, you do not have to use any particular syntax -- not even the syntax specified in the RDF 1.0 specification.
The trick is to extract key metadata from XML documents -- or even from non-markup sources such as relational databases -- and synchronize it into RDF models. You can then treat these models as a localized semantic Web. For instance, you might tie all the following together:
- Personal contact information in one data source
- To-do items and calendar entries from another source
- Public key infrastructure metadata to establish trust
- Dictionary listings and other semantic data
This rich integration of information enables features that would otherwise be impractical, and the added value offsets the work of moving data between XML and RDF models.
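To make the idea of a localized semantic Web concrete, here is a minimal sketch that pulls records from two independent sources (contacts and to-do items) into one set of triples and then answers a question neither source could answer alone. The record layouts, the `ex:` property names, and the URNs are all illustrative assumptions, and plain Python tuples stand in for a real RDF model.

```python
# Hypothetical records from two independent sources. Field names and
# URNs are illustrative assumptions, not any real application's format.
contacts = [
    {"id": "urn:contact:alice", "name": "Alice Ndiaye", "email": "alice@example.org"},
    {"id": "urn:contact:bob", "name": "Bob Kim", "email": "bob@example.org"},
]
todos = [
    {"id": "urn:todo:1", "title": "Review schema", "assignee": "alice@example.org"},
    {"id": "urn:todo:2", "title": "Book room", "assignee": "bob@example.org"},
    {"id": "urn:todo:3", "title": "Draft column", "assignee": "alice@example.org"},
]

def contact_triples(records):
    """Express each contact record as (subject, predicate, object) triples."""
    for r in records:
        yield (r["id"], "ex:name", r["name"])
        yield (r["id"], "ex:email", r["email"])

def todo_triples(records, email_to_contact):
    """Express to-do items as triples, linking each one to a contact
    resource rather than a raw email string so the sources join up."""
    for r in records:
        yield (r["id"], "ex:title", r["title"])
        yield (r["id"], "ex:assignedTo", email_to_contact[r["assignee"]])

email_to_contact = {c["email"]: c["id"] for c in contacts}
model = set(contact_triples(contacts)) | set(todo_triples(todos, email_to_contact))

def tasks_for(name):
    """Cross-source question: which to-do titles belong to this person?"""
    people = {s for s, p, o in model if p == "ex:name" and o == name}
    tasks = {s for s, p, o in model if p == "ex:assignedTo" and o in people}
    return sorted(o for s, p, o in model if p == "ex:title" and s in tasks)

print(tasks_for("Alice Ndiaye"))
```

The key design choice is resolving the email string to the contact's identifier during import; once both sources refer to the same resource, the join is just graph traversal.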
The XSLT transforms I have demonstrated for converting XML to RDF syntax (for the sole purpose of interoperable import into RDF models) are one way to achieve XML/RDF interchange. Some RDF tools can manage such interchange for you. For instance, with the 4Suite tools, you can define mapping rules without dealing with the RDF syntax.
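As a rough illustration of mapping rules that avoid RDF syntax entirely, the sketch below declares which XML elements become resources and which child elements become which properties, then emits triples directly with the standard library's ElementTree. The `RULES` layout and the issue document are hypothetical; this is not the 4Suite mapping format, only the general idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical issue-tracker document; element names are assumptions.
DOC = """<issues>
  <issue id="i1">
    <title>Login fails</title>
    <status>open</status>
    <reporter>alice</reporter>
  </issue>
  <issue id="i2">
    <title>Typo in docs</title>
    <status>closed</status>
    <reporter>bob</reporter>
  </issue>
</issues>"""

# Hypothetical mapping rules: which elements are resources, how to name
# them, and which child elements map to which RDF properties.
RULES = {
    "resource": "issue",
    "type": "ex:Issue",
    "subject": lambda el: "urn:issue:" + el.get("id"),
    "properties": {
        "title": "dc:title",
        "status": "ex:status",
        "reporter": "dc:creator",
    },
}

def extract_triples(xml_text, rules):
    """Apply the mapping rules and yield (subject, predicate, object)."""
    root = ET.fromstring(xml_text)
    for el in root.iter(rules["resource"]):
        subject = rules["subject"](el)
        yield (subject, "rdf:type", rules["type"])
        for child_tag, predicate in rules["properties"].items():
            for child in el.findall(child_tag):
                yield (subject, predicate, (child.text or "").strip())

triples = list(extract_triples(DOC, RULES))
for t in triples:
    print(t)
```

Keeping the rules as data means the XML format can evolve by editing the table rather than rewriting a transform.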
In my sample issue tracker application, I use the XML/RDF mappings to produce and exchange application data in the most natural formats, while using RDF to tie in such advanced features as semantic searches with very little effort. In the other thread of this column (see Resources), I introduced quite a few e-business dictionary resources; applications that take advantage of these resources can use XML/RDF mappings to import resources such as the following:
- RosettaNet dictionaries
- The ISO Basic Semantic Registry (which is already available in RDF form)
- The U.S. Government schematic repositories in DISA
In a unified RDF model, these resources can be richly linked with a variety of local or other global data sets for unified query and autonomous agent processing.
In his keynote address at the recent XML Web Services One conference, William Ruh of Software AG pointed out that a huge amount of effort these days goes into creating ontologies in XML formats. Ontologies are documents that define terms and concepts in a structured format and thus provide semantic databases. Mr. Ruh pointed out that the integration of this raw XML material with RDF technologies will lead to a practical semantic Web.
I have also demonstrated a very low-level approach to querying RDF models. RDF models, being general graphs, need specialized query facilities that can traverse nodes and arcs singly or in aggregate. This differs from the hashing and search-tree techniques of older hierarchical databases, the table scanning of relational databases, and the attribute aggregation and relationship traversal of object databases. RDF querying actually most closely resembles the last of these, since object relationships are also structured as abstract graphs. The difference is that object theory already supplies much of the semantics that underlie its arcs and nodes, whereas RDF supplies none of these semantics. (RDF Schema and DAML+OIL are examples of systems layered on RDF that do provide some structure and semantics, but even these are far more generic than object theory.)
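The distinction between traversing arcs singly and in aggregate can be sketched in a few lines. This is not the query code from earlier installments; it is a minimal illustration over an in-memory list of triples, with an invented `ex:dependsOn` property. Note the cycle check: unlike a tree, a general graph can loop back on itself.

```python
from collections import defaultdict

# A tiny in-memory triple set; property names are illustrative.
TRIPLES = [
    ("urn:issue:1", "ex:dependsOn", "urn:issue:2"),
    ("urn:issue:2", "ex:dependsOn", "urn:issue:3"),
    ("urn:issue:2", "ex:dependsOn", "urn:issue:4"),
    ("urn:issue:4", "ex:dependsOn", "urn:issue:2"),  # a cycle
    ("urn:issue:1", "dc:title", "Release 1.0"),
]

# Index arcs by (subject, predicate) so single-arc lookups are cheap.
index = defaultdict(list)
for s, p, o in TRIPLES:
    index[(s, p)].append(o)

def follow(node, predicate):
    """Traverse one arc: the direct objects of node via predicate."""
    return list(index.get((node, predicate), []))

def follow_all(node, predicate):
    """Traverse arcs in aggregate: every node reachable via predicate,
    tracking visited nodes so cycles do not loop forever."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for nxt in follow(current, predicate):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

print(follow("urn:issue:1", "ex:dependsOn"))
print(sorted(follow_all("urn:issue:1", "ex:dependsOn")))
```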
I also introduced a higher-level query language for RDF: Versa. This language deals in graph traversal and property aggregation, with features for integration into other languages such as XML processing languages. Other higher-level RDF query languages, such as Squish and CWM (see Resources), focus on SQL-like idioms and logical inference. Some XQuery advocates have suggested that XQuery could be used for metadata queries, either against XML/RDF serializations or, if XML/RDF mappings are used, against the original XML. The main problem with this idea is that general-purpose XQuery implementations are unlikely to be optimized for the sorts of query patterns RDF needs. If you have to use a specialized query engine anyway, you might as well use syntax and semantics appropriate to the specialization (in this case, RDF).
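For contrast with traversal-style querying, the SQL-like style mentioned above typically works by matching a conjunction of triple patterns containing variables. The sketch below is a toy matcher in that spirit; it does not use Squish's actual syntax or any real engine, and the `?`-prefixed variable convention and the data are assumptions for illustration.

```python
# Triple patterns use strings starting with "?" as variables.
TRIPLES = [
    ("urn:issue:1", "ex:status", "open"),
    ("urn:issue:1", "dc:creator", "urn:person:alice"),
    ("urn:issue:2", "ex:status", "closed"),
    ("urn:issue:2", "dc:creator", "urn:person:bob"),
    ("urn:issue:3", "ex:status", "open"),
    ("urn:issue:3", "dc:creator", "urn:person:bob"),
    ("urn:person:alice", "foaf:name", "Alice"),
    ("urn:person:bob", "foaf:name", "Bob"),
]

def is_var(term):
    return term.startswith("?")

def match(pattern, triple, binding):
    """Return binding extended so pattern matches triple, or None."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term in new and new[p_term] != t_term:
                return None  # variable already bound to something else
            new[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return new

def query(patterns, triples):
    """Join the patterns left to right, like a conjunctive WHERE clause."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for t in triples:
                new = match(pattern, t, b)
                if new is not None:
                    extended.append(new)
        bindings = extended
    return bindings

# "Names of people who reported open issues"
rows = query(
    [("?issue", "ex:status", "open"),
     ("?issue", "dc:creator", "?who"),
     ("?who", "foaf:name", "?name")],
    TRIPLES,
)
print(sorted({r["?name"] for r in rows}))
```

This naive nested-loop join is exactly the kind of pattern a general-purpose engine would not optimize; a dedicated RDF store would use indexes like the one in the traversal sketch.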
Another matter to consider when choosing RDF query mechanisms is how distributed the data is. Much of the discussion in this series so far is appropriate to closed models and monolithic databases, but some of the power of RDF is most apparent in distributed systems. For example, RDF can be a very useful tool for aggregating metadata across intranet pages, or for inexpensive application integration. In such cases, you need a query system that can store and forward partial results, compose sub-queries efficiently, and deal with conflicts and contradictions. These facilities are already common to many agent technologies, such as search engine Web crawlers, and can be overlaid onto basic RDF query systems.
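A distributed setting adds two chores the closed-model examples can ignore: composing partial answers from several sources, and noticing when those sources disagree. The following sketch assumes two hypothetical intranet sources held in memory and treats `ex:owner` as single-valued purely for illustration. It keeps provenance with each answer so conflicts can be reported rather than silently merged; network transport and query decomposition are out of scope here.

```python
from collections import defaultdict

# Two hypothetical intranet sources, each holding part of the metadata.
SOURCES = {
    "intranet-a": [
        ("urn:page:home", "dc:creator", "urn:person:alice"),
        ("urn:page:home", "ex:owner", "urn:dept:eng"),
        ("urn:page:faq", "ex:owner", "urn:dept:support"),
    ],
    "intranet-b": [
        ("urn:page:faq", "dc:creator", "urn:person:bob"),
        ("urn:page:faq", "ex:owner", "urn:dept:support"),     # agrees with A
        ("urn:page:home", "ex:owner", "urn:dept:marketing"),  # disagrees with A
    ],
}

# Assumption for illustration: each page should have exactly one owner.
SINGLE_VALUED = {"ex:owner"}

def sub_query(triples, predicate):
    """Answer one sub-query against a single source (a partial result)."""
    return [(s, o) for s, p, o in triples if p == predicate]

def federated(predicate, sources):
    """Compose per-source partial results, recording which source said
    what, and report contradictions for single-valued properties."""
    merged = defaultdict(lambda: defaultdict(set))
    for name, triples in sources.items():
        for s, o in sub_query(triples, predicate):
            merged[s][o].add(name)
    conflicts = {}
    if predicate in SINGLE_VALUED:
        conflicts = {s: vals for s, vals in merged.items() if len(vals) > 1}
    return merged, conflicts

owners, conflicts = federated("ex:owner", SOURCES)
for subject, values in conflicts.items():
    print(subject, {o: sorted(srcs) for o, srcs in values.items()})
```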
Perhaps even more important than the XML mappings or query implementation details I have covered are the information modeling lessons. All common methodologies for application development, including entity-relationship modeling and the Unified Modeling Process, look forward to implementation rather than back to the fundamental concepts of the problem space that the implementation approximates. In many ways, this is a major contributor to the notorious expense of software maintenance and integration. Two different software packages -- say, a human resources database and a content management package -- may share the same concept of an employee, but the way each package models and implements the structures and processes of employee data almost inevitably differs. The constraints and rules differ. The representations and included data differ. If you need to bridge these two applications, these differences can snarl the effort, and in many cases make it economically infeasible.
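The employee example can be sketched to show what a shared abstract concept buys you. Below, two hypothetical records describe the same person with different field names, name structure, and identifier encoding. Small adapters map each onto a common vocabulary (the `org:` and `cms:` property names are assumptions), after which the two views merge into one description instead of requiring a bespoke point-to-point bridge.

```python
# Hypothetical records for the same person in two packages.
hr_record = {"emp_no": 42, "first": "Ada", "last": "Obi", "dept_code": "ENG"}
cms_record = {"user": "aobi", "display_name": "Ada Obi", "staff_id": "E-0042"}

def employee_uri(number):
    """Shared identifier scheme for the abstract Employee concept."""
    return f"urn:employee:{number}"

def from_hr(r):
    """Adapter: HR stores split names and an integer employee number."""
    subj = employee_uri(r["emp_no"])
    return {
        (subj, "rdf:type", "org:Employee"),
        (subj, "org:fullName", f"{r['first']} {r['last']}"),
        (subj, "org:department", r["dept_code"].lower()),
    }

def from_cms(r):
    """Adapter: the CMS stores a display name and a prefixed,
    zero-padded staff ID that encodes the same number."""
    subj = employee_uri(int(r["staff_id"].split("-")[1]))
    return {
        (subj, "rdf:type", "org:Employee"),
        (subj, "org:fullName", r["display_name"]),
        (subj, "cms:login", r["user"]),
    }

merged = from_hr(hr_record) | from_cms(cms_record)
print(sorted(merged))
```

Overlapping facts (type and name) collapse into one, while each package contributes what only it knows.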
The Object Management Group (OMG), the caretaker organization of many object design specifications including CORBA and UML, has shown some understanding of this problem. OMG is making a fundamental shift: from grounding development practice in platform-neutral systems (CORBA and UML) to grounding it in completely abstract models that represent concepts within organizations, within industries, or even globally. Such models sound remarkably like ontologies, yet for some reason (probably a desire to consolidate the appeal of its work products), the OMG has chosen to promote UML as the language and representation for the abstract model. The problem is that UML is biased towards implementation. As the OMG itself says in its introduction to UML:
You can model just about any type of application, running on any type and combination of hardware, operating system, programming language, and network, in UML. Its flexibility lets you model distributed applications that use just about any middleware on the market. Built upon the MOF [Meta-Object Facility] metamodel which defines class and operation as fundamental concepts, it's a natural fit for object-oriented languages and environments such as C++, Java [language], and the recent C#, but you can use it to model non-OO applications as well in, for example, Fortran, VB, or COBOL.
At the end, this OMG description disclaims exclusivity to object-oriented design, which is itself a debatable disclaimer. But even if one grants that point, advertising UML as geared towards application development underscores the remaining problem. One of the triumphs of XML is its ability to expand the world view of developers beyond the narrow confines of application development. Originally, this came in the form of integrating the concerns and patterns of traditional document-oriented processing inherited from SGML. More recently -- and especially with the intrusion of XML into distributed programming in the form of Web services -- more developers are reaching a fundamental understanding of how stateless information processing models work.
To determine how to get the best of both worlds to provide more maintainable bases for application development, a rapport between OMG folks and RDF/ontology folks is growing. Modeling in RDF opens your eyes to the broader relevance of such institutionalized concepts as class and type. This is an important stepping stone on the path to next-generation modeling. Right now, integrating ontologies with traditional analysis and design already pays significant dividends to early adopters in the form of shorter development cycles, improved maintenance and integration, and emergent value in the knowledge bases that drive the resulting applications. Unfortunately, such pioneers also suffer many privations: Programming tools do not yet support ontologies, and ontology tools (such as they are) do not yet support traditional programming. Very sophisticated developers and a highly entrepreneurial atmosphere are required to succeed with such hybrid methodologies. Both of these assets, unfortunately, are in very short supply in most IT organizations.
The important point is that if projects put information modeling first, they have a much better chance of keeping the application in step with the real world. Putting information modeling first means founding the project on formal descriptions and the expected behavior of the key concepts to be modeled, and using generic data representation tools such as XML and RDF schemata very early in the cycle. Only once the abstract model is well defined should it be used to seed the artifacts that frame the programmatic aspects of the application design.
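One way to picture "seeding" artifacts from the model is to keep the concept description as plain data and generate code from it, rather than writing the class first. The sketch below assumes a hypothetical dictionary layout for the model and uses the standard library's `dataclasses.make_dataclass` to produce a Python class; it illustrates the ordering of work, not any particular tool.

```python
from dataclasses import make_dataclass, field, fields
from datetime import date
from typing import Optional

# An implementation-neutral description of one concept. The dict layout
# is a hypothetical convention, not a standard format.
ISSUE_MODEL = {
    "concept": "Issue",
    "properties": [
        {"name": "title", "type": str, "required": True},
        {"name": "reported", "type": date, "required": False},
        {"name": "status", "type": str, "required": True},
    ],
}

def seed_class(model):
    """Generate a Python class from the abstract model. Required
    properties are placed first because dataclass fields with
    defaults must follow fields without them."""
    ordered = sorted(model["properties"], key=lambda p: not p["required"])
    specs = []
    for prop in ordered:
        if prop["required"]:
            specs.append((prop["name"], prop["type"]))
        else:
            specs.append((prop["name"], Optional[prop["type"]], field(default=None)))
    return make_dataclass(model["concept"], specs)

Issue = seed_class(ISSUE_MODEL)
bug = Issue(title="Login fails", status="open")
print(bug)
```

If the model changes, the class is regenerated; the model stays the authoritative description of the concept.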
A lot of work is happening behind the scenes in the software modeling and ontology communities, and it seems inevitable that this interchange will lead to fundamental changes in the way we develop applications. What is needed now is a bridge between the two worlds in the form of extensions to existing, well-established tools. I have started experimenting with such tools in the form of what I call the Clean Modeling Methodology, which starts with a model cleansed of all implementation considerations and then assists the developer in generating interfaces and other design artifacts. Many others have worked on tools for bridging UML and RDF technologies, and XMI is an established, though unwieldy, tool for bridging UML and XML.
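To suggest how a clean, implementation-free model can also feed the RDF side of such a bridge, this sketch turns a hypothetical concept/property/range table into RDF Schema statements (`rdfs:Class`, `rdf:Property`, `rdfs:domain`, `rdfs:range`). It is written in the spirit of the approach described above and is not the Clean Modeling Methodology's actual tooling; the `ex:` namespace prefix and the model layout are assumptions.

```python
# Hypothetical, implementation-free model: concept -> property -> range.
# Ranges are either XML Schema datatypes or names of other concepts.
MODEL = {
    "Issue": {"title": "xsd:string", "status": "xsd:string", "reporter": "Person"},
    "Person": {"name": "xsd:string"},
}

def to_rdfs(model, ns="ex:"):
    """Emit RDF Schema triples describing the model's classes and properties."""
    triples = []
    for concept, props in model.items():
        cls = ns + concept
        triples.append((cls, "rdf:type", "rdfs:Class"))
        for prop, rng in props.items():
            p = ns + prop
            triples.append((p, "rdf:type", "rdf:Property"))
            triples.append((p, "rdfs:domain", cls))
            # Ranges naming another concept become references to its class.
            triples.append((p, "rdfs:range", ns + rng if rng in model else rng))
    return triples

schema = to_rdfs(MODEL)
for t in schema:
    print(t)
```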
In the next article, which wraps up the "Basic XML and RDF techniques for knowledge management" series, I will present a unified implementation of the issue tracker application, including updates made to querying methods and schemata.
- To learn about the value of RDF from a database perspective, read my article, "An introduction to RDF" (developerWorks, December 2000).
- Discover some applications of XML and RDF in managing contact information in Edd Dumbill's recent column "XML Watch: Finding friends with XML and RDF" (developerWorks, June 2002).
- Find out how an ideogramic suite demonstrates UML-oriented XML processing in Cameron Laird's article "XMI and UML combine to drive product development" (developerWorks, October 2001).
- Check out Thinking XML's previous columns.
- Take a look at IBM WebSphere Studio Application Developer, an easy-to-use, integrated development environment for building, testing, and deploying J2EE applications, including generating XML documents from DTDs and schemas.
- Find out how you can become an IBM Certified Developer in XML and related technologies.
- Get the buzz around RDF at the recent XML Web Services One conference, including a synopsis of William Ruh's keynote address, from "Growing RDF mindshare reflected in XML Web Services One keynote."
- Learn more about the value of RDF for Web content in "The Languages of the Semantic Web."
- Want a good introduction to the interchange between XML and UML? Read "Modeling XML Applications with UML: Practical e-Business Applications" by David Carlson.
- Learn about CWM and Squish, alternative RDF query languages with different aims and models than Versa.
Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management. Fourthought develops 4Suite, an open source platform for XML, RDF, and knowledge-management applications. Mr. Ogbuji is a computer engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can contact him at firstname.lastname@example.org.