Thinking XML: Basic XML and RDF techniques for knowledge management, Part 7

Review and relevance of the techniques discussed

Uche Ogbuji (uche@ogbuji.net), Principal Consultant, Fourthought, Inc.
Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management. Fourthought develops 4Suite, an open source platform for XML, RDF, and knowledge-management applications. Mr. Ogbuji is a computer engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can contact him at uche@ogbuji.net.

Summary:  Uche Ogbuji takes a moment to review in a broader context the relevance of the XML/RDF techniques he has been presenting. He discusses the importance of XML/RDF interchange, of specialized RDF query, and of applying lessons from RDF modeling to overall application development. He also shows how this thread of the Thinking XML column relates to the parallel thread on developments toward semantic transparency.

Date:  01 Jul 2002
Level:  Intermediate


In this series, I have demonstrated general techniques for interoperability between XML and RDF facilities in applications, focusing on how RDF interchange can add features to XML applications for semantics or just greater flexibility in information modeling. In wrapping up the series, I will review the various techniques I have introduced, and give examples and guidelines for applying them to other application development tasks.

Repeat after me: "There is no syntax"

The original RDF 1.0 specification bundled both the abstract model of RDF and an XML syntax for interoperable representation and interchange. I, among others, have regarded this combination as a mistake. In addition to the specification design issues, this bundling led many people to believe that RDF is strictly the syntax in the specification, and that to take advantage of RDF you have to, for example, make up XML files in that syntax using rdf:RDF as the root element in all documents. This is completely wrong, and an important enough point for me to re-state:

In order to use and gain advantage from RDF, you do not have to use any particular syntax -- not even the syntax specified in the RDF 1.0 specification.

The trick is to extract key metadata from XML documents -- or even non-markup sources such as relational databases -- and synchronize it into RDF models. You can then treat these as a localized semantic Web. For instance, you might tie all the following together:

  • Personal contact information in one data source
  • To-do items and calendar entries from another source
  • Public key infrastructure metadata to establish trust
  • Dictionary listings and other semantic data

This rich integration of information enables otherwise impractical features and creates added value that offsets the work of moving data between XML and RDF models.
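As a minimal sketch of what tying sources together can look like, the following treats an RDF model as nothing more than a set of (subject, predicate, object) tuples. The URIs, the `ex:` predicates, and the sample data are all hypothetical, and a real application would use an RDF library rather than bare tuples.

```python
# Triples are (subject, predicate, object) tuples. All identifiers are hypothetical.
contacts = {
    ("urn:person:jdoe", "ex:name", "Jane Doe"),
    ("urn:person:jdoe", "ex:email", "jdoe@example.org"),
}
todos = {
    ("urn:todo:17", "ex:summary", "Review issue tracker schema"),
    ("urn:todo:17", "ex:assignedTo", "urn:person:jdoe"),
}

# Synchronizing both sources into one model is just a set union.
model = contacts | todos


def objects(model, subject, predicate):
    """Return all objects of statements with the given subject and predicate."""
    return sorted(o for s, p, o in model if s == subject and p == predicate)


def todo_owner_emails(model):
    """Follow ex:assignedTo arcs from to-do items to each assignee's e-mail."""
    result = {}
    for s, p, o in model:
        if p == "ex:assignedTo":
            result[s] = objects(model, o, "ex:email")
    return result
```

Neither source knows about the other, yet once both are in one model a single traversal answers a question that spans them.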

The XSLT transforms I have demonstrated for converting XML to RDF syntax (for the sole purpose of interoperable import into RDF models) are one way to achieve XML/RDF interchange. Some RDF tools can manage such interchange for you. For instance, with the 4Suite tools, you can define mapping rules without dealing with the RDF syntax.
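The idea of mapping rules can be sketched without any RDF syntax at all. The example below is an illustration only: it does not reproduce the 4Suite mapping facility or the XSLT transforms from this series, and the element names, predicate URIs, and base URI are assumptions made up for the sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical issue-tracker document; not the schema used in the column.
ISSUE_XML = """
<issues>
  <issue id="i1">
    <title>Crash on save</title>
    <assignee>jdoe</assignee>
  </issue>
  <issue id="i2">
    <title>Typo in help text</title>
  </issue>
</issues>
"""

# Each rule maps a child element name to a (hypothetical) predicate URI.
MAPPING_RULES = {
    "title": "http://example.org/issues#title",
    "assignee": "http://example.org/issues#assignee",
}


def xml_to_triples(xml_text, rules, base="http://example.org/issue/"):
    """Extract (subject, predicate, object) triples from issue elements."""
    root = ET.fromstring(xml_text)
    triples = []
    for issue in root.findall("issue"):
        subject = base + issue.get("id")
        for child in issue:
            predicate = rules.get(child.tag)
            # Elements without a rule, or without text, are simply skipped.
            if predicate is not None and child.text:
                triples.append((subject, predicate, child.text.strip()))
    return triples
```

Calling `xml_to_triples(ISSUE_XML, MAPPING_RULES)` yields one triple per mapped element, leaving the XML itself in whatever form is most natural for the application.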

In my sample issue tracker application, I use the XML/RDF mappings to produce and exchange application data in the most natural formats, while using RDF to tie in such advanced features as semantic searches with very little effort. In the other thread of this column (see Resources), I introduced quite a few e-business dictionary resources; in applications that take advantage of these resources, you can use the XML/RDF mappings to import resources such as the following:

  • RosettaNet dictionaries
  • The ISO Basic Semantic Registry (which is already available in RDF form)
  • The U.S. Government schematic repositories in DISA

In a unified RDF model, these resources can be richly linked with a variety of local or other global data sets for unified query and autonomous agent processing.

In his keynote address at the recent XML Web Services One conference, William Ruh of Software AG pointed out that a huge amount of effort these days goes into creating ontologies in XML formats. Ontologies are documents that define terms and concepts in a structured format and thus provide semantic databases. Mr. Ruh pointed out that the integration of this raw XML material with RDF technologies will lead to a practical semantic Web.


The art of query

I have also demonstrated a very low-level approach to querying RDF models. RDF models, being general graphs, need specialized query facilities that can traverse nodes and arcs singly or in aggregate. This is different from the hashing and search tree techniques of older hierarchical databases, the table scanning of relational databases, and the attribute aggregation and relationship traversal of object databases. Actually, RDF querying most closely resembles the last of these: Object relationships are structured as abstract graphs, but the difference is that object theory already supplies a lot of the semantics that underlie its arcs and nodes. RDF supplies none of these semantics (RDF Schema and DAML+OIL are examples of systems layered upon RDF that do provide some structure and semantics, but even these are far more generic than object theory).
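To make "singly or in aggregate" concrete, here is a small sketch of low-level querying over a list of triples: a pattern match with wildcards for single arcs, and a traversal that follows one kind of arc repeatedly. The function names and the sample data are hypothetical; this is not the API of any particular RDF toolkit.

```python
def match(model, subject=None, predicate=None, obj=None):
    """Return statements matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in model
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def reachable(model, start, predicate):
    """Collect every node reachable from start by following predicate arcs."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for _, _, o in match(model, subject=node, predicate=predicate):
            if o not in seen:  # guards against cycles in the graph
                seen.add(o)
                frontier.append(o)
    return seen


# Hypothetical sample model.
MODEL = [
    ("issue:1", "dependsOn", "issue:2"),
    ("issue:2", "dependsOn", "issue:3"),
    ("issue:3", "title", "Fix parser"),
]
```

Here `match(MODEL, predicate="dependsOn")` inspects individual arcs, while `reachable(MODEL, "issue:1", "dependsOn")` aggregates across a whole chain of them.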

I also introduced a higher-level query language for RDF: Versa. This language deals in graph traversal and property aggregation with features for integration into other languages, such as XML processing languages. Other higher-level RDF query languages, such as Squish and CWM (see Resources), focus on SQL-like idioms and logical inference. Some XQuery advocates have suggested that XQuery could be used for metadata query either from XML/RDF serializations or, if XML/RDF mappings are used, from the original XML. The main problem with this idea is that most general purpose XQuery implementations are unlikely to be optimized for the sorts of query patterns needed in RDF. If you wish to use a specialized query engine, you might as well use syntax and semantics appropriate to the specialization (for example, RDF).
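The combination of graph traversal and property aggregation can be illustrated in plain Python. This sketch is not Versa syntax and makes no claim about how Versa, Squish, or CWM are implemented; the path notation, function names, and data are assumptions chosen for the example.

```python
from collections import Counter

# Hypothetical sample model.
MODEL = {
    ("issue:1", "assignee", "person:jdoe"),
    ("issue:2", "assignee", "person:jdoe"),
    ("issue:3", "assignee", "person:asmith"),
    ("person:jdoe", "name", "Jane Doe"),
    ("person:asmith", "name", "Al Smith"),
}


def traverse(model, starts, path):
    """Follow each predicate in path in turn; return the end nodes as a list."""
    current = list(starts)
    for predicate in path:
        current = [
            o
            for node in current
            for (s, p, o) in model
            if s == node and p == predicate
        ]
    return current


def issues_per_assignee(model):
    """Aggregate: count issues by the name of the person they are assigned to."""
    issues = sorted({s for (s, p, o) in model if p == "assignee"})
    return Counter(traverse(model, issues, ["assignee", "name"]))
```

The caller states which arcs to follow and what to aggregate, and never writes the loop over statements by hand, which is the convenience a higher-level query language is meant to provide.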

Another matter to consider when choosing RDF query mechanisms is how distributed the data is. Much of the discussion in this series so far is appropriate to closed models and monolithic databases, but some of the power of RDF is most apparent in distributed systems. For example, RDF can be a very useful tool for aggregating metadata across intranet pages, or for inexpensive application integration. In such cases, you need a query system that can store and forward partial results, compose sub-queries efficiently, and deal with conflicts and contradictions. These facilities are already common to many agent technologies, such as search engine Web crawlers, and can be overlaid onto basic RDF query systems.
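One small piece of that picture, combining partial results and noticing contradictions, might look like the sketch below. It assumes each source has already returned its triples, and that the caller knows which predicates are expected to hold a single value; the source names and data in the usage note are hypothetical.

```python
from collections import defaultdict


def merge_partial_results(results_by_source, single_valued):
    """Merge per-source triples and report conflicts.

    A conflict is a (subject, predicate) pair whose predicate is listed in
    single_valued but for which the sources supplied different objects.
    """
    merged = set()
    values = defaultdict(dict)  # (subject, predicate) -> {object: [sources]}
    for source, triples in results_by_source.items():
        for s, p, o in triples:
            merged.add((s, p, o))
            values[(s, p)].setdefault(o, []).append(source)
    conflicts = {
        key: by_object
        for key, by_object in values.items()
        if key[1] in single_valued and len(by_object) > 1
    }
    return merged, conflicts
```

For example, if a source named "hr" reports `("emp:7", "title", "Engineer")` and a source named "cms" reports `("emp:7", "title", "Editor")`, both statements are kept in the merged model and the pair `("emp:7", "title")` is flagged, recording which source asserted which value. How to resolve the conflict is left to the application.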


Next generation analysis and design

Perhaps even more important than the XML mappings or query implementation details I have covered are the information modeling lessons. All common methodologies for application development, including entity-relational modeling and the Unified Modeling Process, look forward to implementation rather than backward to the fundamental concepts that make up the problem space approximated by the implementation. In many ways, this is a large contributor to the notorious expense of maintenance and integration of software. Two different software packages -- say, a human resources database and a content management package -- may have the same concept of employee, but the ways they actually model and implement the structures and processes of employee data almost inevitably differ. The constraints and rules differ. The representations and included data differ. If you need to bridge these two applications, these differences can snarl the effort, and in many cases make it economically infeasible.

The Object Management Group (OMG), the caretaker organization of many object design specifications including CORBA and UML, has indicated some understanding of this problem. OMG is making a fundamental shift from founding development practice in systems based on platform neutrality (CORBA and UML) to founding development on completely abstract models that represent concepts within organizations, within industries, or even globally. Such models may sound remarkably like ontologies, yet for some reason (probably a desire to consolidate the appeal of its work products), the OMG has chosen to promote UML as the language and representation for the abstract model. The problem here is that UML is biased towards implementation. As the OMG itself says in its introduction to UML:

You can model just about any type of application, running on any type and combination of hardware, operating system, programming language, and network, in UML. Its flexibility lets you model distributed applications that use just about any middleware on the market. Built upon the MOF [Meta-Object Facility] metamodel which defines class and operation as fundamental concepts, it's a natural fit for object-oriented languages and environments such as C++, Java [language], and the recent C#, but you can use it to model non-OO applications as well in, for example, Fortran, VB, or COBOL.

At the end, this OMG description disclaims exclusivity to object-oriented design, which is a debatable disclaimer in itself. But even if one grants that point, the very advertisement of UML as geared towards application development underscores the remaining problem. One of the triumphs of XML is its ability to expand the world view of developers beyond the narrow confines of application development. Originally, this came in the form of integrating the concerns and patterns of traditional document-oriented processing inherited from SGML. More recently -- and especially with the intrusion of XML into the world of distributed programming in the form of Web services -- more developers are reaching a fundamental understanding of how stateless information processing models work.

To determine how to get the best of both worlds to provide more maintainable bases for application development, a rapport between OMG folks and RDF/ontology folks is growing. Modeling in RDF opens your eyes to the broader relevance of such institutionalized concepts as class and type. This is an important stepping stone on the path to next-generation modeling. Right now, integrating ontologies with traditional analysis and design already pays significant dividends to early adopters in the form of shorter development cycles, improved maintenance and integration, and emergent value in the knowledge bases that drive the resulting applications. Unfortunately, such pioneers also suffer many privations: Programming tools do not yet support ontologies, and ontology tools (such as they are) do not yet support traditional programming. Very sophisticated developers and a highly entrepreneurial atmosphere are required to succeed with such hybrid methodologies. Both of these assets, unfortunately, are in very short supply in most IT organizations.

The important point is that if projects put information modeling first, they have a much better chance of keeping the application in step with the real world. Putting information modeling first means founding the project with the formal descriptions and expected behavior of the key concepts to be modeled, and using generic data representation tools such as XML and RDF schemata very early in the cycle. Only once the abstract model is well defined should it be used to seed the resulting artifacts that frame the programmatic aspects of the application design.
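A toy illustration of an abstract model seeding implementation artifacts follows. It is only a sketch under loud assumptions: the concept description is a plain dictionary rather than an XML or RDF schema, and the concept, property names, and type mapping are hypothetical.

```python
from dataclasses import make_dataclass

# Implementation-neutral description of a concept (hypothetical).
EMPLOYEE_CONCEPT = {
    "name": "Employee",
    "properties": [("full_name", str), ("employee_id", str), ("hire_year", int)],
}


def seed_class(concept):
    """Derive a programming-language artifact (a class) from the concept."""
    return make_dataclass(concept["name"], concept["properties"])


def seed_table_ddl(concept):
    """Derive a storage artifact (a table definition) from the same concept."""
    sql_types = {str: "TEXT", int: "INTEGER"}
    columns = ", ".join(f"{n} {sql_types[t]}" for n, t in concept["properties"])
    return f"CREATE TABLE {concept['name'].lower()} ({columns})"
```

Because both artifacts are derived from one description, a change to the concept propagates to each of them instead of being reconciled by hand across packages.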


Next, the technical update

A lot of work is happening behind the scenes in the software modeling and ontology communities. It seems inevitable that this interchange will lead to fundamental changes in the way we develop applications. The need now is for a bridge between the two worlds in the form of extensions to existing, well-established tools. I have started experimenting with such tools in a form that I call the Clean Modeling Methodology. The Clean Modeling Methodology starts with a model that is cleansed of all implementation considerations and then assists the developer in generating interfaces and other design artifacts. Many others have worked on tools for bridging UML and RDF technologies, and XMI is an established, though unwieldy, tool for bridging UML and XML.

In the next article, which wraps up the "Basic XML and RDF techniques for knowledge management" series, I will present a unified implementation of the issue tracker application, including updates made to querying methods and schemata.


Resources
