Thinking XML: Basic XML and RDF techniques for knowledge management, Part 4

Issue tracker schema

Uche Ogbuji (uche@ogbuji.net), Principal consultant, Fourthought, Inc.
Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management. Fourthought develops 4Suite, an open-source platform for XML, RDF and knowledge-management applications. Mr. Ogbuji is a computer engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can contact Mr. Ogbuji at uche@ogbuji.net.

Summary:  Uche Ogbuji continues his exploration of how RDF combines with XML to enable knowledge management. In this installment, he takes an in-depth look at modeling in the RDF world, and begins to look at developing a schema for the issue tracker and how it is similar to and different from object-oriented and relational modeling. The reader will learn various tips, techniques, and best practices for developing effective knowledge management models from XML data.

Date:  01 Feb 2002
Level:  Intermediate


So far in our exploration of the issue tracker application, I have discussed, by example, the RDF data being extracted from the XML data, techniques for making this extraction, and a neat semantic searching capability that all this fuss with RDF makes possible. Now I'll take a closer look at the role schemata play in using RDF for building knowledge management features into XML applications.

Relational and object database schemata, and even XML schemata, provide documentation, guidance, and control for data-driven applications. RDF schemata are looser and more generic; they set forth our classification of the resources that we put into the RDF models. In this installment and the next, we'll look at schemata for the issue tracker RDF statements, using both the W3C RDF Schema (RDFS) specification and the DARPA Agent Markup Language/Ontology Inference Layer (DAML+OIL), which is an important extension of, and improvement on, the W3C specification. Some familiarity with RDFS and DAML+OIL is useful, though I will be introducing most of the concepts I use in my examples and discussion.

That's just typical of your class

Both RDFS and DAML+OIL revolve around the classification of resources. In previous installments of this column, you may have noticed that the issue tracker RDF has been rather light on classifications. In fact, it has used no classes and types at all so far. This is just fine for an RDF system. In the case of the issue tracker, since pretty much any resource can be marked up with issues -- and an issue can be pretty much anything to which we can attach authors, comments, and actions -- strict classifications are probably artificial and would just get in the way.

However, one of the strengths of RDF is that it does not require strict classifications of the sort required by many object-oriented (OO) languages. Its concept of class and type is far more general, and is open to interpretation by the model designers. A class can be the core of any sort of organization you might want to use for resources. It doesn't have to be a neat tree, such as the scientific classifications of living things. For instance, in the XML world "purchase order" is often used as an example of a document that is next to impossible to standardize even with the weight of XML behind any effort. This is because of the myriad ways that POs are classified, subclassified, and generally conceived. RDF is specifically designed to accommodate this sort of chaos.

RDFS introduces some of the worldview of OO development by putting forth the idea of a class as the natural indicator of type. Indeed a lot of RDF implementations follow this example, perhaps because OO techniques have enjoyed so much prominence recently. But I think it's extremely important to note that this pattern is not fundamental to RDF itself.

These are rather deep concepts, so a tangible example is in order here. Take the idea of a telephone number. There are many ways we can look at a telephone number, if we want to fix it in a classification scheme:

  • A telephone number is a kind of number.
  • A telephone number is a kind of contact datum.
  • A telephone number is a kind of asset (ask any U.S. business that has struggled to reserve a toll-free number that spells out their trademark on the numeric keypad).
  • A toll-free number is a kind of telephone number.
  • A fax number is a kind of telephone number.

You can see some of the classic hierarchy that is a hallmark of OO thinking. You can also see some of the overlapping and tentativeness of classifications that tend to cause problems in established OO practice. Just ask any OO developer about "diamonds of death" or "non-flying birds" if you want to trigger a blinking fit. In the above, the "kind of" is often mapped to the OO concept of an "is-a" relationship, and usually defines the type of the object as a consequence of the built-in semantics of OO implementation languages.

But in the real world, there is more to type than class. Take the following statements:

  • 501-555-1111 is Mark's work number.
  • 500-555-1234 is Mark's home number.
  • Use 500-555-1234 as Mark's emergency contact number.
  • You must use 10-digit dialing to dial 555-1234 from outside the 555 exchange.

These statements all define characteristics of a phone number. They are less clearly classifications than the first set of examples, and indeed in the OO worldview, they may be indicated in many ways, such as attributes and associations, but rarely using typing. However, considering the ways people generally think about such characteristics, there is no reason to think that they aren't types just as much as the first set of statements. It is natural to say that "a work number is a type of telephone number," and for locations within the 501 area code, a "10-digit number" is naturally a "type" of telephone number. In RDF there is no reason why these characteristics shouldn't be expressed using rdf:type predicates. In fact, consider the vCard/RDF proposal, a W3C note that suggests a conversion from the very popular vCard contact specification scheme to RDF. vCard/RDF uses rdf:type to differentiate work numbers from home numbers, fax numbers from voice numbers, Internet mailboxes from Lotus Notes mailboxes, etc. It also uses rdf:type in the common RDFS sense as well: for indicating classifications within its data model.
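To make this concrete, here is a minimal sketch of how such characteristics might be written with rdf:type. The class URIs under http://example.org/phone# are hypothetical names invented for this illustration; they are not taken from the vCard/RDF note or from the issue tracker. Attributes are written in their qualified form (rdf:about).

```xml
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

  <!-- Mark's work number, described by several types at once -->
  <rdf:Description rdf:about="tel:+1-501-555-1111">
    <!-- Classification in the familiar "is-a" sense (hypothetical class) -->
    <rdf:type rdf:resource="http://example.org/phone#TelephoneNumber"/>
    <!-- Characteristics expressed as types (hypothetical classes) -->
    <rdf:type rdf:resource="http://example.org/phone#WorkNumber"/>
    <rdf:type rdf:resource="http://example.org/phone#TenDigitNumber"/>
  </rdf:Description>

</rdf:RDF>
```

Nothing in RDF prevents one resource from carrying all three types, which is precisely the looseness that a single-inheritance class tree struggles to express.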

But if the same predicate (rdf:type) can be used in such a divergence of ways, hasn't it become dangerously vague? I think the situation calls for refinement of the various uses of rdf:type, and it would be best if RDFS were to introduce a subproperty of rdf:type, say rdfs:type, or if that is too confusing, rdfs:classificationType. Similarly, vCard could create a subproperty of rdf:type, say vCard:contactType, to differentiate the various concepts of type that it employs.
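As a sketch of what such a refinement could look like, the fragment below declares a property named contactType as a subproperty of rdf:type. The name and its label and comment are assumptions made for illustration; no such property is defined by RDFS or by the vCard/RDF note.

```xml
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">

  <!-- Hypothetical property: a narrower notion of type for contact data -->
  <rdf:Property rdf:ID="contactType">
    <rdfs:label>contact type</rdfs:label>
    <rdfs:comment>The role a contact datum plays, such as work or home
    </rdfs:comment>
    <!-- Every contactType statement is also an rdf:type statement -->
    <rdfs:subPropertyOf
        rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>
  </rdf:Property>

</rdf:RDF>
```

Tools that only understand rdf:type would still see a type, while tools aware of the subproperty could tell a usage role apart from a class membership.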


Scheming over the issues

The issue tracker doesn't need to do a lot of neat things with typing and classifications, but the above discussion encourages the idea that there is no reason why types, classes, and other schematic matters shouldn't be constructed quite loosely. For most RDF projects in which I've worked, it was established that you sit around the table with loads of doughnuts and caffeinated beverages in order to hammer out the schema. This is a puritan conscientiousness borrowed from the OO development and relational DBMS worlds. But in working with the issue tracker thus far, I've worked with a few instances before even coming around to the schema. There is no reason not to have done so. We are attaching issues to any Web-based resource, and we are making very loose statements about these issues.

It's time to talk schema. Listing 1 is an XML fragment that illustrates an RDFS class for an issue:


Listing 1. The Issue class
  <rdfs:Class ID="Issue">
    <rdfs:label>Issue</rdfs:label>
    <rdfs:comment>A problem, suggestion or other matter for action 
    or discussion relevant to a resource</rdfs:comment>
  </rdfs:Class>

This code declares an in-line (because of the use of ID) RDFS class for an issue. Note the label and comment -- I think these are extremely important, and in my practice I require both on every resource defined, especially on schematic elements. Labels are especially important because smart RDF tools can use them to present user-friendly names for resources rather than ugly URIs.


Listing 2. The Author class and issue and author properties
  <rdfs:Class ID="Author">
    <rdfs:label>Author</rdfs:label>
    <rdfs:comment>A person raising or posting an issue</rdfs:comment>
  </rdfs:Class>

  <rdf:Property ID="issue">
    <rdfs:label>issue</rdfs:label>
    <rdfs:comment>Associate an issue with its resource
    </rdfs:comment>
    <rdfs:range rdf:resource="#Issue"/>
  </rdf:Property>

  <rdf:Property ID="author">
    <rdfs:label>author</rdfs:label>
    <rdfs:comment>Associate an issue with whoever posted it
    </rdfs:comment>
    <!-- Where the dc entity has been set to the
         Dublin Core metadata element base URI -->
    <rdfs:subPropertyOf rdf:resource="&dc;creator"/>
    <rdfs:domain rdf:resource="#Issue"/>
    <rdfs:range rdf:resource="#Author"/>
  </rdf:Property>

Here we define a property, issue. The range statement asserts that the object of any statement with an issue predicate must have an rdf:type of Issue. We place no such restriction on the subject of these statements (that would take a domain statement), so in effect any resource can have an issue predicate, which is our intent. The author property is defined with both a domain and a range, and is made a subproperty of the "creator" metadata element from Dublin Core. This means that any issue with an author property automatically asserts a dc:creator property with the same value as well. This is a common and useful technique, and in this case it means that agent software familiar with Dublin Core will be able to deal with our issue tracker metadata, at least to some extent. This trick is actually part of the foundation of the semantic Web.
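The effect of the subproperty declaration can be sketched as follows. The it namespace URI and the issue and user URIs are placeholders assumed for this illustration, since the real ones are defined in earlier installments. The second statement is not written by anyone; it is the one that a subproperty-aware processor may treat as present.

```xml
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:it="http://example.org/issue-tracker#">

  <rdf:Description rdf:about="http://example.org/issues#i1">
    <!-- Statement actually present in the instance data -->
    <it:author rdf:resource="http://example.org/users#alice"/>
    <!-- Statement implied because author is a subproperty of dc:creator -->
    <dc:creator rdf:resource="http://example.org/users#alice"/>
  </rdf:Description>

</rdf:RDF>
```

A Dublin Core agent that has never heard of the issue tracker can query for dc:creator and still find who posted the issue.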

If you've gone back to the instance data at this point in order to compare it to the schema we're building, you might be scratching your head: "But this doesn't match the instances with which we've been working." For example, we have:


Listing 3. Snippet from earlier instances
  <rdf:Description about='&ril-spec;ril-20010502'>
    <it:issue rdf:resource='#i2001030423'/>
  </rdf:Description>

  <rdf:Description ID='i2001030423'>
    <it:author rdf:resource='&ril-users;#uogbuji'/>
  </rdf:Description>

This appears to violate the constraints we have set because the resource with ID i2001030423 is not declared to have rdf:type of Issue, nor is the resource with ID "uogbuji" declared to have rdf:type of Author.

Whether this is indeed a violation of the schema might actually depend on how we interpret the schema. The most common interpretation is that if there are no statements in the model to fulfill the terms of a constraint (such as domain or range), then the model is inconsistent -- usually an error condition. This is known as a restrictive role for RDF schemata. It is also part of what is sometimes known as a "closed world" assumption, since it does not consider anything that is not manifest in the model at the time of inquiry.

But there is another, less common, but very interesting approach. One of the constraints we defined in this installment says that if a resource has an author property, then it must have an rdf:type of Issue. One could then infer from the presence of said property on the i2001030423 resource that it must be of the required type. In short, the processor could effectively generate statements that allow constraints to be satisfied. This is known as a generative or inferential role for RDF schemata. It is closer to how people deal with the vicissitudes of the real world, and thus closer to the powerful ideas behind the semantic Web. But with this power come thorny pitfalls of knowledge representation.
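Under the generative reading, the statements a processor might add for the earlier instance snippet look roughly like this. The class URIs and the user URI are placeholders assumed for illustration (the actual schema namespace and the ril-users entity are not expanded in this installment), and the fragment shows only the inferred statements, not anything an author would need to write.

```xml
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

  <!-- Inferred from the domain of author: the subject is an Issue -->
  <rdf:Description rdf:about="#i2001030423">
    <rdf:type rdf:resource="http://example.org/issue-tracker#Issue"/>
  </rdf:Description>

  <!-- Inferred from the range of author: the object is an Author -->
  <rdf:Description rdf:about="http://example.org/users#uogbuji">
    <rdf:type rdf:resource="http://example.org/issue-tracker#Author"/>
  </rdf:Description>

</rdf:RDF>
```

Under the restrictive reading, by contrast, these same statements would have to appear explicitly in the model before it could be considered consistent.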

The most important lesson to be learned here is that all is well even though we started with prototype RDF instances, and then worked up a schema that at first glance seemed to invalidate our earlier efforts. All is well, thanks to the generosity (I don't use the term lightly) of RDF. As an experienced modeler/designer, I must say that this power and flexibility is one of the bedrock strengths of RDF, as well as one of the reasons it can be so hard for traditional OO and relational thinkers to grasp.


Conclusion

We have run out of space for this installment, but I hope you've found it valuable that I've taken time to introduce and discuss important modeling concepts as we've proceeded. I would have been grateful for such a walk-through when I was first trying to get my head around extensible metadata. In the next installment, we'll round off the issue tracker schema in RDFS form and look at it in DAML+OIL form as well.


