Tip: Use internal references in XML vocabularies

Minimize repetition and resulting errors with ID types and in-line XPath queries

Uche Ogbuji (uche@ogbuji.net), Principal Consultant, Fourthought, Inc.
Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management. Fourthought develops 4Suite, an open source platform for XML, RDF, and knowledge-management applications. Mr. Ogbuji is a computer engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can contact Mr. Ogbuji at uche@ogbuji.net.

Summary:  In some cases, you can avoid repeating identical data fields by using internal references from one field to another. Uche Ogbuji demonstrates how in this tip.

Date:  01 Mar 2003
Level:  Intermediate

One of the biggest criticisms that traditional database experts level at XML is that its hierarchical nature encourages the kind of repetition that relational normalization would banish. This is certainly a valid complaint; XML succeeds because its flexibility and convenience outweigh this failing. (Of course, database purists say that XML's advantages only appear to outweigh its problems to the less rigorous.) In this tip, I offer a couple of techniques that can reduce such repetition in certain cases. They are not, however, a general solution to the problem of XML's hierarchical limitations.

Nixing cut and paste

Sometimes repetition occurs when one field may reuse the value of another, but is not required to. A good example is the billing and shipping addresses of a business partner. Listing 1 is a sample customer record that includes such addresses.


Listing 1. A sample customer record
                
<customer>
  <name>Bards, Inc.</name>
  <billing-address>1000 Lay Way, Burgh, UK</billing-address>
  <shipping-address>1000 Lay Way, Burgh, UK</shipping-address>
  <phone>606-217-8899</phone>
  <email>bards@angles.co.uk</email>
</customer>

In this case, billing-address and shipping-address contain the same string. Imagine that this file is the result of a form that was filled out. The data may be entered into two separate fields even though it's the same value, which is a well-known recipe for data inconsistency errors. For this reason, many such forms offer a check box so that one can enter just the billing address and have the shipping address set to match it automatically. One can do the same sort of thing in the XML data if the vocabulary allows it. XML 1.0 makes this possible through the ID and IDREF attribute types. Listing 2 offers an example of this.


Listing 2. A customer record format that uses ID types to avoid repetition
                
<!DOCTYPE customer [
  <!ELEMENT customer (name, billing-address, shipping-address,
                      phone, email
  )>
  <!ELEMENT billing-address (#PCDATA)>
  <!ATTLIST billing-address id ID #IMPLIED>
  <!ELEMENT shipping-address (#PCDATA)>
  <!ATTLIST shipping-address ref IDREF #IMPLIED>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>
<customer>
  <name>Bards, Inc.</name>
  <billing-address id="x">1000 Lay Way, Burgh, UK</billing-address>
  <shipping-address ref="x"/>
  <phone>606-217-8899</phone>
  <email>bards@angles.co.uk</email>
</customer>

In this case, the vocabulary is augmented by allowing the billing-address element to have an optional attribute, id, which is declared with type ID, so its value must be unique within the document. The shipping-address element also gets an optional attribute, ref, which is declared with type IDREF, so its value must match the ID of some element in the document. For the purposes of this example, I placed the DTD needed to assert these attribute types in the internal subset. The processing code then needs to know how to handle the special attributes and properly infer the value of the shipping address.
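The tip doesn't prescribe how that processing code should work. As a rough illustration, here is a minimal sketch using Python's standard xml.etree.ElementTree. ElementTree parses past the internal DTD subset but does not apply its attribute-type declarations, so the sketch assumes, by convention, that any attribute named id acts as an ID and any attribute named ref acts as an IDREF. The function name resolve_id_refs is hypothetical.

```python
import xml.etree.ElementTree as ET

# Listing 2 verbatim. ElementTree parses the DOCTYPE but ignores the
# ID/IDREF attribute types declared in the internal subset.
LISTING_2 = """<!DOCTYPE customer [
  <!ELEMENT customer (name, billing-address, shipping-address,
                      phone, email
  )>
  <!ELEMENT billing-address (#PCDATA)>
  <!ATTLIST billing-address id ID #IMPLIED>
  <!ELEMENT shipping-address (#PCDATA)>
  <!ATTLIST shipping-address ref IDREF #IMPLIED>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>
<customer>
  <name>Bards, Inc.</name>
  <billing-address id="x">1000 Lay Way, Burgh, UK</billing-address>
  <shipping-address ref="x"/>
  <phone>606-217-8899</phone>
  <email>bards@angles.co.uk</email>
</customer>"""


def resolve_id_refs(root, id_attr="id", ref_attr="ref"):
    """Replace each ref_attr reference with the text of the matching element.

    Assumption: attributes named id_attr act as IDs and attributes named
    ref_attr act as IDREFs, mirroring Listing 2's DTD by naming convention.
    """
    # Index every element carrying an ID-style attribute, rejecting
    # duplicates as a validating parser would.
    targets = {}
    for elem in root.iter():
        key = elem.get(id_attr)
        if key is None:
            continue
        if key in targets:
            raise ValueError(f"duplicate ID {key!r}")
        targets[key] = elem

    for elem in root.iter():
        ref = elem.get(ref_attr)
        if ref is None:
            continue
        if ref not in targets:
            raise ValueError(f"reference {ref!r} matches no ID")
        # Copy the target's string value (all of its descendant text) and
        # drop the now-resolved reference attribute.
        elem.text = "".join(targets[ref].itertext())
        del elem.attrib[ref_attr]
    return root


root = resolve_id_refs(ET.fromstring(LISTING_2))
print(root.findtext("shipping-address"))  # 1000 Lay Way, Burgh, UK
```

After resolution, the shipping-address element carries the same text as in Listing 1. Raising errors on duplicate or dangling IDs reproduces, in a small way, the ID and IDREF checks that a validating parser would perform on Listing 2.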

Another approach is to use XPath to reference the target value, as in Listing 3.


Listing 3. A customer record format that uses XPath to avoid repetition
                
<customer>
  <name>Bards, Inc.</name>
  <billing-address>1000 Lay Way, Burgh, UK</billing-address>
  <shipping-address>
    <xpath-ref select="../billing-address"/>
  </shipping-address>
  <phone>606-217-8899</phone>
  <email>bards@angles.co.uk</email>
</customer>

This time I have added a special element to the vocabulary, xpath-ref, whose select attribute contains an XPath expression to be evaluated with the xpath-ref's parent element (here, shipping-address) as the context node. In this example, the expression selects the billing-address element that is a sibling of shipping-address; its string value presumably then serves as the shipping address. Again, a processor would have to implement this reference, but the XPath method offers more flexibility: for one thing, XPath functions and other expression facilities can be used to select more complex values.
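A processor for xpath-ref needs an XPath engine. Python's standard library offers only ElementTree's limited path syntax, which cannot climb above the element a search starts from, so this sketch handles just an assumed subset: leading .. steps followed by an optional relative child path. A real processor would use a full XPath 1.0 implementation instead. The helper names select and resolve_xpath_refs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Listing 3 verbatim.
LISTING_3 = """<customer>
  <name>Bards, Inc.</name>
  <billing-address>1000 Lay Way, Burgh, UK</billing-address>
  <shipping-address>
    <xpath-ref select="../billing-address"/>
  </shipping-address>
  <phone>606-217-8899</phone>
  <email>bards@angles.co.uk</email>
</customer>"""


def select(context, expr, parents):
    """Evaluate a small XPath subset: leading '..' steps, then an optional
    relative child path that ElementTree's findall() understands."""
    node = context
    steps = expr.split("/")
    # ElementTree cannot climb above the element find() starts from,
    # so walk up through the parent map by hand.
    while steps and steps[0] == "..":
        node = parents.get(node)
        if node is None:
            raise ValueError(f"{expr!r} climbs above the document element")
        steps.pop(0)
    rest = "/".join(steps)
    return node.findall(rest) if rest else [node]


def resolve_xpath_refs(root):
    """Replace each xpath-ref element with the string value it selects."""
    # ElementTree elements do not point to their parents; build a map once.
    parents = {child: parent for parent in root.iter() for child in parent}
    for ref in list(root.iter("xpath-ref")):
        holder = parents[ref]  # the context node is xpath-ref's parent
        expr = ref.get("select")
        hits = select(holder, expr, parents)
        # Fail loudly if an edit elsewhere broke the reference.
        if len(hits) != 1:
            raise ValueError(f"{expr!r} selected {len(hits)} nodes, expected 1")
        holder.remove(ref)
        holder.text = "".join(hits[0].itertext())
    return root


root = resolve_xpath_refs(ET.fromstring(LISTING_3))
print(root.findtext("shipping-address"))  # 1000 Lay Way, Burgh, UK
```

Insisting on exactly one selected node is a deliberate design choice: if a later edit renames or removes billing-address, the processor reports the broken reference instead of silently producing an empty shipping address.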


Wrap up

You should use internal references like this with some care. With the ID method, be sure the document stays valid, so that every ref still matches an existing id; with the XPath method, watch out for situations where a modification causes the XPath expression to no longer select the expected result.

When designing XML vocabularies, try to minimize repetition wherever possible. You can do this in many ways, and internal references can be a handy tool in that effort.

