XML for Data: Four tips for smart architecture

Common XML design mistakes and how to avoid them

Kevin Williams (kevin@blueoxide.com), Chief XML Architect, Equient
Kevin Williams is the chief XML architect for Equient, a division of Veridian specializing in XML design for information management systems. He has also co-written several books on XML from Wrox Press. Random XML musings, tips, tricks, and opinionated rants may be found at his Web site www.blueoxide.com. You can contact Kevin at kevin@blueoxide.com.

Summary:  This column explains how to avoid some common mistakes that even smart architects make when designing an XML solution. XML architect and author Kevin Williams offers four tips for designing flexible, high-performance systems.

Date:  01 Aug 2001
Level:  Introductory

XML suffers from an all-too-common problem with new technologies: I call it "buzzworditis." Like the C++ language and client-server architecture before it, XML has visibility at the executive level -- the nontechnical executive level. This leads to corporate memos insisting that "entire systems" need to be somehow "converted" to XML for the good of the company. However, like C++ and client-server architecture, XML isn't an answer in and of itself; it's simply a tool you can use to help build your technical solution. By understanding the strengths and weaknesses of XML compared to other possible architectural choices, you can minimize or prevent major headaches later in the development (or maintenance) cycle. This column recommends four general design guidelines for the judicious use of XML in the data architecture of your systems.

Tip 1: If you don't need it, throw it away

One thing that many architects don't initially get about XML is that it's just a way to represent information. There's nothing magical about an XML document: It just shows how various pieces of information relate to one another. When you receive a document from an external source that has information you know you'll never use (such as internal reference numbers from that source that have no bearing on your system), toss it! Use an XSLT style sheet or some other mechanism to keep the information you want and drop the information you don't. Remember, it's always going to be more efficient to filter the data once (as it comes into your system) than every time you need to access it. Similarly, if you receive information about seven million customers in one monster document, but the information would be more useful to you in separate documents, break it into one document per customer. After all, if you received a fixed-width file from a mainframe system, you almost certainly wouldn't keep it around in that form because it wouldn't be particularly useful. Don't be afraid to dissect, reorganize, or otherwise modify XML documents to suit your needs.
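As a rough illustration of filtering and splitting on the way in, here is a minimal Python sketch that uses the standard library's ElementTree rather than an XSLT style sheet. The feed layout, the element names (`customer`, `sourceRef`, and so on), and the `split_and_filter` helper are all hypothetical, invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical incoming feed; the element names are illustrative only.
FEED = """<customers>
  <customer id="1001">
    <name>Ada Smith</name>
    <sourceRef>ZX-77-internal</sourceRef>
    <city>Denver</city>
  </customer>
  <customer id="1002">
    <name>Raj Patel</name>
    <sourceRef>ZX-78-internal</sourceRef>
    <city>Austin</city>
  </customer>
</customers>"""

# Fields from the external source that this system will never use (assumed).
UNWANTED = {"sourceRef"}


def split_and_filter(feed_xml):
    """Return {customer id: standalone XML string} with unwanted fields dropped."""
    root = ET.fromstring(feed_xml)
    documents = {}
    for customer in root.findall("customer"):
        # Copy the child list first so removal does not disturb iteration.
        for child in list(customer):
            if child.tag in UNWANTED:
                customer.remove(child)
        customer.tail = None  # drop whitespace that trailed the element in the feed
        documents[customer.get("id")] = ET.tostring(customer, encoding="unicode")
    return documents


docs = split_and_filter(FEED)
for customer_id, xml_text in docs.items():
    print(customer_id, xml_text)
```

This version loads the whole feed into memory, which is fine for a small example. For a feed the size of the seven-million-customer document, a streaming parser such as `xml.etree.ElementTree.iterparse` would probably be a better fit.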


Tip 2: Don't use XML for searching

XML documents (by themselves) are not well suited to being searched. Because they're just flat text, any of XML's native searching mechanisms (such as XPath) must parse the entire document (or documents) to locate the piece (or pieces) you're interested in. If you're trying to work with that single document with information about seven million customers, searching would be extremely inefficient. If you break the document up into smaller documents -- say, one per customer -- the problem still occurs: To find the particular customer you're looking for, you need to parse each document until you find the appropriate one. The only good solution for searching XML documents is to introduce some sort of indexing mechanism -- either a relational database index or some sort of native XML indexing tool -- that significantly reduces the amount of information that has to be processed to locate the document (or document fragment) you're interested in. When you have data-oriented information (as opposed to text-oriented information such as a book manuscript), a relational database is well suited for this task, and it provides other benefits, as you'll see in the next tip.
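The relational index described above might look something like the following sketch, which uses Python's built-in `sqlite3` module as a stand-in for whatever database you actually run. The table name, column names, and sample documents are assumptions made for illustration. Each document is parsed once, when it enters the system, and later searches touch only the index.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical per-customer documents (one document per customer).
CUSTOMER_DOCS = [
    '<customer id="1001"><name>Ada Smith</name><city>Denver</city></customer>',
    '<customer id="1002"><name>Raj Patel</name><city>Austin</city></customer>',
    '<customer id="1003"><name>Li Wei</name><city>Denver</city></customer>',
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_index ("
    " customer_id TEXT PRIMARY KEY,"
    " city TEXT NOT NULL,"
    " document TEXT NOT NULL)"
)
# Index the field we expect to search on (an assumption for this example).
conn.execute("CREATE INDEX idx_customer_city ON customer_index (city)")

# Parse each document exactly once, as it comes into the system.
for doc in CUSTOMER_DOCS:
    root = ET.fromstring(doc)
    conn.execute(
        "INSERT INTO customer_index VALUES (?, ?, ?)",
        (root.get("id"), root.findtext("city"), doc),
    )
conn.commit()


def find_by_city(city):
    """Return the stored XML documents for a city without reparsing any XML."""
    rows = conn.execute(
        "SELECT document FROM customer_index WHERE city = ? ORDER BY customer_id",
        (city,),
    )
    return [row[0] for row in rows]


denver_docs = find_by_city("Denver")
print(len(denver_docs))
```

The XML itself is stored untouched alongside the indexed columns, so a search returns the original document without reparsing it.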


Tip 3: Don't use XML for summarization

Summarizing information stored in XML documents is also very inefficient. XPath provides only the bare minimum of aggregation functionality (functions such as count() and sum()), and even this is not easily usable if the information you want to summarize is spread across more than one document. Also, summarization presents the same problem as searching: Each document must be parsed to discover and extract the information being summarized. Again, I recommend indexing the information, thus reducing the amount of information you have to sift through to find the pieces being operated on. Alternatively, you could maintain an additional document that contains summary information, updating it as detail XML documents are introduced into the system. However, that would not allow you to do ad hoc summarization, and it can be a bit of a management chore. For the best flexibility in summarization tasks, a relational database is really the only good choice; most off-the-shelf XML indexers do not expose the indexes themselves for direct programmatic manipulation.
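To make the ad hoc summarization point concrete, here is a small sketch that again uses Python's `sqlite3` module as a stand-in relational database. The order documents, the table layout, and the choice to store totals as integer cents are all hypothetical. Once the values to be summarized are in the index, any grouping or aggregate is a single SQL query and no XML is parsed again.

```python
import sqlite3
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical detail documents, one per order.
ORDER_DOCS = [
    '<order id="1" customer="1001"><total>25.00</total></order>',
    '<order id="2" customer="1001"><total>10.50</total></order>',
    '<order id="3" customer="1002"><total>99.99</total></order>',
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_index ("
    " order_id TEXT PRIMARY KEY,"
    " customer_id TEXT NOT NULL,"
    " total_cents INTEGER NOT NULL)"
)

# Extract the summarizable values once, as each document arrives.
for doc in ORDER_DOCS:
    root = ET.fromstring(doc)
    # Store money as integer cents to avoid floating-point rounding.
    cents = int(Decimal(root.findtext("total")) * 100)
    conn.execute(
        "INSERT INTO order_index VALUES (?, ?, ?)",
        (root.get("id"), root.get("customer"), cents),
    )
conn.commit()

# An ad hoc summary: order count and total spend per customer.
summary = conn.execute(
    "SELECT customer_id, COUNT(*), SUM(total_cents)"
    " FROM order_index GROUP BY customer_id ORDER BY customer_id"
).fetchall()

for customer_id, order_count, total_cents in summary:
    print(customer_id, order_count, total_cents / 100)
```

A different question, such as the largest single order or the number of customers with more than one order, needs only a different query against the same table.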


Tip 4: Use XML to drive rendering

One real power of XML lies in its ability (via XSLT) to render its contents to various other forms. This is especially crucial if your system needs to support various means of data consumption -- through an HTML interface such as a desktop Web browser, through a portable device using WML, or to a data-transfer standard agreed upon by your industry. Relational data can drive rendering, too, but it's not as good at the job, because each possible rendering requires significant coding time. Also, if a request is received to render a piece of information that you have stored as an atomic XML document (such as a single customer), you can do so without touching the indexing system, which frees up cycles on that system to support the searching and summarization of the data as necessary.
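The idea of one stored document feeding several renderings can be sketched as follows. Python's standard library has no XSLT processor, so plain Python functions stand in here for the style sheets the column has in mind; in a real system each renderer would more likely be an XSLT style sheet. The document layout, the output formats, and the `render` helper are assumptions made for this example.

```python
import html
import json
import xml.etree.ElementTree as ET

# A hypothetical atomic document, as it might be stored for a single customer.
CUSTOMER_DOC = (
    '<customer id="1001"><name>Ada Smith</name><city>Denver</city></customer>'
)


def render_html(doc):
    """Render the customer as an HTML fragment for a desktop browser."""
    root = ET.fromstring(doc)
    name = html.escape(root.findtext("name", default=""))
    city = html.escape(root.findtext("city", default=""))
    return f'<div class="customer"><h1>{name}</h1><p>{city}</p></div>'


def render_json(doc):
    """Render the customer as a JSON record for data transfer."""
    root = ET.fromstring(doc)
    record = {
        "id": root.get("id"),
        "name": root.findtext("name"),
        "city": root.findtext("city"),
    }
    return json.dumps(record, sort_keys=True)


# One entry per supported output; each plays the role of a style sheet.
RENDERERS = {"html": render_html, "json": render_json}


def render(doc, target):
    """Render a stored document to the requested target format."""
    return RENDERERS[target](doc)


print(render(CUSTOMER_DOC, "html"))
print(render(CUSTOMER_DOC, "json"))
```

Neither renderer touches the index or the database. Each one needs only the single stored document, which is the property the tip relies on.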


Conclusion

This column looked at some of the ways XML fits into an overall system architecture and where it does (and doesn't) make sense. You've seen that some sort of indexing mechanism -- ideally a relational database -- should be part of your overall architecture in most cases. In short, use XML to perform the tasks it excels at, such as driving a rendering system.

As you're architecting (or rearchitecting) your systems, remember that XML is just another tool in your development toolbox. You wouldn't use a screwdriver to hammer in a nail. Don't try to make XML do things it isn't designed to do well.


Resources

  • Terence Parr, co-founder of jGuru, has his own ideas about when not to use XML.

  • IBM's DB2 Extender page gives a basic overview of how DB2 works with XML, with links to a detailed white paper on querying with XML, viewable as a PDF file, and to DB2 Extender downloads.

  • Need some detail about working with XML and IBM's DB2 and WebSphere Application Server? The IBM Redbook Integrating XML with DB2 XML Extender and DB2 Text Extender shows how to use XML technology efficiently in business applications, and explains how to integrate it with DB2 Universal Database, DB2 XML Extender and Text Extender, and WebSphere Application Server. This book will help developers to set up the environment and to create and process XML documents that can be stored and recovered using SQL.

  • Find other articles in Kevin Williams's XML for Data column.

