This topic has been locked.
3 replies Latest Post - 2012-05-02T18:58:03Z by MatthiasNicola
Pinned topic pureXML index performance
Answered question This question has been answered.
Unanswered question This question has not been answered yet.
I'm in the process of designing a new database structure to store document data received from a third-party. We currently receive the documents in XML format, shred them, and store them in a relational database with 17 tables. The database currently serves two functions: 1)Store documents for display at a later time 2)Store documents so that an application can scan for duplicate reports prior to making a new request to the third-party. I understand how moving to a pureXML storage layout will improve the performance of function #1, but I'm struggling to find any definitive research regarding function #2. Does anyone know of any research that has been done which compares index-based searches on relational tables to index-based searches on pureXML? For example, our current search might look for a document using customer name and birth date ---- assuming indexes are set up properly, would that same search have similar performance on a pureXML table? Thanks.
Updated on 2012-05-02T18:58:03Z at 2012-05-02T18:58:03Z by MatthiasNicola
MatthiasNicola 120000E28R321 PostsACCEPTED ANSWER
Re: pureXML index performance2012-05-02T02:55:53Z in response to andrewWoodmanHi Andrew,
query performance on relational tables vs. an XML column depends heavily on the types of queries that you need to run.
For example, retrieving a customer record based on customer name and date of birth can easily be handled by one or two XML indexes, and the index-based fetch of a single XML record is typically faster than performing lookups into 17 relational indexes to fetch rows from as many separate relational tables.
You'll find a performance comparison for this type of use case in the following article:
Table 10 in this article might also be useful for your question.
An additional consideration is the format in which the stored data needs to be returned to applications. Reconstructing the original XML (in whole or in parts) from normalized relational tables is known to be expensive and often better avoided by using an XML column.
This blog post also discusses XML vs. relational performance:
And finally, it doesn't have to be a question of either XML or relational, because the two are not mutually exclusive. Depending on the data access patterns you can choose to store the data in a hybrid format. For example, when you insert an XML document into an XML column you can extract a few selected elements or attributes into relational columns of the same or a different table. This adds to the insert cost (but is still cheaper than full shredding!) but can potentially help with more complex queries.
Further considerations for hybrid XML/relational storage are available here:
andrewWoodman 270005AE5W2 Posts