Topic
  • 3 replies
  • Latest Post - ‏2012-05-02T18:58:03Z by MatthiasNicola
andrewWoodman
andrewWoodman
2 Posts

Pinned topic pureXML index performance

‏2012-05-01T20:49:10Z |
I'm in the process of designing a new database structure to store document data received from a third-party. We currently receive the documents in XML format, shred them, and store them in a relational database with 17 tables. The database currently serves two functions: 1)Store documents for display at a later time 2)Store documents so that an application can scan for duplicate reports prior to making a new request to the third-party. I understand how moving to a pureXML storage layout will improve the performance of function #1, but I'm struggling to find any definitive research regarding function #2. Does anyone know of any research that has been done which compares index-based searches on relational tables to index-based searches on pureXML? For example, our current search might look for a document using customer name and birth date ---- assuming indexes are set up properly, would that same search have similar performance on a pureXML table? Thanks.
Updated on 2012-05-02T18:58:03Z at 2012-05-02T18:58:03Z by MatthiasNicola
  • MatthiasNicola
    MatthiasNicola
    322 Posts

    Re: pureXML index performance

    ‏2012-05-02T02:55:53Z  
    Hi Andrew,

    query performance on relational tables vs. an XML column depends heavily on the types of queries that you need to run.

    For example, retrieving a customer record based on customer name and date of birth can easily be handled by one or two XML indexes, and the index-based fetch of a single XML record is typically faster than performing lookups into 17 relational indexes to fetch rows from as many separate relational tables.

    You'll find a performance comparison for this type of use case in the following article:
    http://www.ibm.com/developerworks/data/library/techarticle/dm-1201normalizationpart2/

    Table 10 in this article might also be useful for your question.

    An additional consideration is the format in which the stored data needs to be returned to applications. Reconstructing the original XML (in whole or in parts) from normalized relational tables is known to be expensive and often better avoided by using an XML column.

    This blog post also discusses XML vs. relational performance:
    http://purexml.wordpress.com/2010/08/22/xml-versus-relational-database-performance/

    And finally, it doesn't have to be a question of either XML or relational, because the two are not mutually exclusive. Depending on the data access patterns you can choose to store the data in a hybrid format. For example, when you insert an XML document into an XML column you can extract a few selected elements or attributes into relational columns of the same or a different table. This adds to the insert cost (but is still cheaper than full shredding!) but can potentially help with more complex queries.

    Further considerations for hybrid XML/relational storage are available here:
    http://www.ibm.com/developerworks/data/library/dmmag/DMMag_2009_Issue3/Tips/index.html

    Thanks,

    Matthias


    Matthias Nicola
    http://www.tinyurl.com/pureXML
    http://nativexmldatabase.com/
  • andrewWoodman
    andrewWoodman
    2 Posts

    Re: pureXML index performance

    ‏2012-05-02T18:23:48Z  
    Thanks Matthias. I guess our next step is mocking up a few different layout options and testing them out.
  • MatthiasNicola
    MatthiasNicola
    322 Posts

    Re: pureXML index performance

    ‏2012-05-02T18:58:03Z  
    Thanks Matthias. I guess our next step is mocking up a few different layout options and testing them out.
    Sounds good!


    Matthias Nicola
    http://www.tinyurl.com/pureXML
    http://nativexmldatabase.com/