Topic
  • 3 replies
  • Latest Post - ‏2011-09-29T17:02:01Z by lfly
lfly
lfly
5 Posts

Pinned topic need help to find the right way to optimize our xml index

‏2011-09-29T08:49:34Z |
Hello,

We need some help to increase performance for our use of DB2 pureXml and XMLQUERY.

We have a DB2 table with one XML field in a DB2 UTF8 schema.
We have one row for each xml document we import in the table.

Here the script of the table :
db2 "CREATE TABLE TEST_XML (xml_content XML)"

Here an example of one XML File :
Attach Files : fichier_TEST_XML.xml

We have these Query :
select xmlquery('$d/data/title/text()' PASSING xml_content AS "d")
from d10.test_xml where
xmlexists('$d/data/entreprisesitem=20000' PASSING xml_content AS "d", 47664 as "Nent")
and xmlexists('$d/data/languetext()=$langue' PASSING xml_content as "d", 'fr' as "langue") ;
So we have declare 2 indexes on each element of the “where” clause :
Attach Files :
CREATE INDEX D10.IDX_EVAN ON D10.TEST_XML
(XML_CONTENT ASC)
GENERATE KEY USING XMLPATTERN '/data/entreprises/item'
AS SQL DOUBLE IGNORE INVALID VALUES ALLOW REVERSE SCANS;

CREATE INDEX D10.IDX_EVAN2 ON D10.TEST_XML
(XML_CONTENT ASC)
-- GENERATE KEY USING XMLPATTERN '//text()'
GENERATE KEY USING XMLPATTERN '/data/langue'
-- AS SQL VARCHAR HASHED IGNORE INVALID VALUES ALLOW REVERSE SCANS;
AS SQL VARCHAR(2) IGNORE INVALID VALUES ALLOW REVERSE SCANS;
The explain plan show us that the index are correctly use, but, with only 20 rows of these XML File we have a response time greater than 3s. We except for immediately result.

We have notice that if we search a enterprises/item number near the top of the “enterprises” node the response is better.

Have DB2 limitation about the leaf or node numbers ?
*D*o we need some special configuration on DB2 to use XML functions?**
Is XmlQuery using the query cache of DB2 ? (we didn’t see difference between the first call and the second for the moment)

Can you, please, help us to find the right way to optimize our xmltable, xmlfile, xml index and query ?

Thank you.
Updated on 2011-09-29T17:02:01Z at 2011-09-29T17:02:01Z by lfly
  • lfly
    lfly
    5 Posts

    Re: need help to find the right way to optimize our xml index

    ‏2011-09-29T08:59:29Z  
    problems with brackets : you have to read :

    We have these Query :

    select xmlquery('$d/data/title/text()' PASSING xml_content AS "d")
    from d10.test_xml where
    xmlexists('$d/data/entreprises[item=20000]' PASSING xml_content AS "d", 47664 as "Nent")
    and xmlexists('$d/data/langue[text()=$langue]' PASSING xml_content as "d", 'fr' as "langue") ;
  • MatthiasNicola
    MatthiasNicola
    321 Posts

    Re: need help to find the right way to optimize our xml index

    ‏2011-09-29T11:53:55Z  
    • lfly
    • ‏2011-09-29T08:59:29Z
    problems with brackets : you have to read :

    We have these Query :

    select xmlquery('$d/data/title/text()' PASSING xml_content AS "d")
    from d10.test_xml where
    xmlexists('$d/data/entreprises[item=20000]' PASSING xml_content AS "d", 47664 as "Nent")
    and xmlexists('$d/data/langue[text()=$langue]' PASSING xml_content as "d", 'fr' as "langue") ;
    Hello,

    thanks for your post and for attaching a sample document.

    DB2 has no limit on the number of nodes or leaf elements.
    No special configuration is needed to use the XML functionality in DB2.

    XML queries use the query cache (statement cache) just like SQL statements. If you did not see a big difference between the first and the subsequent executions of the query, then this is probably because the compile time for this simple query is very very short.

    I have created your table and indexes in DB2 9.7 and I have inserted the sample document 25 times. In one of the 25 documents I have changed one of the <item> elements to 999999, so that I can search for that one item. Searching for item 999999, I find that your query completes in approx. 0.19 seconds, and that's on my rather old laptop!

    I can improve this performance to approx. 0.13 seconds, if I define the index on <item> as VARCHAR(20) and then use double quotes to formulate a string predicate: item = "999999".

    Just for interest, try using a VARCHAR index and string predicate, and see if the performance is better.

    Another idea is the following: your document contains a long list of about 37,000 <item> elements, and each <item> holds a single integer number. Depending on the type of query, a sequential scan over that list (within a document) may be performed. if the vast majority of the data is just a list of integer values, you could store these integers in a separate relational table with an integer column. DB2 can perform that extraction for you, using the XMLTABLE function.

    Matthias

    Matthias Nicola
    http://www.tinyurl.com/pureXML
    http://nativexmldatabase.com/
  • lfly
    lfly
    5 Posts

    Re: need help to find the right way to optimize our xml index

    ‏2011-09-29T17:02:01Z  
    Hello,

    thanks for your post and for attaching a sample document.

    DB2 has no limit on the number of nodes or leaf elements.
    No special configuration is needed to use the XML functionality in DB2.

    XML queries use the query cache (statement cache) just like SQL statements. If you did not see a big difference between the first and the subsequent executions of the query, then this is probably because the compile time for this simple query is very very short.

    I have created your table and indexes in DB2 9.7 and I have inserted the sample document 25 times. In one of the 25 documents I have changed one of the <item> elements to 999999, so that I can search for that one item. Searching for item 999999, I find that your query completes in approx. 0.19 seconds, and that's on my rather old laptop!

    I can improve this performance to approx. 0.13 seconds, if I define the index on <item> as VARCHAR(20) and then use double quotes to formulate a string predicate: item = "999999".

    Just for interest, try using a VARCHAR index and string predicate, and see if the performance is better.

    Another idea is the following: your document contains a long list of about 37,000 <item> elements, and each <item> holds a single integer number. Depending on the type of query, a sequential scan over that list (within a document) may be performed. if the vast majority of the data is just a list of integer values, you could store these integers in a separate relational table with an integer column. DB2 can perform that extraction for you, using the XMLTABLE function.

    Matthias

    Matthias Nicola
    http://www.tinyurl.com/pureXML
    http://nativexmldatabase.com/
    Thanks,

    We will try to have less nodes "item" in the XML files : this seems be the cause of the problem.
    In this case (when we have 37000 nodes), we will use the keyword "ALL".
    Otherwise, we will investigate your propososition (separate relational table with an integer column).

    For retrieve <item> nodes, what is the most performant way ?
    • search on companies sorted in ascending numerical order (DOUBLE index) ?
    • or sorted alphabetically (VARCHAR index) ?
    Or is it the same way?

    For information, we work in DB2/aix64 9.5.4.

    Best regards