Topic
  • 4 replies
  • Latest Post - ‏2012-06-05T23:50:12Z by MatthiasNicola
Sigmazen
Sigmazen
19 Posts

Pinned topic pureXML performance issues

‏2012-05-25T23:29:00Z |
Hi
We have recently started migrating (where appropriate) from a column-based table solution to an xml-based table solution due to the changing nature of our data (ie new attributes are added on a frequent basis). This has all been very successful and we see good response times (<10ms) for some relatively complex querying across five tables, three of which have xml-columns we are using. The DB receives around 500 hits per second.

As part of the final stages of this migration we recently converted our remaining column-based attributes to xml-based attributes bringing the total to around 500, and saw the query response times increase alarmingly (two groupings 40-60ms and 160-240ms) which in-turn is causing some upstream impacts.
The two-groupings situation is currently thought to be wait times because the cpu is so busy doing work on the executing queries, but I'm open to suggestions.
We have subsequently NULL'd out around 60% of the nice-to-have attributes and query times are just about acceptable for the short term.

The number of tables in the joins are the same in number and when we completely NULL out the attributes in the SELECT the response times return to their original <10ms, so it appears as though the SELECT is the culprit.

The query is similar to this:

SELECT XMLCAST(XMLQUERY('$D_A_PRFL//AcctNbr' passing B.ACCT_PRFL_XML_TX as "D_A_PRFL") AS CHAR(19))
, XMLCAST(XMLQUERY('$D_A_PRFL//AcctDesc' passing B.ACCT_PRFL_XML_TX as "D_A_PRFL") AS CHAR(6))
, XMLCAST(XMLQUERY('$D_A_PRFL//AcctStatCd' passing B.ACCT_PRFL_XML_TX as "D_A_PRFL") AS CHAR(2))
, XMLCAST(XMLQUERY('$D_A_PRFL//AcctStartDtnormalize-space()' passing B.ACCT_PRFL_XML_TX as "D_A_PRFL") AS DATE)
, XMLCAST(XMLQUERY('$D_A_PRFL//AcctCancDtnormalize-space()' passing B.ACCT_PRFL_XML_TX as "D_A_PRFL") AS DATE)
. . .
x480 attributes
. . .
, XMLCAST(XMLQUERY('$A_PRFL//CityPostCd' passing C.ACCT_PRFL_XML_TX as "A_PRFL") AS CHAR(15))
, XMLCAST(XMLQUERY('$A_PRFL//CityRgnCd' passing C.ACCT_PRFL_XML_TX as "A_PRFL") AS CHAR(15))
, XMLCAST(XMLQUERY('$D_A_PRFL//PostCd' passing B.ACCT_PRFL_XML_TX as "D_A_PRFL") AS CHAR(15))
, XMLCAST(XMLQUERY('$D_A_PRFL//CntryCd' passing B.ACCT_PRFL_XML_TX as "D_A_PRFL") AS CHAR(3))

FROM ACCT_PROFILE_REALTIME AS B
LEFT OUTER JOIN ACCT_PROFILE_NEARTIME AS C
ON C.ACCT_NO = B.ACCT_NO
LEFT OUTER JOIN ACCT_PROFILE_BATCH AS D
ON D.ACCT_NO = B.ACCT_NO

WHERE B.ACCT_NO = ?


I'm currently reading the excellent DB2 pureXML Cookbook but am not all the way through to pick up all the tips, especially around XMLTABLE() benefits, but what I have read I wonder whether that approach would benefit anyway because wouldn't the DB engine need to convert the xml to relational columns either way?

So, my questions are:
  • Has anyone been ramping up pureXML attributes to the 500 number and noticed a dramatic increase in cpu usage due to (what I believe) is the xml functionality?
  • Before I go ahead and try the XMLTABLE approach, would the move from five hundred XMLQUERYs to a four or five XMLTABLEs save the cpu?

I'm really hoping for a magic wand on this because we've spent a lot of time and effort moving to this concept, it's been very successful on the 'dynamic' side of things, but the nightmare scenario will be that we need to migrate back to columns from pureXML to regain the response times :-(

Thanking you all in advance for any experiences, help you can pass on.
Cheers
Simon

PS: PMR 02632,227,000
Updated on 2012-06-05T23:50:12Z at 2012-06-05T23:50:12Z by MatthiasNicola
  • Sigmazen
    Sigmazen
    19 Posts

    Re: pureXML performance issues

    ‏2012-06-02T20:55:55Z  
    Hi
    Just to answer my own post :-/
    We rewrote from x500 xmlcast+xmlquery statements to one xmltable containing x500 columns and found at 25% improvement in response times and 30% increase in thru-put due to reduced cpu usage per request :-)

    My next question is just to validate my thinking on why this happened:
    • Using the xmlcast+xmlquery approach I presume the complete xml document is first pulled from the data pages into the Bufferpool and then during the creation of each attribute within the SELECT piece of the query, it goes to the BP, pulls out that single attribute from the xml document and xmlcast+xmlquery's it. This process is then repeated 500 times.
    • However, using the xmltable approach I presume DB2 knows it needs to create a derived xmltable and therefore the complete document is first pulled from the data pages into the Bufferpool but then all the attributes are created in a single hit ie the xml document is only read once using xmltables, not 500 times using xmlcast+xmlquery

    Is this assumption correct?
    Cheers
    Simon
  • MatthiasNicola
    MatthiasNicola
    321 Posts

    Re: pureXML performance issues

    ‏2012-06-03T04:57:34Z  
    • Sigmazen
    • ‏2012-06-02T20:55:55Z
    Hi
    Just to answer my own post :-/
    We rewrote from x500 xmlcast+xmlquery statements to one xmltable containing x500 columns and found at 25% improvement in response times and 30% increase in thru-put due to reduced cpu usage per request :-)

    My next question is just to validate my thinking on why this happened:
    • Using the xmlcast+xmlquery approach I presume the complete xml document is first pulled from the data pages into the Bufferpool and then during the creation of each attribute within the SELECT piece of the query, it goes to the BP, pulls out that single attribute from the xml document and xmlcast+xmlquery's it. This process is then repeated 500 times.
    • However, using the xmltable approach I presume DB2 knows it needs to create a derived xmltable and therefore the complete document is first pulled from the data pages into the Bufferpool but then all the attributes are created in a single hit ie the xml document is only read once using xmltables, not 500 times using xmlcast+xmlquery

    Is this assumption correct?
    Cheers
    Simon
    Hi Sigma Zen,

    in your original query I see that all of the 500 XPath start with a //, such as $D_A_PRFL//AcctDesc. With the // at the beginning of the path, the XPath processor needs to search all branches and all levels of the document tree exhaustively to find any <AcctDesc> elements anywhere in the document. And this exhaustive search is repeated 500 times, once for each XPath in your query. This is expensive. The more precisely you specify the path to an element the more efficient the search, allowing DB2 to ignore branches and levels of the document tree that don't match the path.

    If you try to reduce the usage of // and use fully specified XPath expressions wherever possible then performance might be better.

    Matthias

    Matthias Nicola
    http://www.tinyurl.com/pureXML
    http://nativexmldatabase.com/
  • Sigmazen
    Sigmazen
    19 Posts

    Re: pureXML performance issues

    ‏2012-06-05T15:19:48Z  
    Hi Sigma Zen,

    in your original query I see that all of the 500 XPath start with a //, such as $D_A_PRFL//AcctDesc. With the // at the beginning of the path, the XPath processor needs to search all branches and all levels of the document tree exhaustively to find any <AcctDesc> elements anywhere in the document. And this exhaustive search is repeated 500 times, once for each XPath in your query. This is expensive. The more precisely you specify the path to an element the more efficient the search, allowing DB2 to ignore branches and levels of the document tree that don't match the path.

    If you try to reduce the usage of // and use fully specified XPath expressions wherever possible then performance might be better.

    Matthias

    Matthias Nicola
    http://www.tinyurl.com/pureXML
    http://nativexmldatabase.com/
    Hi Matthias
    Thanks for the reply.

    Yep; I'd put the full pathname in pretty much after the post itself after reading one of your blogs so it's fully qualified now.

    Two questions though:
    #1: Our actual xml document is essentially flat. It has the root and then all nodes are on the next, same level. So, DB2 goes to the root, then down a level and finds the tag. How much overhead is that really causing?
    #2: Is there any performance difference you are aware of between many (eg 500) xmlcast/xmlquery statements vs one xmltable with the same number of (ie 500) generated columns ??
    Would DB2 traverse the entire document once for xmltable, and hop in and out the bufferpool 500 times for the xmlcast/xmlquery approach?

    Thanks again
    Cheers
    Simon
  • MatthiasNicola
    MatthiasNicola
    321 Posts

    Re: pureXML performance issues

    ‏2012-06-05T23:50:12Z  
    • Sigmazen
    • ‏2012-06-05T15:19:48Z
    Hi Matthias
    Thanks for the reply.

    Yep; I'd put the full pathname in pretty much after the post itself after reading one of your blogs so it's fully qualified now.

    Two questions though:
    #1: Our actual xml document is essentially flat. It has the root and then all nodes are on the next, same level. So, DB2 goes to the root, then down a level and finds the tag. How much overhead is that really causing?
    #2: Is there any performance difference you are aware of between many (eg 500) xmlcast/xmlquery statements vs one xmltable with the same number of (ie 500) generated columns ??
    Would DB2 traverse the entire document once for xmltable, and hop in and out the bufferpool 500 times for the xmlcast/xmlquery approach?

    Thanks again
    Cheers
    Simon
    Hi Simon,

    since you have a PMR open (and a separate discussion thread as part of the PMR process) I recommend to keep all communication on this problem in one place, i.e. in the PMR, and to drive the resolution of the problem through the PMR. The IBM support person assigned to the PMR will reach out to me as needed.

    As a short answer to your question: Yes, XMLTABLE has certain optimizations for extracting multiple items from the same XML document. That is, it can often navigate the document more efficiently. However, regardless of whether you use XMLTABLE or the XMLQUERY+XMLCAST approach, each document is read into the BP only once.

    Thanks,

    Matthias


    Matthias Nicola
    http://www.tinyurl.com/pureXML
    http://nativexmldatabase.com/