Topic
IC4NOTICE: developerWorks Community will be offline May 29-30, 2015 while we upgrade to the latest version of IBM Connections. For more information, read our upgrade FAQ.
11 replies Latest Post - ‏2013-02-27T08:29:22Z by MatthiasNicola
amcintyre
amcintyre
21 Posts
ACCEPTED ANSWER

Pinned topic Struggling with SELECTs and matching XML indexes

‏2013-02-12T00:52:13Z |
So XML index issues I thought I had resolved using db2batch to get SELECTs well into the millisecs, suddenly come back and take 2 secs. Not sure what changed, still recapping my steps, but maybe inlining?

Anyway, two questions:
1 - where is the most definitive (I really mean easy to understand) doc/examples for matching SELECTs with XML indexes? (other than the Native XML forum, the Devote page and the official doc, all of which I've read over and over)
2 - I thought I had read that db2 design advisor gives explicit XML index recommendations? (or do I need the Optim add on for that?)

Have DS32 and x64 DB2 10.1.

Thanks!!!
Updated on 2013-02-27T08:29:22Z at 2013-02-27T08:29:22Z by MatthiasNicola
  • MatthiasNicola
    MatthiasNicola
    321 Posts
    ACCEPTED ANSWER

    Re: Struggling with SELECTs and matching XML indexes

    ‏2013-02-12T02:53:35Z  in response to amcintyre
    Hi there,

    here are a few tips that you may find helpful.

    1. When query performance changes drastically it's a good idea to the check the query execution plan. My preferred tool is db2exfmt, but you can also use the Visual Explain in Data Studio. The execution plan tells you which indexes have been used, which join order was used, and so on.
    2. It's a good idea to keep a history of execution plans for your most important queries. For example, get execution plans and save them as files everytime there is a significant change to your system, such as new DB2 fixpack, a change to the set of available indexes, etc.. When performance drops you can compare the current execution plan to the one you had before when performance was good.
    3. As for the "most definitive" guide for matching XML queries with XML indexes, well, that may be a matter of taste, but the following paper can be helpful if you haven't read it yet:
    "Exploit XML indexes for XML query performance in DB2 9"
    http://www.ibm.com/developerworks/data/library/techarticle/dm-0611nicola/

    Also, the DB2 pureXML Cookbook has one chapter on XML indexes and one on performance.
    http://www.ibmpressbooks.com/bookstore/product.asp?isbn=0138150478
    4. The DB2 design advisor does not recommend XML indexes.
    There is a standalone Java tool that can be used to generate a set of candidate indexes for a given set of XML queries.
    https://www.ibm.com/developerworks/mydeveloperworks/groups/service/html/communityview?communityUuid=1fd71400-5986-4aa6-be0f-5053ed62379d
    5. You mentioned inlining....
    If you change the XML column from non-inlined to inlined then this can certainly impact performance. Typically the performance gets better :-). But, there can be cases when performance is negatively affected, and that is described here:
    http://nativexmldatabase.com/2012/07/11/xml-storage-in-db2-to-inline-or-not-to-inline/
    Matthias


    Matthias Nicola
    http://www.tinyurl.com/pureXML
    http://nativexmldatabase.com/
    • amcintyre
      amcintyre
      21 Posts
      ACCEPTED ANSWER

      Re: Struggling with SELECTs and matching XML indexes

      ‏2013-02-12T03:20:34Z  in response to MatthiasNicola
      Issue looks like it's volume related. The more entries in the index, the longer the query seems to take...

      so maybe I just need more predicates? but what if I can't come up with anymore?

      but it might be related to inlining (I put it in recently after reading your link 5 last week) and that case you mention with a 30k inline... because I keep bumping up my inline value to handle the largest xml packet I run into... possibly that isn't a good idea?

      are the smaller ones wasting space and causing more pages to be read? I was assuming it would pack the small ones in the row up to the inline value and not waste space. avg(admin_est_inline_length(xml)) says 424 and my inline length is set at 2000 but 99% are below 500 in length. out of millions...

      and thanks for all the links!!
      (sorry for the double post - browser stuttered or something)
      • MatthiasNicola
        MatthiasNicola
        321 Posts
        ACCEPTED ANSWER

        Re: Struggling with SELECTs and matching XML indexes

        ‏2013-02-12T08:46:32Z  in response to amcintyre
        If all or most of your queries are touching the XML column then it is generally good for performance to use an inline length that allows most documents to be inlined.

        Documents that are smaller than the inline length do not waste space. For example, if all of your documents can be stored in 500 bytes then doesn't matter if you set the inline length to 500 or 2000 or 30000. The number of rows per page will be the same, based on the actual size of the documents.

        From the information you have provided so far I do not yet see an indication why the inlining itself should hurt your query performance.

        Do you have an apples-to-apples comparison of the performance with and without inlining (with no other changes in your environment or data volume)?

        The first step should be to verify in the execution plan that indexes are used as expected. As a second step it is often helpful to use the DB2 monitoring capabilities to understand the number of logical and physical page reads, buffer pool hit ratio, temp I/O (if any), and so on.

        Also note that is not unusual for query performance to be dependent on the selectivity of the predicates. A query that examines and returns many XML documents takes longer than a query that zooms in on one or very few doucments.

        Matthias


        Matthias Nicola
        http://www.tinyurl.com/pureXML
        http://nativexmldatabase.com/
        • amcintyre
          amcintyre
          21 Posts
          ACCEPTED ANSWER

          Re: Struggling with SELECTs and matching XML indexes

          ‏2013-02-19T01:04:47Z  in response to MatthiasNicola
          My original post issue of:
          "XML index issues I thought I had resolved using db2batch to get SELECTs well into the millisecs, suddenly come back and take 2 secs"

          actually turned out to be because I was relying on db2batch timings to be consistent.

          For example, yesterday morning, I was getting the same slow 2 sec arithmetic means from dozens of the same db2batch runs. Then I started testing some other selects that ran the entire db, then when I got back to db2batch, without changing anything from the earlier runs, the means were subsecond. great but hmmmmmmm...

          So I thought, ok, index page caching or something... and to verify I rebooted and ran the db2batch runs again expecting the 2 sec times... but instead got the subsecond ones again.. so the problem mysteriously went away.. yea!!

          Hardware isn't issue. E5 Xeon 8 core/8 gig mem with 6Gbps SSDs and never any swapping.

          I'm calling this resolution "XML index aging" and am assuming the older my indexes are, the better they work... :-)
          • amcintyre
            amcintyre
            21 Posts
            ACCEPTED ANSWER

            Re: Struggling with SELECTs and matching XML indexes

            ‏2013-02-19T19:57:04Z  in response to amcintyre
            maybe this apar?
            http://www-01.ibm.com/support/docview.wss?uid=swg1IC87882
            • MatthiasNicola
              MatthiasNicola
              321 Posts
              ACCEPTED ANSWER

              Re: Struggling with SELECTs and matching XML indexes

              ‏2013-02-19T20:21:05Z  in response to amcintyre
              Hmmm, maybe. Did you have the exact same query multiple times in the input file for db2batch?

              You could certainly upgrade to FP2 where this issues is fixed, just to be safe.


              Matthias Nicola
              http://www.tinyurl.com/pureXML
              http://nativexmldatabase.com/
              • amcintyre
                amcintyre
                21 Posts
                ACCEPTED ANSWER

                Re: Struggling with SELECTs and matching XML indexes (and runstats)

                ‏2013-02-23T20:18:14Z  in response to MatthiasNicola
                I figured this out, I think.

                I have auto runstats off because during the day, it was causing to much contention and would slow down some real time SQL insert tasks I want to be very high speed and never queue.

                So while I was testing the xml index advisor last weekend, I tried one of its recommended runstats... and that's when the 2 sec queries immediately went to subsecond... just verified this again so ok, my bad... duh.

                but the question remains, how can I do runstats, either manually or automatic, during the day without causing so much contention? tables and indexes are all defined with defaults...

                this might be a question for the main forum but most of my indexes are over xml...

                thanks!!
                • amcintyre
                  amcintyre
                  21 Posts
                  ACCEPTED ANSWER

                  Re: Struggling with SELECTs and matching XML indexes (and runstats)

                  ‏2013-02-23T21:43:56Z  in response to amcintyre
                  The db cfg parm auto_runstats defaults to on... (and I had turned it off because of severe queuing of high volume inserts while it ran)

                  but the auto_sampling parm defaults to off... why is that I wonder?

                  I'll let you know...
                  • amcintyre
                    amcintyre
                    21 Posts
                    ACCEPTED ANSWER

                    Re: Struggling with SELECTs and matching XML indexes (and runstats)

                    ‏2013-02-26T00:48:19Z  in response to amcintyre
                    auto_sampling worked... in that it picked 73% and 59% sampling rates on the two largest tables for the runstats...

                    why rates that large I wonder?

                    but, those two runstats took 41secs each and caused major queueing on my high volume insert stream

                    so back to auto runstats off and manual runstats (probably at 1%) couple of times during the day

                    probably what isn't helping any is that these tables get dropped and created every morning...

                    thanks!!
                    • amcintyre
                      amcintyre
                      21 Posts
                      ACCEPTED ANSWER

                      Re: Struggling with SELECTs and matching XML indexes (and runstats)

                      ‏2013-02-26T01:24:21Z  in response to amcintyre
                      more testing and just figured out it was not runstats related...

                      steps to recreate the varying db2batch run time issue... (which is what started this thread)
                      1 - run db2batch runs with -a userid/pw parm - times are consistently 2secs
                      2 - do a manual db2 connect to the db with userid/pw (same as used in -a parm from step 1) then run same db2batch with -a parm again
                      3 - the 1st db2batch run after the connect has same time as step 1 but the 2nd thru n runs are now subsecond
                      4 - log off so manual connect goes away (the backend process) and you can recreate this easily

                      so something about the manual connect is caching something that the -a parm doesn't and that's what was making my db2batch times inconsistent

                      that took a while...
                      • MatthiasNicola
                        MatthiasNicola
                        321 Posts
                        ACCEPTED ANSWER

                        Re: Struggling with SELECTs and matching XML indexes (and runstats)

                        ‏2013-02-27T08:29:22Z  in response to amcintyre
                        Based on what you wrote this sounds like the performance difference between a hot and a cold DB2 bufferpool.

                        Upon the first connection to a DB2 database the bufferpool is allocated. Subsquent connections to the same database can then benefit from a potentially warm bufferpool, i.e. required data pages are already in memory so that physical I/O is avoided. The bufferpool stays in main memory as long as at least one application is connected.

                        When the last connection disappears, i.e. number of connections to the database goes down to zero, then the BP is deallocated. The next connection after that starts with a cold bufferpool.

                        So, let's see how this relates to what you wrote:

                        >> 1 - run db2batch runs with -a userid/pw parm - times are consistently 2secs

                        This makes sense if there is no existing connection to the database. Each db2batch run connects to the database, runs on a cold BP, and then disconnects when it is done. So, every run has to do the same physical I/O.
                        >> 2 - do a manual db2 connect to the db with userid/pw (same as used in -a parm from step 1) then run same db2batch with -a parm again
                        >> 3 - the 1st db2batch run after the connect has same time as step 1 but the 2nd thru n runs are now subsecond

                        The manual connection in your step 2 stays in place throughout the multiple db2batch runs in step 3, so the first db2batch run performs the physical I/O (takes 2sec) and all the subsequent runs benefit from a warm BP and are much faster.

                        Two more hints:

                        Instead of the manual connection you can also use the "activate database" command, so that the BP stays 'active' even when the number of connections goes to zero.

                        The input file for db2batch can contain directives to declare blocks (--#BGBLK n) so that each statement or each block of statments is executed n times in a row, in the same connection. db2batch then shows the min, max, and avg response times across these n repetitions. Very useful to make performance tests more reproducible. Check the db2batch documentation for details...

                        Matthias Nicola
                        http://www.tinyurl.com/pureXML
                        http://nativexmldatabase.com/