Topic
  • 2 replies
  • Latest Post - ‏2013-10-05T01:29:00Z by Sigmazen
Sigmazen
Sigmazen
19 Posts

Pinned topic Runstats and xml-columns

‏2013-10-02T20:25:54Z |

Hi.  Quick question around the runstats taken for xml columns.  My scenario is this:

TableA has 60MM rows with an ID column, and an xml column housing the following sample document:

  • <MyXML>
  •   <MyTag>Hello</MyTag>
  • </MyXML>

Runstats were run, the syscat knows there are 60MM rows, and the index on the ID is working great.

Now I implement some new tags turning the above into a more complex document, for example:

  • <MyXML>
  •   <MyTag>Hello</MyTag>
  •   <High>
  •     <Med><Low>A</Low><Low>B</Low><Low>C</Low><Low>D</Low></Med>
  •     <Med><Low>E</Low><Low>F</Low></Med>
  •     <Med><Low>M</Low><Low>N</Low><Low>O</Low></Med>
  •   </High>
  • </MyXML>

The SELECT looks good (full-XMLpath), the explain looks good (joins).   There are around 2MM row updates per day depending on which ID comes in.

After Day 4 or so we start seeing higher cpu during the read Stored Proc which uses this new XML-document.  We're presuming it's because the new document structure isn't understood, so the DB needs to scan the document.  Therefore we're planning to rerun the Runstats today.

The question is:

  • Does the optimiser 'know' the new document structure at a table-level (ie so all future updated rows are automatically catered for) ??
  • Or is it row-by-row, and therefore I would need to run my runstats every other day (or weekend) until I'd updated all 60MM records.

Thanks in advance

Cheers

Simon

PS: Here are some EXPLAIN stats which I think answers the fact that some of the document is understood by the optimiser (ie instant find) but some of it isn't (ie document-scan find)?

|  |  XML Doc Navigation

|  |  |  Navigator is

|  |  |  |  /$CONTEXT_NODE$()/child::element(AcctProf)

|  |  |  |  /child::element(Lst8wkMaxAmt)(: Output Node Sequence Reference  :)

versus

|  |  |  |  |  XML Doc Navigation

|  |  |  |  |  |  Navigator is

|  |  |  |  |  |  |  /$CONTEXT_NODE$()(: Output Node Sequence Reference  :)(: #Residual Xpath Predicates = 1 :)

|  |  |  |  |  |  |  /unknown::unknown()

 

Updated on 2013-10-02T23:28:08Z at 2013-10-02T23:28:08Z by Sigmazen
  • MatthiasNicola
    MatthiasNicola
    321 Posts

    Re: Runstats and xml-columns

    ‏2013-10-03T08:50:03Z  

    Hi Simon,

    it's hard to draw a definitive conclusion without seeing the actual query and the full execution plan of the query (ideally produced by db2exfmt) before and after the change of the data (e.g. day 1 vs. day 4).

    Could you attach that information as text files or a zip file to your next post?

    With the statistics for XML columns, DB2 does not maintain information on a per-row basis. Keeping and storing XML statistics separately for each row would be prohibitively expensive. Instead, DB2 stores and uses aggregated statistics across all rows, such as the top most frequent paths and values in the XML documents.

    The increase in CPU usage that you are seeing might not be related to the statistics at all.

     

    To clarify: does the number of rows in the table change significantly from day to day? I assume not, because you indicate that the documents are updated with additional tags.

     

    Thanks,

     

    Matthias

     

  • Sigmazen
    Sigmazen
    19 Posts

    Re: Runstats and xml-columns

    ‏2013-10-05T01:29:00Z  

    Hi Simon,

    it's hard to draw a definitive conclusion without seeing the actual query and the full execution plan of the query (ideally produced by db2exfmt) before and after the change of the data (e.g. day 1 vs. day 4).

    Could you attach that information as text files or a zip file to your next post?

    With the statistics for XML columns, DB2 does not maintain information on a per-row basis. Keeping and storing XML statistics separately for each row would be prohibitively expensive. Instead, DB2 stores and uses aggregated statistics across all rows, such as the top most frequent paths and values in the XML documents.

    The increase in CPU usage that you are seeing might not be related to the statistics at all.

     

    To clarify: does the number of rows in the table change significantly from day to day? I assume not, because you indicate that the documents are updated with additional tags.

     

    Thanks,

     

    Matthias

     

    Hi there

    Thanks again; makes sense.  Number of actual rows is pretty much static 60MM this month; 60MM last month; 59MM beginning of the year type of thing.  It's just the xml-document which is new.

    Our runstats job is currently finishing up and I'll get the updated explain early next week so we'll see if we see a difference ... hopefully enough of the new xml-document was loaded into the DB to make a difference.

    Cheers

    Simon