Recently, working for a customer I encountered an interesting situation which I would like to share ..
The customer was running a Web commerce site on DB2 ( v97fp6 ) and out of the blue a few of the inserts to table would take long ( as long as 10-15 secs under heavy load ). The interesting part was that say 8 out of 10 inserts to the table will be in millisecs but suddenly one will take long. The challenge was not only to track a long insert but also collect the data when its happening. Generally, the best data to debug such issues is our good friend db2trc ( DB2 trace ) as that will give exactly where in DB2 its spending most of its time.
So, thinking hard on how to time data collection for this issue, I resorted to my script "watchThreshold" script. It had the infrastructure already built in to trigger data collection ( i.e. can call script or run any command if and when a trigger happens ) it was a matter of adding code to check for the long running query .. What I essentially did was that based on what the user considers a "long running" query I would run db2pd -active in that frequency and if the same apphandle, uowid, activityid, anchorid and stmtid appear in two consecutive outputs, it means that its a long running query. Since, the issue was happening on a live production site with nearly 4000 connections there was no way I can turn on the trace for the whole instance without affecting the response times for the whole . here the v97 feature of turning on the trace only for one apphandle helped. So, in short, what I did in the script was to check db2pd -active ( every user configured time ) and found if the query was running in two consecutive iterations .. if yes, I triggered a script where I essentially coded the db2trc command ( db2trc on -i 512m -apphdl <apphandle> ) to start the trace only for the apphandle which was running the query and collect the information.
Now, the trick to all this is to identify the query .. as in db2pd -active you don't have the whole query but just the anchor id, stmtid .. I used the db2pd -dyn output to identify the anchor id/stmtid pair for the insert queries which were taking longer and turned on the script to capture the necessary info.
With this script, I was able to capture the right data at the right time to get to the bottom of the issue fairly quickly ..
The script options pertaining to this are:
( If multiple statements need to be tracked use option like -longQuery 1,2 -longQuery 4,2 and so on )
-longQueryTime <value> ( The trigger time in seconds or millisecs ( suffix ms to value for millisec ) -- Default 1 second )
The syntax I used to capture the info was:
watchThreshold.pl -numiter 5 -longQuery 622,1 -longQuery 4,3 -longQueryTime 500ms -script collect.ksh -scriptargs dbws -daemon