Profiling SQL statements using db2trc
BennyVerplancke
Troubleshooting an SQL statement that is slow or running in a non-optimal way can be a challenging task.
Usually there is a cardinality underestimation in the query compiler's estimates that leads to a poor execution plan choice.
Tools like db2caem are great for identifying underestimations in an execution plan.
However, the data provided by db2caem has some limitations.
"First of all, it requires the statement to actually complete. In some cases, performance can be so bad that even after hours of running, a statement doesn't complete. In this case, db2caem won't have the data we need."
This information is no longer correct: data gathered by an in-progress db2caem can be obtained using the method described in:
Another downside is that you cannot execute db2caem on a statement that is already in progress. Some statements don't run slowly all the time; the execution time might depend on external factors, such as parameter marker values.
In cases like this, it is difficult to gain transparency as to where the statement is actually spending time.
In DB2, an execution plan is made up of a graph of low-level plan operators, otherwise known as "lolepops". These can represent operators such as TBSCAN, IXSCAN, NLJOIN, HSJOIN, UNION and so on.
These operators are what you see when running an explain on a statement (db2expln or db2exfmt), and when a statement is executing, it spends its time in those operators.
For instance, a nested loop join with a large number of rows on the outer leg might spend a lot of time in the NLJOIN lolepop.
Some statements have hundreds of these operators, so troubleshooting query plan performance is not straightforward from visual inspection of the plan alone.
If there was some runtime knowledge of where processing time is being spent then that task could become much easier.
This can be achieved by making use of a little-known db2trc suboption: -rds.
The option is documented in the Knowledge Center at this location.
Let's now take a small example. Running db2trc with no options should be done with extreme caution in production environments.
Gigabytes' worth of trace information can be produced in a matter of minutes, which also means that db2trc itself can severely impact performance.
So it is always advisable to limit the db2trc to trace only what you need.
In this case the input uses the application handle of the application currently running a slow SQL statement.
( assuming that slow means at least X number of minutes, sufficient to identify an application handle and to take a quick db2trc ).
We also add the SQLRI mask; this is the component that produces the trace points we need, and there is no need to fill the trace buffer with components that are not required.
db2trc on -l 128M -t -Madd SQLRI -apphdl <application handle>
After letting the trace run for a short while, dump it and turn it off (db2trc dmp trc.dmp followed by db2trc off), then format the flow with db2trc flw trc.dmp trc.flw -t -rds. Note that the -t option is used both in the collection and the formatting in order to obtain timestamps.
The resulting trc.flw file will look similar to this:
8 0.000101568 | | sqlrihsjnBuildPD entry [opNumber: 266, lolepop: 4] [eduid 38403
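As an illustration of what such a line carries, here is a minimal shell sketch that pulls the opNumber and lolepop out of a single flow-trace line. The sed patterns and the sample line are assumptions modelled on the excerpt above, not part of the attached tool:

```shell
# Minimal sketch: extract the opNumber and lolepop id from one
# formatted (db2trc flw ... -t -rds) trace line.
# The sample line mirrors the excerpt above; the exact field layout
# is an assumption for illustration.
line='8 0.000101568 | | sqlrihsjnBuildPD entry [opNumber: 266, lolepop: 4]'
opnum=$(echo "$line" | sed -n 's/.*opNumber: \([0-9]*\).*/\1/p')
lolepop=$(echo "$line" | sed -n 's/.*lolepop: \([0-9]*\).*/\1/p')
echo "opNumber=$opnum lolepop=$lolepop"
```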
The -rds option has produced the "lolepop" information for us. Interpreting this from the trace alone is not straightforward, but I attached a (Perl) tool to this blog which parses it and produces a report:
$ ./db2prof --file trc.flw
So using this information I can tell that 90% of the time is spent in operator 7.
If I have the db2exfmt output for the statement then that points me to the "hot" operator and I can start finding explanations or solutions for this.
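The core of that per-operator report can be sketched in a few lines of awk: pair each operator's "entry" line with its "exit" line and sum the elapsed time per opNumber. The sample lines and the timestamp semantics (cumulative seconds in field 2 of the flow trace) are assumptions for illustration:

```shell
# Sketch of the aggregation idea behind the report: accumulate
# (exit timestamp - entry timestamp) per opNumber.
report=$(awk '
  /entry \[opNumber/ { match($0, /opNumber: [0-9]+/)
                       op = substr($0, RSTART + 10, RLENGTH - 10)
                       start[op] = $2 }
  /exit \[opNumber/  { match($0, /opNumber: [0-9]+/)
                       op = substr($0, RSTART + 10, RLENGTH - 10)
                       if (op in start) total[op] += $2 - start[op] }
  END { for (op in total) printf "operator %s: %.6f s\n", op, total[op] }
' <<'EOF'
1 0.000100 | sqlriExample entry [opNumber: 2, lolepop: 4]
2 0.000900 | sqlriExample exit [opNumber: 2, lolepop: 4]
3 0.001000 | sqlriExample entry [opNumber: 7, lolepop: 1]
4 0.010000 | sqlriExample exit [opNumber: 7, lolepop: 1]
EOF
)
echo "$report"
```

The real trc.flw layout is more involved (nesting columns, eduid suffixes, recursive entries), so treat this only as the shape of the aggregation, not as a replacement for the attached Perl tool.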
But this approach is somewhat static.
I added functionality to the tool to automatically collect information based on an input application handle.
$ ./db2prof --help
Then I use an ugly Cartesian product to illustrate a poorly running / hanging SQL statement:
select count_big(*) from staff s1, staff s2, staff s3, staff s4, staff s5, staff s6;
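To see why this statement effectively never finishes, a quick back-of-the-envelope count helps. Assuming the staff table from the SAMPLE database, which holds 35 rows:

```shell
# With 35 rows in staff (the SAMPLE database default), a six-way
# cartesian product makes count_big(*) scan 35^6 rows.
rows_per_table=35
total=$(( rows_per_table ** 6 ))
echo "rows to count: $total"   # about 1.8 billion
```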
I obtain the application handle before running the SQL:
$ db2 "values MON_
and then run the SQL in that same session.
Then, in another window, I run the profiling tool against the application handle, which produces the following output:
$ ./db2prof -apphdl 31
Checking if the statement didn't change during tracing...
Collecting db2expln info from the package cache
DB2 Universal Database Version 10.5, 5622-044 (c) Copyright IBM Corp. 1991, 2012
Total number of trace records : 1074323
I used the package cache anchor/userid values to obtain the data from db2expln. Some more parsing gives me a short explanation of the operator.
It saves the output in expln.out so that afterwards I can get a fuller view of the query plan. But with this information I can immediately see that operator 2 is where most of the time is spent.
The tool is not perfect yet and might have some flaws, but it should give an indication of time spent at the plan operator level, without too much impact (the trace it uses is short and masked).
The tool will work in partitioned environments but only for the logical partitions running on the host where the command was executed.
Future development will depend on uptake and usefulness.