I work on various performance issues ( single node, DPF, pureScale ) and I somehow have found most of the DBAs still using the older snapshot way to collect the information. I don't have anything against snapshots but after using the new MON infrastructure to tackle various perf issues and also use it to monitor normal activity I thought maybe I should use my blog to encourage the use of this infrastructure. I don't think I can cover everything in one entry so probably I will spread it across a few entries and dwell on how to use it for specific issues.
In this entry I will try and lay out what is out there and what I use commonly.
One thing I should point out upfront is that, during hang situations ( unable to connect etc ) this is not a very useful tool and should not be used. db2pd is the tool of choice in hang conditions.
So, starting from some basics.
a) The MON infrastructure has been available since DB2 v9.7 GA and has been refined and enhanced ( and is continuously being enhanced ) in the later versions.
b) Its provides a easy SQL interface to real time memory of DB2 to print out details of it.
c) Its got time spent metrics, object metrics which can quickly narrow down the bottleneck for the user.
d) Its controlled by 3 DB config params ( MON_REQ_METRICS, MON_ACT_METRICS, MON_OBJ_METRICS ). Setting them to BASE is usually good enough.
e) It uses much less latching ( internal locking ) to get the information thus making it easier to use on really busy systems very frequently without any side effects.
Personally, I've used it very effectively on extremely busy systems with literally no side effects and was able to resolve a lot of issues using this. I would like to share my experiences here so that others can benefit as well from this great infrastructure which is built in the engine itself.
Now, that the intro is out of the way, let's get our hands dirty .. :-)
Let's start with a hypothetical scenario, say:
You have upgraded from v9.5 to v9.7 or v10.1 and all your benchmark queries are running slower and overall performance is struggling. You have checked the usual i.e. vmstat, iostat, db snapshot but you are not able to really pinpoint an obvious issue.
In this case, how can the new MON interface help you ??
It can actually help you a lot !!
Let's start with a monreport query .. you must be thinking .. huh! what's that? This is something the DB2 developers thought can help the customer by providing a quick overview of the whole database engine using some pre-packaged MON queries. There's very good information about these in the infocenter at this link. There are different kinds of report in this module but for a generic perf issue where there's overall slowness, I like to start with the dbsummary monreport. The syntax to run it is:
db2 connect to <dbname>
db2 "call monreport.dbsummary(60)" > dbsummary.txt
The parameter passed tells dbsummary to run for 60 seconds and provide an overall view of the internals of the DB2 engine as to what happened during those 60 seconds.
This report gives you everything i.e.
a) Transaction throughput
b) Wait times within the engine << This is the part I am most interested in for perf issues
c) Section processing times -- in other words, apart from the waits, how much time does it take to execute SQLs in the engine
d) Bufferpool hit ratio
e) Package cache hit ratios
f) SQL throughput
Small snippet of the report:
So, in the above output its clearly showing that the log flush is slow ( i.e. commit ) which is driving most of the wait times within the engine. Maybe moving to a faster log disk can help here or maybe grouping the commits.
You can calculate the package cache hit ratio from this data. If you see its very low, then increasing package cache size might help.
As can be seen the dbsummary report is a great starting point for a generic ( aka unknown ) performance problem and then narrow down accordingly. You can other monreport packages to help you do that as well. I use the currentsql() and pkgcache() liberally to find out any SQL statements which require attention.
I will continue further on specific situation .. keep an eye on this blog for further updates.