IBM Support

Measuring latching overhead with monitoring functions in Db2

Technical Blog Post


Abstract

Measuring latching overhead with monitoring functions in Db2

Body

what is a latch?

Db2 is a multi threaded application written using the imperative programming model. Like any other multi threaded application, Db2 has its own mechanism to protect the common address base shared among the threads. It can be explained by the below given diagram: -

  image-20200224151323-1

In the diagram, two threads are updating the same structure in the memory but the consistency is kept by latches. Next question for any Db2 DBA would be, what is different than lock/wait? It is an internal implementation of Db2 to protect shared resources. There can be different scenarios: -

1)Reader and reader: - If the two threads are doing a read operation then “No problem”. It does not have big impact on the concurrency.

2)Reader/writer: - If the operation is writing and reading or writing /writing by the two threads, it could cause some slowdown as db2 would need an exclusive latch to keep the data structure intact.

When and how to determine that system needs to be looked from the latching.

Db2 always acquire and release latches which are not even noticeable in most of the cases. It is a very fast operation within the precision of nanoseconds but in the highly concurrent systems having multiple cores processor, it is good to monitor latching overhead. In the older releases, db2pd infrastructure has been used to look into the latch waits but starting in V10.1 and V10.5, new table function MON_GET_EXTENDED_LATCH_WAIT has been introduced which uses SQL infrastructure to get information on the latch wait. It is a light weight and human readable. Extended means that the requester yielded CPU while requesting latch and considered as some contention on the database.

Example

db2 "select substr(latch_name,1,32) as latch_name,TOTAL_EXTENDED_LATCH_WAITS,TOTAL_EXTENDED_LATCH_WAIT_TIME from  table(mon_get_extended_latch_wait(null))"

TOTAL_EXTENDED_LATCH_WAIT_TIME: -The amount of time, in milliseconds, spent in extended latch waits. This is an accumulative number from the time of database was activated.

LATCH_NAME                       TOTAL_EXTENDED_LATCH_WAITS        TOTAL_EXTENDED_LATCH_WAIT_TIME

--------------------------------  -------------------------      ------------------------------

SQLO_LT_NO_IDENTITY                                 14565                           11

SQLO_LT_SMemPool__MemLatchType__                       10                           18

SQLO_LT_SQLB_HASH_BUCKET_GROUP_H                    16699                           10848050

SQLO_LT_SQLD_PARSCAN__prefetchLa                       71                           54

SQLO_LT_SQLB_BPD__bpdLatch_SX                         565                           5419

SQLO_LT_SQLB_POOL_CB__extentAnch                        3                           44

SQLO_LT_SQLD_TCB__loadInProgress                        4                           32

SQLO_LT_ABPDispatcher                                   3                           0

SQLO_LT_ABPDistributor                                  3                           2

In the above example, it is an OLTP highly concurrent system and SQLB latches are noticeable. SQLO_LT_SQLB_HASH_BUCKET_GROUP is the one hot latch. The reason for picking it as a hot latch is based on the problem definition i.e. gradual slowdown when the concurrent queries execute on the system and component it is related to i.e. bufferpool (SQLB). Next step would be to find out, how can it be avoided?

MON_GET_PKG_CACHE_STMT:- This table function can be used to find out the SQL details based on the average latch wait.


db2 "select substr(stmt_text,1,50) as stmt_text,decimal(float(total_extended_latch_wait_time)/num_executions,10,5)as avg_latch_time from table(mon_get_pkg_cache_stmt(null,null,null,null)) where num_executions > 0 order by avg_latch_time desc fetch first 10 rows only"

STMT_TEXT                                                     AVG_LATCH_TIME

----------------------------------------                      --------------

Select Count(Distinct S_I_ID) from STOCK                             48.60632

Select S_QUANTITY, S_DIST_01, S_DIST_02,                              0.26249

Update STOCK set S_QUANTITY = ?, S_YTD =                              0.26086

Update ORDER_LINE set OL_DELIVERY_D = ?                               0.00067

Select OL_I_ID, OL_SUPPLY_W_ID, OL_DELIV                              0.00027

Select O_OL_CNT, O_C_ID from ORDERS wher                              0.00024

Update ORDERS set O_CARRIER_ID = ? where                              0.00022

Select D_NAME, D_STREET_1, D_STREET_2, D                              0.00017

Insert into NEW_ORDER values (?, ?, ?)                                0.00011

Insert into ORDER_LINE values (?, ?, ?,                               0.00008

Note, sometime latches are not related to the SQL execution so latch name might not have statement associated to it. Mostly in highly concurrent OLTP workload this SQL infrastructure can help to determine the overhead on the bufferpool latches and see if the queries can be tuned or avoid the access to hot pages in the bufferpool. Few basic techniques suggested on bufferpool related latches: -

• In above simple example, all hot statements are heavily biased to index access (this can be seen in       monitor data and access plans)

• This suggests some internal contention around an index page due to the extremely heavy activity

• General tip: you can often reduce contention by breaking up a single resource into multiple (smaller) ones. Example would be range partitioning/add more indexes.

CONCLUSION

Latches are required in database and usually acts as a switch ON/OFF which is very fast operation. When doing a health check look holistically and find a tipping point for the performance problem and then see how much quantification should go to the concurrency which would lead into assessing your database for latching. One example would be high CPU utilization when there is more workload on the system leads into slowdown after some time.

It is out of the scope of this article to include all the details about latching because it requires study of the environment/symptoms and source code of DB2.

REFERENCE

https://www.ibm.com/support/pages/node/412739

https://www.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.sql.rtn.doc/doc/r0055017.html

https://www.ibm.com/support/knowledgecenter/SSEPGG_10.5.0/com.ibm.db2.luw.sql.rtn.doc/doc/r0059271.html

[{"Business Unit":{"code":"BU058","label":"IBM Infrastructure w\/TPS"},"Product":{"code":"SSEPGG","label":"Db2 for Linux, UNIX and Windows"},"Component":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

UID

ibm13286785