Collecting and reporting performance monitor data with db2mon
db2mon is a set of Db2® command line processor (CLP) scripts that use Db2 lightweight in-memory monitoring interfaces to collect monitoring data for a specific period.
The db2mon.sh shell script provides an example of data collection and analysis. The script also checks that the prerequisites for db2mon are met.
Lightweight in-memory monitoring interfaces refer to the built-in routines that begin with "MON_GET", such as "MON_GET_DATABASE". The remainder of the topic refers to these routines collectively as MON_GET functions or the data that they return as MON_GET data.
MON_GET functions return data that increments from the time of database activation. Database activity is determined by collecting measurements at separate times, then calculating the difference. The db2mon scripts gather the information and process the calculations automatically by using stored procedures.
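As a minimal sketch of this delta technique (in Python rather than the SQL that db2mon actually uses, and with hypothetical counter values), the calculation looks like this:

```python
# Hypothetical snapshots of cumulative MON_GET counters, taken at the start
# and end of a monitoring interval. db2mon performs the equivalent work in
# SQL, using declared global temporary tables and stored procedures.
before = {"act_completed_total": 1_000_000, "total_rqst_time": 500_000}
after = {"act_completed_total": 1_760_000, "total_rqst_time": 535_000}

# The delta is the activity that occurred during the interval itself.
delta = {name: after[name] - before[name] for name in before}
print(delta)  # {'act_completed_total': 760000, 'total_rqst_time': 35000}
```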
Before you begin
- A user temporary table space must exist so that db2mon can create its declared temporary tables. For example:
  db2 "CREATE USER TEMPORARY TABLESPACE myTempTbsp"
- MON_ACT_METRICS must be set at least to BASE, which is the default value.
- MON_REQ_METRICS must be set at least to BASE. Its default value is EXTENDED, which gives full monitor information on tables and indexes, and is an ideal setting for db2mon.
If you are running db2mon in offline mode, you must have the authority to create permanent tables and indexes.
About this task
- Online
- db2mon is run most commonly in online mode, where it runs to completion and produces a report on standard output, on the system that it is monitoring. The report includes all of the SQL that is used to find the delta values in the metrics and list the output. db2mon.sh automatically connects to the database, creates a user temporary table space and a small private buffer pool (to minimize impact to the system), runs db2mon.sql, and then disconnects. You can run online mode in two ways:
- By using db2mon.sh
- By using an existing connection
- Offline
- Separate SQL files are provided for export, import, and report generation. If the monitored system is performance-sensitive, the performance data can be collected by using the export script and transferred to another system for import and reporting.
The report that is generated in offline mode (by running db2mon_report.sql) is shorter than the report that is generated in online mode (by running db2mon.sql). The offline mode report is shorter because it does not have declared temporary table (DGTT) declarations and delta calculations.
db2mon consists of the following files:
- db2mon.sh
- db2mon.sql (concatenated form of db2monBefore.sql, db2monInterval.sql, and db2monAfter.sql)
- db2mon_import.sql
- db2mon_export.sql
- db2mon_report.sql
db2mon checks for a user temporary table space and the minimum required values for MON_ACT_METRICS and MON_REQ_METRICS. The results of this check are reported in the prerequisites section of the report.
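As a sketch of the online-mode flow over an existing connection (the database name SAMPLE and the output file name are illustrative; db2mon.sh automates these same steps, including the temporary table space setup):

```shell
# Hypothetical online-mode session; assumes db2mon.sql is in the current
# directory and SAMPLE is a cataloged database.
if command -v db2 >/dev/null 2>&1; then
    db2 connect to SAMPLE
    db2 -tvf db2mon.sql > db2mon.report   # -t/-v/-f: terminator, verbose, run from file
    db2 connect reset
else
    echo "db2 command line processor not found; skipping"
fi
```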
Procedure
Follow these steps to collect and report performance monitor data with db2mon:
Results
The db2mon report contains the following types of data:
- Point-in-time data (such as currently running SQL, utilities, and lock waits), which is collected at the beginning and end of the interval.
- Cumulative data, which is measured over the whole interval:
- Data that is collected at various hierarchical levels (such as database level, table space level, buffer pool level, table level, query level, and connection-level).
- Data that is collected for different deployment types (examples include standard Db2 ESE, Db2 pureScale, and Db2 with BLU).
- Environment information (such as database and instance configuration, registry variables, CPU counts and usage, and memory usage).
Example
The sample output is provided in an increasingly granular order, with high-level sections presented first, and sections with finer details provided later. Get a general sense of the performance of your database from the high-level sections, then use that information to determine which detailed sections to inspect to gain deeper insight.
Most of the tables in an actual report are wider than the output shown here. Some of the tables are trimmed for readability, and in some cases the output is split and wrapped.
- Use the "Throughput metrics at database level" section to identify work that is done by the system.
======================================
Throughput metrics at database level
======================================

select min(ts_delta) ts_delta, member, decimal((sum(act_completed_total) / float(min(ts_delta))), 10, 1) as act ...

TS_DELTA MEMBER ACT_PER_S CMT_PER_S RB_PER_S DDLCK_PER_S SEL_P_S UID_P_S ROWS_INS_P_S
-------- ------ --------- --------- ---------- ------------ --------- ---------- ------------
35 0 22629.7 2361.1 0.0 0.0 13089.6 9540.0 4364.0
35 1 24331.0 2525.0 0.0 0.0 14064.1 10266.8 4638.2
35 2 27331.5 2842.1 0.0 0.0 15804.4 11527.1 5204.6
35 3 25674.2 2682.0 0.0 0.0 14859.5 10814.6 4878.8

4 record(s) selected.
The previous output shows four members, each completing approximately 2,400 transactions per second (CMT_PER_S) and 25,000 SQL statements per second (ACT_PER_S), with additional metrics shown in the table. These transactions and statements are measured over 35 seconds (TS_DELTA). The process of collecting data can sometimes increase the planned interval slightly. In this case, the planned interval was 30 seconds.
Abbreviated names are common in the report:
- ACT_PER_S: Activities per second
- CMT_PER_S: Commits per second
- RB_PER_S: Rollbacks per second
- DDLCK_PER_S: Deadlocks per second
- SEL_P_S: Select statements per second
- UID_P_S: Update/Insert/Delete statements per second
- ROWS_INS_P_S: Rows that are inserted per second
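The per-second figures are interval deltas divided by TS_DELTA. A small Python sketch of the arithmetic behind ACT_PER_S and CMT_PER_S, using hypothetical counter deltas chosen to reproduce the member 0 row above:

```python
ts_delta = 35            # interval length in seconds, as reported
act_delta = 792_040      # hypothetical delta of act_completed_total
cmt_delta = 82_639       # hypothetical delta of the commit counter

act_per_s = round(act_delta / ts_delta, 1)
cmt_per_s = round(cmt_delta / ts_delta, 1)
print(act_per_s, cmt_per_s)  # 22629.7 2361.1
```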
- Use the "Time breakdown at database level" section to identify processing time for the database.
=====================================================
Time breakdown at database level (wait + processing)
=====================================================

select member, integer(sum(total_rqst_time)) as total_rqst_tm, decimal(sum(total_compile_time) / float(sum ...

MEMBER TOTAL_RQST_TM PCT_COMPILE PCT_SECTION PCT_SORT PCT_COL PCT_COL_SYNOP PCT_COMMIT PCT_RBACK PCT_CONN
------ ------------- ----------- ----------- -------- ------- ------------- ---------- --------- --------
0 1035374 0.04 78.59 0.74 0.00 0.00 9.38 0.00 0.10
1 438810 0.01 75.85 1.02 0.00 0.00 15.15 0.00 0.12
2 492605 0.01 74.57 1.06 0.00 0.00 16.66 0.00 0.09
3 482646 0.01 76.32 1.04 0.00 0.00 14.73 0.00 0.11

4 record(s) selected.
PCT_SECTION includes processing and wait times for SQL statements.
In this example, most time is spent processing sections (PCT_SECTION), with the next largest percentage of time spent in commit processing (PCT_COMMIT). This pattern is considered normal.
Compile times (PCT_COMPILE) over 0.15 (or 15%) on a transactional system can be a sign of an application that is using literals instead of parameter markers.
- The "Wait times at database level" section reports the time that the database spends waiting. This section is often the most important top-level information to assess performance health.
==============================
Wait times at database level
==============================

select w.member, integer(sum(total_rqst_time)) as total_rqst_tm, integer(sum(total_wait_time)) as total_wait ...

MEMBER TOTAL_RQST_TM TOTAL_WAIT_TM PCT_RQST_WAIT PCT_LOCK PCT_GLB_LOCK PCT_LTCH PCT_LG_DSK PCT_RCLM PCT_CF
------ ------------- ------------- ------------- -------- ------------ -------- ---------- -------- ------
0 1035374 732450 70.74 1.31 0.57 0.89 4.28 1.07 56.26
1 438810 291659 66.46 2.29 1.51 0.57 8.03 1.91 46.21
2 492605 320535 65.06 1.50 1.13 0.92 9.37 1.67 44.36
3 482646 319718 66.24 1.39 0.86 1.05 7.74 1.87 46.23

4 record(s) selected.
In this example, each member spends approximately 67% of each request waiting (PCT_RQST_WAIT). Most of this wait time is spent on cluster caching facility (CF) communications (PCT_CF), followed by log writes (PCT_LG_DSK) and lock waits (PCT_LOCK).
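These percentages are plain ratios against total request time. A quick check in Python, using the member 0 values from the sample output:

```python
total_rqst_tm = 1_035_374   # total request time (ms) for member 0
total_wait_tm = 732_450     # total wait time (ms) for member 0

pct_rqst_wait = round(total_wait_tm / total_rqst_tm * 100, 2)
print(pct_rqst_wait)  # 70.74, matching PCT_RQST_WAIT in the report
```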
- You can use various sections from the report to identify statements that use the most database processing time.
Various sections of the report show statement data from the package cache, which means that the statements finished running during the monitoring interval. Statements that were still running when monitoring finished are shown in the point-in-time section at the beginning of the report.
The top 100 statements by total activity time are listed, and several views of the data are shown in tables, focusing on basic metrics, wait time, sorting, and I/O.
Compare the "Wait times at database level" section (from Example 3) with the "Top SQL statements by execution time" section in this example to determine whether high wait times are caused by either:
- A few statements
- A system-wide issue that affects all statements
The "Top SQL statements by execution time" section shows raw CPU usage.
Output from report (continued):

======================================
Top SQL statements by execution time
======================================

select member, integer(num_exec_with_metrics) as num_exec, coord_stmt_exec_time, decimal(coord_stmt_exec_time / double(num_exec_with_metrics), 10, 2) as avg_coord_exec_time, decimal( (coord_stmt_exec_time / double(...

MEMBER NUM_EXEC COORD_STMT_EXEC_TIME AVG_COORD_EXEC_TIME PCT_COORD_STMT_EXEC_TIME TOTAL_CPU_TIME ...
------ -------- -------------------- ------------------- ------------------------ -------------- ...
0 61729 301979 4.89 15.09 29928189 ...
3 69026 150782 2.18 7.53 33255990 ...
2 72832 148618 2.04 7.42 34689087 ...
1 64562 137670 2.13 6.88 31215907 ...
0 124839 112563 0.90 5.62 4468845 ...
0 124833 111863 0.89 5.59 3686860 ...
0 124836 58906 0.46 2.91 2807094 ...
0 11350 57011 5.02 2.84 1997313 ...
2 147141 46613 0.31 2.33 4830165 ...
3 138286 45568 0.32 2.27 4618735 ...
3 138285 40548 0.29 2.02 5767918 ...
2 147137 39975 0.27 1.99 6031554 ...
1 131525 39818 0.30 1.99 4457211 ...
... AVG_CPU_TIME PCT_WAIT_TIME AVG_SECT_TIME AVG_COL_TIME STMT_TEXT
... ------------ ------------- ------------- ------------ ----------------------------------------------
... 484 53.67 4.89 0.00 Select Count(Distinct S_I_ID) from STOCK, ORDE
... 481 40.93 2.18 0.00 Select Count(Distinct S_I_ID) from STOCK, ORDE
... 476 36.56 2.04 0.00 Select Count(Distinct S_I_ID) from STOCK, ORDE
... 483 41.59 2.13 0.00 Select Count(Distinct S_I_ID) from STOCK, ORDE
... 35 89.65 0.90 0.00 Update STOCK set S_QUANTITY = ?, S_YTD = ?, S_
... 29 90.54 0.89 0.00 Select S_QUANTITY, S_DIST_01, S_DIST_02, S_DIS
... 22 85.45 0.46 0.00 Insert into ORDER_LINE values (?, ?, ?, ?, ?,
... 175 93.18 5.02 0.00 Update ORDER_LINE set OL_DELIVERY_D = ? where
... 32 84.75 0.31 0.00 Select S_QUANTITY, S_DIST_01, S_DIST_02, S_DIS
... 33 85.28 0.32 0.00 Select S_QUANTITY, S_DIST_01, S_DIST_02, S_DIS
... 41 83.84 0.29 0.00 Update STOCK set S_QUANTITY = ?, S_YTD = ?, S_
... 40 82.78 0.27 0.00 Update STOCK set S_QUANTITY = ?, S_YTD = ?, S_
... 33 86.20 0.30 0.00 Select S_QUANTITY, S_DIST_01, S_DIST_02, S_DIS

13 record(s) selected.
The "Wait time breakdown" section shows statements that are not running because they are waiting for resources, such as (but not limited to) disk, latches, and locks.
Output from report (continued):

==============================================================
Wait time breakdown for top SQL statements by execution time
==============================================================

select member, decimal((total_act_wait_time / double(total_act_time)) * 100, 5, 2) as pct_wait, decimal((log_disk_wait_time / double(total_act_time)) * 100, 5, 2) as pct_lg_dsk, decimal((log_buffer_wait...

MEMBER PCT_WAIT PCT_LG_DSK PCT_LG_BUF PCT_LOCK PCT_GL_LK PCT_LTCH PCT_RCLM PCT_CF PCT_PFTCH PCT_DIAG ...
------ -------- ---------- ---------- -------- --------- -------- -------- ------- --------- -------- ...
0 53.67 0.00 0.00 2.37 0.36 2.34 0.00 47.55 0.00 0.00 ...
3 40.93 0.00 0.00 2.29 1.02 2.57 0.00 30.97 0.00 0.00 ...
2 36.56 0.00 0.00 1.42 0.65 1.88 0.00 29.37 0.00 0.00 ...
1 41.59 0.00 0.00 5.04 3.21 1.05 0.00 31.96 0.00 0.00 ...
0 89.65 0.16 0.00 0.00 0.00 0.07 0.35 89.06 0.00 0.00 ...
0 90.54 0.00 0.00 2.32 2.29 0.00 3.59 75.30 0.00 0.00 ...
0 85.45 4.61 0.00 0.00 0.00 0.71 3.77 75.19 0.00 0.00 ...
0 93.18 0.00 0.00 0.00 0.00 0.02 0.05 88.41 0.00 0.00 ...
2 84.75 0.00 0.00 5.16 5.12 0.00 4.94 63.97 0.00 0.00 ...
3 85.28 0.00 0.00 0.23 0.19 0.00 7.03 67.87 0.00 0.00 ...
3 83.84 0.19 0.00 0.00 0.00 0.01 1.50 82.12 0.00 0.00 ...
2 82.78 0.20 0.00 0.00 0.00 0.01 0.71 81.85 0.00 0.00 ...
1 86.20 0.00 0.00 0.00 0.33 0.00 4.85 70.23 0.00 0.00 ...
1 83.34 0.22 0.00 0.00 0.00 0.00 0.90 82.20 0.00 0.00 ...
... PCT_POOL_RD PCT_DIR_WR PCT_DIR_RD STMT_TEXT
... ----------- ---------- ---------- -----------------------------------------------------------------
... 1.40 0.00 0.00 Select Count(Distinct S_I_ID) from STOCK, ORDER_LINE where (S_W_I
... 5.06 0.00 0.00 Select Count(Distinct S_I_ID) from STOCK, ORDER_LINE where (S_W_I
... 3.87 0.00 0.00 Select Count(Distinct S_I_ID) from STOCK, ORDER_LINE where (S_W_I
... 3.52 0.00 0.00 Select Count(Distinct S_I_ID) from STOCK, ORDER_LINE where (S_W_I
... 0.00 0.00 0.00 Update STOCK set S_QUANTITY = ?, S_YTD = ?, S_ORDER_CNT = ?, S_RE
... 9.27 0.00 0.00 Select S_QUANTITY, S_DIST_01, S_DIST_02, S_DIST_03, S_DIST_04, S_
... 0.49 0.00 0.00 Insert into ORDER_LINE values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
... 4.67 0.00 0.00 Update ORDER_LINE set OL_DELIVERY_D = ? where OL_W_ID = ? and OL_
... 10.32 0.00 0.00 Select S_QUANTITY, S_DIST_01, S_DIST_02, S_DIST_03, S_DIST_04, S_
... 10.12 0.00 0.00 Select S_QUANTITY, S_DIST_01, S_DIST_02, S_DIST_03, S_DIST_04, S_
... 0.00 0.00 0.00 Update STOCK set S_QUANTITY = ?, S_YTD = ?, S_ORDER_CNT = ?, S_RE
... 0.00 0.00 0.00 Update STOCK set S_QUANTITY = ?, S_YTD = ?, S_ORDER_CNT = ?, S_RE
... 10.76 0.00 0.00 Select S_QUANTITY, S_DIST_01, S_DIST_02, S_DIST_03, S_DIST_04, S_
... 0.00 0.00 0.00 Update STOCK set S_QUANTITY = ?, S_YTD = ?, S_ORDER_CNT = ?, S_RE

14 record(s) selected.
Tip: The "Statement and Plan Identifiers" section shows the EXECUTABLE_ID for each of the top 100 statements. You can use the EXECUTABLE_ID with the MON_GET_PKG_CACHE_STMT function to get the access plan for any of these statements. You can also use EXECUTABLE_ID to determine the explain plan, by using the following command:
db2 "call explain_from_section(x'<executable id>','M',NULL,0,'<current user>',?,?,?,?,?)"
The "Top SQL statements by execution time, aggregated by PLANID" section is similar to the "Top SQL statements by execution time" section. All statements that differ only by literal values (and therefore have the same PLANID hash value) are aggregated to show the total costs. This information is useful when a transactional application is driving many SQL statements with literal values.
The "IO statistics per statement" section shows the buffer pool activity for the top 100 SQL statements.
Output from report (continued):

==========================================================
IO statistics per stmt - top statements by execution time
==========================================================

select member, integer(num_exec_with_metrics) num_exec, decimal(pool_data_l_reads/double(num_exec_with_metrics), 16,1) as avg_d_lrd, decimal(pool_data_p_reads/double(num_exec_with_metrics), 10,1) ...

MEMBER NUM_EXEC AVG_D_LRD AVG_D_PRD AVG_I_LRD AVG_I_PRD AVG_TD_PRD AVG_TI_PRD ...
------ ----------- ---------- ---------- ---------- ---------- ----------- ----------- ...
0 61729 0.0 0.0 607.4 0.1 0.0 0.0 ...
3 69026 0.0 0.0 607.4 0.4 0.0 0.0 ...
2 72832 0.0 0.0 607.3 0.2 0.0 0.0 ...
1 64562 0.0 0.0 607.3 0.2 0.0 0.0 ...
0 124839 2.0 0.0 4.0 0.0 0.0 0.0 ...
0 124833 1.1 0.1 3.0 0.0 0.0 0.0 ...
0 124836 1.0 0.0 4.1 0.0 0.0 0.0 ...
0 11350 20.3 0.4 4.1 0.0 0.0 0.0 ...
2 147141 1.1 0.1 3.0 0.0 0.0 0.0 ...
3 138286 1.1 0.1 3.0 0.0 0.0 0.0 ...
3 138285 2.0 0.0 4.0 0.0 0.0 0.0 ...
2 147137 2.0 0.0 4.0 0.0 0.0 0.0 ...
1 131525 1.1 0.1 3.0 0.0 0.0 0.0 ...
1 131523 2.0 0.0 4.0 0.0 0.0 0.0 ...
0 124837 0.0 0.0 2.0 0.0 0.0 0.0 ...
0 11343 0.0 0.0 4.1 0.0 0.0 0.0 ...
2 147135 1.0 0.0 4.1 0.0 0.0 0.0 ...
3 138285 1.0 0.0 4.1 0.0 0.0 0.0 ...
1 131522 1.0 0.0 4.1 0.0 0.0 0.0 ...
0 12533 1.0 0.0 6.0 0.0 0.0 0.0 ...
2 13568 20.4 0.4 4.1 0.0 0.0 0.0 ...
... AVG_COL_LRD AVG_COL_PRD AVG_DIR_R_RQS AVG_DIR_W_RQS STMT_TEXT
... ------------ ------------ ------------- ------------- --------------------------
... 0.0 0.0 0.0 0.0 Select Count(Distinct S_I_I
... 0.0 0.0 0.0 0.0 Select Count(Distinct S_I_I
... 0.0 0.0 0.0 0.0 Select Count(Distinct S_I_I
... 0.0 0.0 0.0 0.0 Select Count(Distinct S_I_I
... 0.0 0.0 0.0 0.0 Update STOCK set S_QUANTITY
... 0.0 0.0 0.0 0.0 Select S_QUANTITY, S_DIST_0
... 0.0 0.0 0.0 0.0 Insert into ORDER_LINE valu
... 0.0 0.0 0.0 0.0 Update ORDER_LINE set OL_DE
... 0.0 0.0 0.0 0.0 Select S_QUANTITY, S_DIST_0
... 0.0 0.0 0.0 0.0 Select S_QUANTITY, S_DIST_0
... 0.0 0.0 0.0 0.0 Update STOCK set S_QUANTITY
... 0.0 0.0 0.0 0.0 Update STOCK set S_QUANTITY
... 0.0 0.0 0.0 0.0 Select S_QUANTITY, S_DIST_0
... 0.0 0.0 0.0 0.0 Update STOCK set S_QUANTITY
... 0.0 0.0 0.0 0.0 Select I_NAME, I_PRICE, I_D
... 0.0 0.0 0.0 0.0 Select sum(OL_AMOUNT) from
... 0.0 0.0 0.0 0.0 Insert into ORDER_LINE valu
... 0.0 0.0 0.0 0.0 Insert into ORDER_LINE valu
... 0.0 0.0 0.0 0.0 Insert into ORDER_LINE valu
... 0.0 0.0 0.0 0.0 Insert into ORDERS values (
... 0.0 0.0 0.0 0.0 Update ORDER_LINE set OL_DE

21 record(s) selected.
I/O information can indicate whether a statement is using an index. AVG_I_LRD stands for average index logical reads per execution. AVG_D_LRD stands for average data logical reads. In the previous output, the first four entries have high AVG_I_LRD values and AVG_D_LRD values of zero. This combination of values is indicative of an index-based plan. If a statement has high data logical reads and low index logical reads, changes to the indexes might improve performance.
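The pattern described above can be expressed as a simple heuristic. This is not part of db2mon; it is a hypothetical helper with illustrative thresholds:

```python
def likely_index_driven(avg_index_lreads: float, avg_data_lreads: float) -> bool:
    """Heuristic: many index logical reads with near-zero data logical reads
    per execution suggests an index-based access plan. Thresholds are
    illustrative, not taken from db2mon."""
    return avg_index_lreads > 100 and avg_data_lreads < 1

print(likely_index_driven(607.4, 0.0))  # first sample entry: True
print(likely_index_driven(4.1, 20.3))   # data-heavy sample entry: False
```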
- Use the "Database log write times" section to examine log disk performance.
Statements whose execution times are noticeably higher than those of other statements might indicate a poorly optimized access plan. If adding an index does not resolve the issue, compare the highest wait times at the database and statement levels. This comparison identifies the largest contributors to wait time; you can then consult the appropriate report sections for more detailed information.
In db2mon, the individual wait time percentages (such as log disk wait time) are calculated as a percent of total request time, not as a percent of total wait time. A breakdown in which the components add up to 100% of wait time can be obtained by using the MONREPORT.DBSUMMARY procedure.
A high percent of log disk wait time, which is reported in the PCT_LG_DSK column in the "Wait times at database level" section (from Example 3), is common for transactional applications.
Output from report (continued):

==========================
Database log write times
==========================

select member, num_log_write_io, case when ts_delta > 0 then decimal( double(num_log_write_io) / ts_delta, 10, 4 ) else null end as ...

MEMBER NUM_LOG_WRITE_IO LOG_WRITE_IO_PER_S LOG_WRITE_MB_PER_S ...
------ ---------------- ------------------ ------------------ ...
0 57237 1788.6562 7.0000 ...
1 68286 2133.9375 8.3437 ...
2 69373 2167.9062 8.5312 ...
3 69286 2165.1875 8.4687 ...
... LOG_WRITE_TIME LOG_WRITE_TIME_PER_IO_MS NUM_LOG_BUFFER_FULL
... -------------- ------------------------ --------------------
... 25306 0.4421 0
... 26345 0.3858 0
... 26669 0.3894 0
... 26741 0.3859 0

4 record(s) selected.
In the previous output, the log write time per I/O (LOG_WRITE_TIME_PER_IO_MS) is approximately 0.4 milliseconds, which is optimal. Values vary between systems, but log write times per I/O above 4.0 milliseconds indicate that changes to the storage configuration might be required. Reconfiguration includes changes to cache, RAID, and the number of LUNs per file system.
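The per-I/O figure is total log write time divided by the number of log write I/Os. Recomputing it from the member 0 values in the sample output:

```python
log_write_time_ms = 25_306   # LOG_WRITE_TIME for member 0 (ms)
num_log_write_io = 57_237    # NUM_LOG_WRITE_IO for member 0

per_io_ms = round(log_write_time_ms / num_log_write_io, 4)
print(per_io_ms)  # 0.4421; values above ~4.0 ms suggest reviewing storage
```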
- Use the "Bufferpool read statistics" section and the "Disk read and write I/O times" section to examine read times.
Note: The output in the "Bufferpool read statistics" section includes the following abbreviations:
- POOL_DATA_L_READS: Buffer pool data page logical reads (base table ORGANIZE BY ROW access)
- POOL_DATA_P_READS: Buffer pool data page physical reads
- POOL_INDEX_L_READS: Buffer pool index page logical reads (index access)
- POOL_INDEX_P_READS: Buffer pool index page physical reads
- POOL_COL_L_READS: Buffer pool column-organized data page logical reads (base table ORGANIZE BY COLUMN access)
- POOL_COL_P_READS: Buffer pool column-organized data page physical reads
Output from report (continued):

============================
Bufferpool read statistics
============================

select member, substr(bp_name,1,20) as bp_name, pool_data_l_reads, pool_data_p_reads, pool_index_l_reads, pool_index_p_reads, pool_col_l_reads, pool_col_p_reads, pool_read_time, case when ...

MEMBER BP_NAME POOL_DATA_L_READS POOL_DATA_P_READS POOL_INDEX_L_READS ...
------ ------------- ------------------ ------------------ ------------------- ...
2 IBMDEFAULTBP 3037899 44159 45043859 ...
3 IBMDEFAULTBP 2841993 41020 42586533 ...
1 IBMDEFAULTBP 2696917 38268 40018451 ...
0 IBMDEFAULTBP 2518978 38610 37512769 ...
... POOL_INDEX_P_READS POOL_COL_L_READS POOL_COL_P_READS POOL_READ_TIME AVG_READ_TIME
... ------------------- ----------------- ----------------- -------------- -------------
... 27427 0 0 19619 0.27
... 34728 0 0 21063 0.27
... 23941 0 0 17030 0.27
... 10298 0 0 28304 0.57

4 record(s) selected.
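AVG_READ_TIME in this section is total buffer pool read time divided by total physical reads (data, index, and column-organized). A sketch of that arithmetic, using the member 2 row from the sample output:

```python
pool_data_p_reads = 44_159    # member 2 row from the sample output
pool_index_p_reads = 27_427
pool_col_p_reads = 0
pool_read_time_ms = 19_619

total_p_reads = pool_data_p_reads + pool_index_p_reads + pool_col_p_reads
avg_read_time = round(pool_read_time_ms / total_p_reads, 2)
print(avg_read_time)  # 0.27, matching AVG_READ_TIME in the report
```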
High pool read times are a common issue; they are reported in the PCT_POOL_RD column in the "Wait time breakdown for top SQL statements by execution time" section (from Example 4). The issue is especially common when the system is reading many pages from disk into the buffer pool. This type of heavy page-read activity typically occurs shortly after database activation, or when a new application starts to make connections. Average pool read times (AVG_READ_TIME) are reported at the buffer pool level and the table space level. Similarly, high direct I/O times for large objects can be tracked in AVG_DRCT_READ_TIME and AVG_DRCT_WRITE_TIME.
Output from report (continued):

===============================
Disk read and write I/O times
===============================

select member, substr(tbsp_name,1,20) as tbsp_name, (pool_data_p_reads + pool_index_p_reads+ pool_col_p_reads) as num_reads, case when ((pool_data_p_reads + pool_index_p_reads+ pool ...

MEMBER TBSP_NAME NUM_READS AVG_READ_TIME DIRECT_READ_REQS AVG_DRCT_READ_TIME ...
------ ---------- ---------- ------------- ---------------- ------------------ ...
3 TBS_OM 29235 0.27 0 - ...
2 TBS_OM 22195 0.26 0 - ...
2 TBS_CST 21312 0.27 0 - ...
3 TBS_CST 19700 0.28 0 - ...
1 TBS_OM 19201 0.26 0 - ...
1 TBS_CST 18485 0.27 0 - ...
0 TBS_CST 17705 0.56 0 - ...
2 TBS_STK 16568 0.27 0 - ...
0 TBS_STK 16127 0.59 0 - ...
3 TBS_STK 15559 0.27 0 - ...
... NUM_WRITES AVG_WRITE_TIME DIRECT_WRITE_REQS AVG_DRCT_WRITE_TIME
... ----------- -------------- ------------------ -------------------
... 0 - 0 -
... 0 - 0 -
... 0 - 0 -
... 0 - 0 -
... 0 - 0 -
... 0 - 0 -
... 0 - 0 -
... 0 - 0 -
... 0 - 0 -
... 0 - 0 -

10 record(s) selected.
- Use the "Round-trip CF" section to determine round-trip time for messages between members and CFs.
Percentage CF wait time is reported in PCT_CF in "Wait times at database level" (Example 3) and "Wait time breakdown" (Example 4). If the PCT_CF values in your report are high, the "Round-trip CF" section shows performance data on messages that are exchanged between the Db2 member and the CFs.
===================================================================
Round-trip CF command execution counts and average response times
===================================================================

select member, id, cast(substr(cf_cmd_name,1,30) as varchar(30)) as cf_cmd_name, total_cf_requests, decimal( doub...

MEMBER ID CF_CMD_NAME TOTAL_CF_REQUESTS AVG_CF_REQUEST_TIME_MICRO
------ ------ --------------------- ------------------ -------------------------
2 128 SetLockState 1016554 60.11
3 128 SetLockState 955295 65.82
1 128 SetLockState 905310 62.04
0 128 SetLockState 851689 223.05
3 128 ReadAndRegister 682169 88._8
2 128 ReadAndRegister 671832 83.39
1 128 ReadAndRegister 662450 84.51
0 128 ReadAndRegister 627922 282.94
2 128 SetLockStateMultiple 413551 70.50
3 128 SetLockStateMultiple 387168 77.93
1 128 SetLockStateMultiple 367427 73.74
0 128 SetLockStateMultiple 345954 252.11

12 record(s) selected.
- Use the "Page reclaim metrics" section to examine page reclaim metrics for index and data pages.
Reclaim wait time is reported in PCT_RCLM in "Wait times at database level" (Example 3) and "Wait time breakdown" (Example 4). If PCT_RCLM values in your report are above zero, the following section shows which table has the most reclaim activity, and whether the table or an index on the table is reclaimed:
===============================================
Page reclaim metrics for index and data pages
===============================================

select member, substr(tabschema,1,20) as tabschema, substr(tabname,1,40) as tabname, substr(objtype,1,10) as objtype, data_partition_id, iid, (page ...

MEMBER TABSCHEMA TABNAME OBJTYPE DATA_PARTITION_ID IID PAGE_RECLAIMS RECLAIM_WAIT_TIME
------ ---------- --------------- -------- ----------------- ---- -------------- ------------------
0 DTW ORDERS INDEX - - 4334 3357
2 DTW ORDER_LINE INDEX - - 3870 3096
2 DTW ORDERS INDEX - - 3652 2732
3 DTW ORDERS INDEX - - 3638 2724
1 DTW ORDER_LINE INDEX - - 3634 2666
3 DTW ORDER_LINE INDEX - - 3577 2391
1 DTW ORDERS INDEX - - 3502 2326
0 DTW ORDER_LINE INDEX - - 4169 2286
0 DTW STOCK_501_750 INDEX - - 165 983
0 DTW STOCK_751_1000 INDEX - - 199 952
1 DTW NEW_ORDER INDEX - - 441 880

11 record(s) selected.
The sample output shows the highest reclaim activity on indexes on the ORDERS and ORDER_LINE tables across all members. However, these RECLAIM_WAIT_TIME values are not high, which can be confirmed by examining PCT_RCLM (percent reclaim wait time) in the "Wait times at database level" section. High reclaim activity on index pages is the most common case, and can be improved by using RANDOM indexes, CURRENT MEMBER partitioning, and range partitioning.
- Use the "Page reclaim metrics for SMP pages" section to identify reclaim activity of space map pages (SMPs). For details about space map page reclaim metrics, see spacemappage_reclaim_wait_time - Space map page reclaim wait time monitor element.
The following output shows reclaimed SMP pages, which are often due to heavy inserts into a table whose table space extent size is too small.
====================================
Page reclaim metrics for SMP pages
====================================

select member, substr(tabschema,1,20) as tabschema, substr(tabname,1,40) as tabname, substr(objtype,1,10) as objtype, data_partition_id, iid, (spacemapp ...

MEMBER TABSCHEMA TABNAME OBJTYPE DATA_PARTITION_ID IID SMP_PAGE_RECLAIMS SMP_PAGE_RECLAIM_WAIT_TIME
------ --------- ---------- ------- ----------------- --- ----------------- --------------------------
3 DTW ORDER_LINE TABLE - - 9 254
2 DTW ORDER_LINE INDEX - - 57 241
0 DTW ORDER_LINE INDEX - - 90 192
0 DTW ORDER_LINE TABLE - - 9 170
3 DTW ORDER_LINE INDEX - - 67 169
1 DTW ORDER_LINE INDEX - - 57 76
1 DTW ORDER_LINE TABLE - - 3 10
2 DTW ORDER_LINE TABLE - - 4 3
3 DTW NEW_ORDER TABLE - - 2 3
1 DTW NEW_ORDER TABLE - - 1 2

10 record(s) selected.
- Use the "Latch wait metrics" section to identify latch waits.
Latch wait time percentages are reported in PCT_LTCH in "Wait times at database level" (Example 3) and "Wait time breakdown" (Example 4). PCT_LTCH values higher than 15% to 20% are considered high. The following section shows details for latch waits by type:
Output from report (continued):

====================
Latch wait metrics
====================

select member, substr(latch_name,1,60) as latch_name, total_extended_latch_wait_time as tot_ext_latch_wait_time_ms, total_extended_latch_waits ...

MEMBER LATCH_NAME TOT_EXT_LATCH_WAIT_TIME_MS ...
------ ------------------------------------- -------------------------- ...
0 SQLO_LT_SQLB_BPD_bpdLatch_SX 7182 ...
3 SQLO_LT_SQLB_BPD_bpdLatch_SX 4085 ...
2 SQLO_LT_SQLB_BPD_bpdLatch_SX 3086 ...
1 SQLO_LT_SQLB_BPD_bpdLatch_SX 1764 ...
0 SQLO_LT_SQLB_BPD_WARLatch 1272 ...
2 SQLO_LT_SQLB_BPD_WARLatch 1168 ...
3 SQLO_LT_SQLB_BPD_WARLatch 706 ...
1 SQLO_LT_SQLB_BPD_WARLatch 509 ...
0 SQLO_LT_SQLP_LHSH_hshlatch 263 ...
3 SQLO_LT_SQLI_INX_PAGE_CACHE_ipcLatch 240 ...
2 SQLO_LT_SQLI_INX_PAGE_CACHE_ipcLatch 221 ...
1 SQLO_LT_SQLI_INX_PAGE_CACHE_ipcLatch 193 ...
0 SQLO_LT_SQLI_INX_PAGE_CACHE_ipcLatch 175 ...
... TOT_EXT_LATCH_WAITS TIME_PER_LATCH_WAIT_MS
... -------------------- ----------------------
... 4454 1.61
... 1965 2.07
... 2088 1.47
... 1724 1.02
... 1783 0.71
... 2637 0.44
... 2110 0.33
... 1870 0.27
... 758 0.34
... 36 6.66
... 45 4.91
... 21 9.19
... 25 7.00

13 record(s) selected.
Most values for TIME_PER_LATCH_WAIT_MS are only a few milliseconds, measured over a 30-second interval across all agents that operate on the system. Therefore, this system shows no significant latching issue.
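TIME_PER_LATCH_WAIT_MS is simply total extended latch wait time divided by the number of extended latch waits. Recomputing it from the first row of the sample output:

```python
tot_ext_latch_wait_time_ms = 7_182   # member 0, bpdLatch_SX row
tot_ext_latch_waits = 4_454

time_per_wait_ms = round(tot_ext_latch_wait_time_ms / tot_ext_latch_waits, 2)
print(time_per_wait_ms)  # 1.61, matching the report
```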
High latch wait time percentages can result from various causes. Use the following techniques to address scenarios with more significant latch wait:
- In Db2 pureScale, a strong correlation exists between latch wait and page reclaim time. Even 2% to 5% of reclaim wait time can cause 20% to 40% of latch wait time in some scenarios. Examining the "Page reclaim metrics for index and data pages" section (Example 8) might indicate that the table or an index needs altering, especially if both latch wait times and reclaim wait times are high for the same SQL statements.
- Significant improvements in latching were made in Db2 Version 11.1. If a latch named SQLO_LT_SQLB_HASH_BUCKET_GROUP_HEADER__groupLatch has the highest latch wait time in the "Latch wait metrics" section, upgrading to Version 11.1 might reduce the time that is spent holding the latch.
- The SQLB_BPD__bpdLatch type is associated with a processing-intensive buffer pool page. Identify the table that is involved by looking at statements with high latch wait time. One approach to reduce latch contention is to range partition the table and use partitioned indexes.