Collecting and reporting performance monitor data with db2mon

db2mon is a set of Db2® command line processor (CLP) scripts that use Db2 lightweight in-memory monitoring interfaces to collect monitoring data for a specific period.

The db2mon.sh shell script provides a sample of data collection and analysis. The script also checks that the prerequisites for db2mon are met.

Lightweight in-memory monitoring interfaces refer to the built-in routines that begin with "MON_GET", such as "MON_GET_DATABASE". The remainder of the topic refers to these routines collectively as MON_GET functions or the data that they return as MON_GET data.

MON_GET functions return data that accumulates from the time of database activation. Activity over an interval is determined by collecting measurements at two points in time and then calculating the difference. The db2mon scripts gather the measurements and perform the calculations automatically by using stored procedures.
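
For example, the following minimal sketch (not part of db2mon, and reduced to a single commit counter) illustrates the delta principle that db2mon automates. It assumes that a user temporary table space exists for the declared temporary table, and it uses the MON_GET_WORKLOAD table function:

  db2 "DECLARE GLOBAL TEMPORARY TABLE session.commits_before (member SMALLINT, total_app_commits BIGINT) ON COMMIT PRESERVE ROWS NOT LOGGED"
  db2 "INSERT INTO session.commits_before SELECT member, SUM(total_app_commits) FROM TABLE(MON_GET_WORKLOAD(NULL,-2)) GROUP BY member"
  sleep 30   # let the workload run for the measurement interval
  db2 "SELECT w.member, SUM(w.total_app_commits) - b.total_app_commits AS commits_in_interval FROM TABLE(MON_GET_WORKLOAD(NULL,-2)) w, session.commits_before b WHERE w.member = b.member GROUP BY w.member, b.total_app_commits"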

Before you begin

The db2mon CLP scripts require a connection to the database. db2mon.sql also requires the database to have a user temporary table space. Many systems already have a user temporary table space. If your database does not have a user temporary table space, you can create one with the CREATE TABLESPACE statement:
db2 CREATE USER TEMPORARY TABLESPACE myTempTbsp
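
To check whether a user temporary table space already exists, you can query the catalog; any row returned means that one is present:

  db2 "SELECT TBSPACE FROM SYSCAT.TABLESPACES WHERE DATATYPE = 'U'"
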
For db2mon.sql and db2mon_export.sql to collect monitoring data properly, monitoring must be enabled at the database level with the following database configuration parameters (an example check follows the list):
  • MON_ACT_METRICS must be set at least to BASE, which is the default value.
  • MON_REQ_METRICS must be set at least to BASE, which is the default value. Setting it to EXTENDED provides more detailed monitor information and is an ideal setting for db2mon.
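
For example, you can verify the current settings and, if either parameter was lowered, restore the minimum required value (MyDatabaseName is a placeholder for your database name):

  db2 "GET DB CFG FOR MyDatabaseName" | grep -i metrics
  db2 "UPDATE DB CFG FOR MyDatabaseName USING MON_REQ_METRICS BASE"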

If you are running db2mon in offline mode, you must have the authority to create permanent tables and indexes.

About this task

You can choose between two modes to collect and analyze performance monitor data with db2mon:
Online
db2mon is most commonly run in online mode, where it runs to completion on the system that it is monitoring and produces a report on standard output. The report includes all of the SQL that is used to calculate the delta values for the metrics and to list the output. db2mon.sh automatically connects to the database, creates a user temporary table space and a small private buffer pool (to minimize the impact on the system), runs db2mon.sql, and then disconnects. You can run online mode in two ways:
  • By using db2mon.sh
  • By using an existing connection
Offline
Separate SQL files are provided for export, import, and report generation. If the monitored system is performance-sensitive, the performance data can be collected by using the export script and transferred to another system for import and reporting.

The report that is generated in offline mode (by running db2mon_report.sql) is shorter than the report that is generated in online mode (by running db2mon.sql). The offline mode report is shorter because it does not have declared temporary table (DGTT) declarations and delta calculations.

The following files are provided as part of db2mon, and are located in your instance directory under ~/sqllib/samples/perf:
  • db2mon.sh
  • db2mon.sql (concatenated form of db2monBefore.sql, db2monInterval.sql, and db2monAfter.sql)
  • db2mon_import.sql
  • db2mon_export.sql
  • db2mon_report.sql

db2mon checks for a user temporary table space and the minimum required values for MON_ACT_METRICS and MON_REQ_METRICS. The results of this check are reported in the prerequisites section of the report.

Procedure

Follow these steps to collect and report performance monitor data with db2mon:

  1. Run db2mon by using one of the following three approaches:
    • Online mode by using db2mon.sh:
      1. Ensure that performance information is collected during normal database activity or in parallel with a test workload that is running on the database.
      2. From the command line, type:
        db2mon.sh MyDatabaseName > db2mon.out
        where MyDatabaseName is the name of the database that you are monitoring.
        By default, db2mon.sql collects MON_GET data over a 30-second period. db2mon.sh can take a second optional argument to collect data over a different period, which is specified in seconds. For example, to produce an online report that collects performance data for 120 seconds, type:
        db2mon.sh MyDatabaseName 120 > db2mon-120s.out
        Note: It is advised that you collect data for a maximum of 300 seconds (5 minutes) to avoid wrapping some of the counters. For monitoring longer periods, collecting successive reports is preferred.
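        For example, a simple shell loop (not part of db2mon; the file names and the number of iterations are arbitrary) collects four successive 5-minute reports:
        for i in 1 2 3 4; do
            db2mon.sh MyDatabaseName 300 > db2mon.$(date +%Y%m%d-%H%M%S).out
        done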
    • Online mode by using an existing database connection:
      1. Run db2mon on your current database connection, as follows:
        db2 -tvf db2mon.sql > db2mon.out
        Important: If you run db2mon on your current connection and interrupt db2mon.sql, such as pressing Ctrl-C while it is running, the CURRENT SCHEMA special register might still be set to SESSION_USER. As a result, any SQL statements that are run after the interruption might be affected. If the connection is interrupted, you might need to manually change CURRENT SCHEMA to the original value. db2mon uses CURRENT SCHEMA to resolve its table references to the DGTTs that are used during online collection in db2mon.sql.
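        For example, if the original schema was your user ID, you can restore it as follows (MYUSER is a placeholder):
        db2 "SET CURRENT SCHEMA MYUSER"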
    • Offline mode:
      1. Run the following command to generate an output report:
        db2 -tvf db2mon_export.sql > db2mon_export.out
        db2mon_export.sql exports the contents of all MON_GET functions that db2mon uses to IXF files, which are created in the current working directory. The content is exported twice: when the script starts and when the script ends. The I/O cost of the export operation is low.
        Note: Exporting all columns can be helpful when reference to the original data is needed for analysis. Ad hoc queries from the command line or altered versions of db2mon_report.sql can take advantage of the extra metrics when used with the exported data set.
      2. The IXF files can be transferred to another system for import into another Db2 database to create the report. The report can be created on any operating system or version of Db2. For versions of Db2 before Version 11.1.3.3, you also need to copy the db2mon SQL scripts. All of the IXF files must be transferred (by using the scp command, for example) to the system where the report will be created. For example, to transfer all of the IXF files to the directory reports/2018-02-16 by using the dbreports account on the analysis1 system:
        scp *.IXF dbreports@analysis1:reports/2018-02-16
      3. On the system where the report will be created, go to the directory where the IXF files are located.
      4. Run the following command to import the data from the IXF files:
        db2 -tvf db2mon_import.sql > db2mon_import.out
        Important: If you run db2mon on your current connection and interrupt db2mon.sql, such as pressing Ctrl-C while it is running, the CURRENT SCHEMA special register is likely to be set to SESSION_USER. As a result, the subsequent SQL that is run on that connection is affected. db2mon uses the proper schema for permanent tables that are produced by db2mon_import.sql and db2mon_report.sql.
        db2mon_import.sql uses the Db2 IMPORT utility to reconstitute the MON_GET data back into Db2 tables for analysis. IMPORT is used because it creates tables automatically. You do not need to reproduce CREATE TABLE statements to match the source system's tables.

        The imported tables are permanent tables, not DGTTs. They are created in the current default schema and table space. CURRENT SCHEMA can be set to a unique value to represent this set of monitoring data, which allows multiple data sets to be stored.
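        For example, to keep this collection separate from other imported data sets, you can set an arbitrary schema name on the connection before you run db2mon_import.sql and db2mon_report.sql:
        db2 "SET CURRENT SCHEMA MON_2018_02_16"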

      5. Run the following command to generate a report that you can analyze:
        db2 -tvf db2mon_report.sql > db2mon_report.out
  2. If you ran the report in online mode, check the report for errors.

    Many errors can be reported in db2mon.out when the script attempts to drop tables that do not exist. These errors can be ignored. Other types of errors might indicate that a user temporary table space does not exist, or that the script was generated for the wrong version of Db2.

  3. View the report and analyze its contents. Details are provided in the Results section.
    Note: The report section begins after the text "STARTS". The report can be wide, so it is best to use a text editor or viewer that is capable of viewing long lines, such as less or vim.
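    For example, less can be told not to wrap long lines:
    less -S db2mon.out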

Results

The report includes many useful queries, roughly grouped into the following sections:
  • Point-in-time data (such as currently running SQL, utilities, and lock waits), which is collected at the beginning and end of the interval.
  • Cumulative data, which is measured over the whole interval:
    • Data that is collected at various hierarchical levels (such as database level, table space level, buffer pool level, table level, query level, and connection level).
    • Data that is collected for different deployment types (examples include standard Db2 ESE, Db2 pureScale, and Db2 with BLU).
    • Environment information (such as database and instance configuration, registry variables, CPU counts and usage, and memory usage).

Example

Sample output is provided to illustrate different ways the output report can be analyzed.

The sample output is provided in an increasingly granular order, with high-level sections presented first, and sections with finer details provided later. Get a general sense of the performance of your database from the high-level sections, then use that information to determine which detailed sections to inspect to gain deeper insight.

Most of the tables in your report are wider than the following output. Some of the tables here are trimmed for readability. In some cases, the output is split and wrapped.

  1. Use the "Throughput metrics at database level" section to identify work that is done by the system.
    ======================================
     Throughput metrics at database level
    ======================================
    
    select min(ts_delta) ts_delta, member, decimal((sum(act_completed_total) / float(min(ts_delta))), 10, 1) as act
    
    TS_DELTA MEMBER ACT_PER_S CMT_PER_S   RB_PER_S  DDLCK_PER_S   SEL_P_S    UID_P_S ROWS_INS_P_S 
    -------- ------ --------- --------- ---------- ------------ --------- ---------- ------------ 
          35      0   22629.7    2361.1        0.0          0.0   13089.6     9540.0       4364.0 
          35      1   24331.0    2525.0        0.0          0.0   14064.1    10266.8       4638.2 
          35      2   27331.5    2842.1        0.0          0.0   15804.4    11527.1       5204.6 
          35      3   25674.2    2682.0        0.0          0.0   14859.5    10814.6       4878.8 
    
      4 record(s) selected.

    The previous output shows four members, each completing approximately 2400 transactions per second (CMT_PER_S) or 25,000 SQL statements per second (ACT_PER_S), and additional metrics shown in the table. These transactions and statements are measured over 35 seconds (TS_DELTA). The process of collecting data can sometimes increase the planned interval slightly. In this case, the planned interval was 30 seconds.

    Abbreviated names are common in the report:
    • ACT_PER_S: Activities per second
    • CMT_PER_S: Commits per second
    • RB_PER_S: Rollbacks per second
    • DDLCK_PER_S: Deadlocks per second
    • SEL_P_S: Select statements per second
    • UID_P_S: Update/Insert/Delete statements per second
    • ROWS_INS_P_S: Rows that are inserted per second
  2. Use the "Time breakdown at database level" section to identify processing time for the database.
    =====================================================
     Time breakdown at database level (wait + processing)
    =====================================================
    
    select member, integer(sum(total_rqst_time)) as total_rqst_tm, decimal(sum(total_compile_time) / float(sum
    
    MEMBER TOTAL_RQST_TM PCT_COMPILE PCT_SECTION PCT_SORT PCT_COL PCT_COL_SYNOP PCT_COMMIT PCT_RBACK PCT_CONN
    ------ ------------- ----------- ----------- -------- ------- ------------- ---------- --------- --------
         0       1035374        0.04       78.59     0.74    0.00          0.00       9.38      0.00     0.10
         1        438810        0.01       75.85     1.02    0.00          0.00      15.15      0.00     0.12
         2        492605        0.01       74.57     1.06    0.00          0.00      16.66      0.00     0.09
         3        482646        0.01       76.32     1.04    0.00          0.00      14.73      0.00     0.11
    
      4 record(s) selected.

    PCT_SECTION includes processing and wait times for SQL statements.

    In this example, most time is spent processing sections (PCT_SECTION), with the next largest percentage of time spent in commit processing (PCT_COMMIT). This pattern is considered normal.

    Compile time (PCT_COMPILE) above approximately 15% on a transactional system can be a sign of an application that uses literals instead of parameter markers.
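
    If the application cannot easily be changed to use parameter markers, one option to evaluate (a sketch only; weigh the tradeoffs for your workload) is the statement concentrator, which lets Db2 reuse access plans for statements that differ only in their literal values:
      db2 "UPDATE DB CFG FOR MyDatabaseName USING STMT_CONC LITERALS"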

  3. The "Wait times at database level" section reports the time that the database spends waiting. This section is often the most important top-level information for assessing performance health.
    ==============================
     Wait times at database level
    ==============================
    
    select w.member, integer(sum(total_rqst_time)) as total_rqst_tm, integer(sum(total_wait_time)) as total_wait
    
    MEMBER TOTAL_RQST_TM TOTAL_WAIT_TM PCT_RQST_WAIT PCT_LOCK PCT_GLB_LOCK PCT_LTCH PCT_LG_DSK PCT_RCLM PCT_CF
    ------ ------------- ------------- ------------- -------- ------------ -------- ---------- -------- ------
         0       1035374        732450         70.74     1.31         0.57     0.89       4.28     1.07  56.26
         1        438810        291659         66.46     2.29         1.51     0.57       8.03     1.91  46.21
         2        492605        320535         65.06     1.50         1.13     0.92       9.37     1.67  44.36
         3        482646        319718         66.24     1.39         0.86     1.05       7.74     1.87  46.23
    
      4 record(s) selected.

    In this example, each member spends approximately 67% of each request waiting (PCT_RQST_WAIT). Most of this wait time is spent on cluster caching facility (CF) communications (PCT_CF), followed by log writes (PCT_LG_DSK) and lock waits (PCT_LOCK).

  4. You can use various sections from the report to identify statements that use the most database processing time.

    Various sections of the report show statement data from the package cache; that is, statements that finished running during the monitoring interval. Statements that are still running when monitoring finishes are shown in the point-in-time section at the beginning of the report.

    The top 100 statements by total activity time are listed, and different tables show different views of those statements, focusing on aspects such as basic metrics, wait time, sorting, and I/O.

    Compare the "Wait times at database level" section (from Example 3) with the "Top SQL statements by execution time" section in this example to determine whether high wait times are caused by either:
    • A few statements
    • A system-wide issue that affects all statements
    System-wide issues might be improved by configuration changes or changes in hardware. Statement performance can be improved by using techniques such as altering SQL, adding or removing indexes, and refreshing statistics with the RUNSTATS command.
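
    For example, to refresh statistics on one of the tables from the sample workload (the schema and table names are taken from the sample output):
      db2 "RUNSTATS ON TABLE DTW.STOCK WITH DISTRIBUTION AND DETAILED INDEXES ALL"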

    The "Top SQL statements by execution time" section shows raw CPU usage.

    ======================================
     Top SQL statements by execution time
    ======================================
    
    select member, integer(num_exec_with_metrics) as num_exec, coord_stmt_exec_time, 
       decimal(coord_stmt_exec_time / double(num_exec_with_metrics), 10, 2) as avg_coord_exec_time, 
       decimal( (coord_stmt_exec_time / double(...
    
    MEMBER NUM_EXEC COORD_STMT_EXEC_TIME AVG_COORD_EXEC_TIME PCT_COORD_STMT_EXEC_TIME TOTAL_CPU_TIME ...
    ------ -------- -------------------- ------------------- ------------------------ -------------- ...
         0    61729               301979                4.89                    15.09       29928189 ...
         3    69026               150782                2.18                     7.53       33255990 ...
         2    72832               148618                2.04                     7.42       34689087 ...
         1    64562               137670                2.13                     6.88       31215907 ...
         0   124839               112563                0.90                     5.62        4468845 ...
         0   124833               111863                0.89                     5.59        3686860 ...
         0   124836                58906                0.46                     2.91        2807094 ...
         0    11350                57011                5.02                     2.84        1997313 ...
         2   147141                46613                0.31                     2.33        4830165 ...
         3   138286                45568                0.32                     2.27        4618735 ...
         3   138285                40548                0.29                     2.02        5767918 ...
         2   147137                39975                0.27                     1.99        6031554 ...
         1   131525                39818                0.30                     1.99        4457211 ...
    Output from report (continued):
    ... AVG_CPU_TIME PCT_WAIT_TIME AVG_SECT_TIME AVG_COL_TIME STMT_TEXT                                     
    ... ------------ ------------- ------------- ------------ ----------------------------------------------
    ...          484         53.67          4.89         0.00 Select Count(Distinct S_I_ID) from STOCK, ORDE
    ...          481         40.93          2.18         0.00 Select Count(Distinct S_I_ID) from STOCK, ORDE
    ...          476         36.56          2.04         0.00 Select Count(Distinct S_I_ID) from STOCK, ORDE
    ...          483         41.59          2.13         0.00 Select Count(Distinct S_I_ID) from STOCK, ORDE
    ...           35         89.65          0.90         0.00 Update STOCK set S_QUANTITY = ?, S_YTD = ?, S_
    ...           29         90.54          0.89         0.00 Select S_QUANTITY, S_DIST_01, S_DIST_02, S_DIS
    ...           22         85.45          0.46         0.00 Insert into ORDER_LINE values (?, ?, ?, ?, ?, 
    ...          175         93.18          5.02         0.00 Update ORDER_LINE set OL_DELIVERY_D = ? where 
    ...           32         84.75          0.31         0.00 Select S_QUANTITY, S_DIST_01, S_DIST_02, S_DIS
    ...           33         85.28          0.32         0.00 Select S_QUANTITY, S_DIST_01, S_DIST_02, S_DIS
    ...           41         83.84          0.29         0.00 Update STOCK set S_QUANTITY = ?, S_YTD = ?, S_
    ...           40         82.78          0.27         0.00 Update STOCK set S_QUANTITY = ?, S_YTD = ?, S_
    ...           33         86.20          0.30         0.00 Select S_QUANTITY, S_DIST_01, S_DIST_02, S_DIS
      13 record(s) selected.     

    The "Wait time breakdown" section shows how much time statements spend not running because they are waiting for resources, such as (but not limited to) disk, latches, and locks.

    ==============================================================
     Wait time breakdown for top SQL statements by execution time
    ==============================================================
    
    select member, decimal((total_act_wait_time / double(total_act_time)) * 100, 5, 2) as pct_wait, 
       decimal((log_disk_wait_time / double(total_act_time)) * 100, 5, 2) as pct_lg_dsk, 
       decimal((log_buffer_wait...
    
    MEMBER PCT_WAIT PCT_LG_DSK PCT_LG_BUF PCT_LOCK PCT_GL_LK PCT_LTCH PCT_RCLM PCT_CF  PCT_PFTCH PCT_DIAG ...
    ------ -------- ---------- ---------- -------- --------- -------- -------- ------- --------- -------- ...
        0     53.67       0.00       0.00     2.37      0.36     2.34     0.00   47.55      0.00     0.00 ...
        3     40.93       0.00       0.00     2.29      1.02     2.57     0.00   30.97      0.00     0.00 ...
        2     36.56       0.00       0.00     1.42      0.65     1.88     0.00   29.37      0.00     0.00 ...
        1     41.59       0.00       0.00     5.04      3.21     1.05     0.00   31.96      0.00     0.00 ...
        0     89.65       0.16       0.00     0.00      0.00     0.07     0.35   89.06      0.00     0.00 ...
        0     90.54       0.00       0.00     2.32      2.29     0.00     3.59   75.30      0.00     0.00 ...
        0     85.45       4.61       0.00     0.00      0.00     0.71     3.77   75.19      0.00     0.00 ...
        0     93.18       0.00       0.00     0.00      0.00     0.02     0.05   88.41      0.00     0.00 ...
        2     84.75       0.00       0.00     5.16      5.12     0.00     4.94   63.97      0.00     0.00 ...
        3     85.28       0.00       0.00     0.23      0.19     0.00     7.03   67.87      0.00     0.00 ...
        3     83.84       0.19       0.00     0.00      0.00     0.01     1.50   82.12      0.00     0.00 ...
        2     82.78       0.20       0.00     0.00      0.00     0.01     0.71   81.85      0.00     0.00 ...
        1     86.20       0.00       0.00     0.00      0.33     0.00     4.85   70.23      0.00     0.00 ...
        1     83.34       0.22       0.00     0.00      0.00     0.00     0.90   82.20      0.00     0.00 ...
    Output from report (continued):
    ... PCT_POOL_RD PCT_DIR_WR PCT_DIR_RD STMT_TEXT                                                        
    ... ----------- ---------- ---------- -----------------------------------------------------------------
    ...        1.40       0.00       0.00 Select Count(Distinct S_I_ID) from STOCK, ORDER_LINE where (S_W_I
    ...        5.06       0.00       0.00 Select Count(Distinct S_I_ID) from STOCK, ORDER_LINE where (S_W_I
    ...        3.87       0.00       0.00 Select Count(Distinct S_I_ID) from STOCK, ORDER_LINE where (S_W_I
    ...        3.52       0.00       0.00 Select Count(Distinct S_I_ID) from STOCK, ORDER_LINE where (S_W_I
    ...        0.00       0.00       0.00 Update STOCK set S_QUANTITY = ?, S_YTD = ?, S_ORDER_CNT = ?, S_RE
    ...        9.27       0.00       0.00 Select S_QUANTITY, S_DIST_01, S_DIST_02, S_DIST_03, S_DIST_04, S_
    ...        0.49       0.00       0.00 Insert into ORDER_LINE values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)     
    ...        4.67       0.00       0.00 Update ORDER_LINE set OL_DELIVERY_D = ? where OL_W_ID = ? and OL_
    ...       10.32       0.00       0.00 Select S_QUANTITY, S_DIST_01, S_DIST_02, S_DIST_03, S_DIST_04, S_
    ...       10.12       0.00       0.00 Select S_QUANTITY, S_DIST_01, S_DIST_02, S_DIST_03, S_DIST_04, S_
    ...        0.00       0.00       0.00 Update STOCK set S_QUANTITY = ?, S_YTD = ?, S_ORDER_CNT = ?, S_RE
    ...        0.00       0.00       0.00 Update STOCK set S_QUANTITY = ?, S_YTD = ?, S_ORDER_CNT = ?, S_RE
    ...       10.76       0.00       0.00 Select S_QUANTITY, S_DIST_01, S_DIST_02, S_DIST_03, S_DIST_04, S_
    ...        0.00       0.00       0.00 Update STOCK set S_QUANTITY = ?, S_YTD = ?, S_ORDER_CNT = ?, S_RE
    
      14 record(s) selected.
    Tip: The "Statement and Plan Identifiers" section shows the EXECUTABLE_ID for each of the top 100 statements. You can use the EXECUTABLE_ID with the MON_GET_PKG_CACHE_STMT function to get the access plan for any of these statements. You can also use EXECUTABLE_ID to determine the explain plan, by using the following command:
    db2 "call explain_from_section(x'<executable id>','M',NULL,0,'<current user>',?,?,?,?,?)"
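
    After the call completes, you can format the captured plan with the db2exfmt tool, for example (the output file name is arbitrary):
      db2exfmt -d MyDatabaseName -1 -o stmt_plan.txt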

    The "Top SQL statements by execution time, aggregated by PLANID" section is similar to the "Top SQL statements by execution time" section. All statements that differ only by literal values (and therefore have the same PLANID hash value) are aggregated to show the total cost. This information is useful when a transactional application issues many SQL statements with literal values.

    The "IO statistics per statement" section shows the buffer pool activity for the top 100 SQL statements.

    ==========================================================
     IO statistics per stmt - top statements by execution time
    ==========================================================
    
    select member, integer(num_exec_with_metrics) num_exec, 
       decimal(pool_data_l_reads/double(num_exec_with_metrics), 16,1) as avg_d_lrd, 
       decimal(pool_data_p_reads/double(num_exec_with_metrics), 10,1) ...
    
    MEMBER NUM_EXEC    AVG_D_LRD  AVG_D_PRD  AVG_I_LRD  AVG_I_PRD  AVG_TD_PRD  AVG_TI_PRD  ...   
    ------ ----------- ---------- ---------- ---------- ---------- ----------- ----------- ...   
         0       61729        0.0        0.0      607.4        0.1         0.0         0.0 ...   
         3       69026        0.0        0.0      607.4        0.4         0.0         0.0 ...   
         2       72832        0.0        0.0      607.3        0.2         0.0         0.0 ...   
         1       64562        0.0        0.0      607.3        0.2         0.0         0.0 ...   
         0      124839        2.0        0.0        4.0        0.0         0.0         0.0 ...   
         0      124833        1.1        0.1        3.0        0.0         0.0         0.0 ...   
         0      124836        1.0        0.0        4.1        0.0         0.0         0.0 ...   
         0       11350       20.3        0.4        4.1        0.0         0.0         0.0 ...   
         2      147141        1.1        0.1        3.0        0.0         0.0         0.0 ...   
         3      138286        1.1        0.1        3.0        0.0         0.0         0.0 ...   
         3      138285        2.0        0.0        4.0        0.0         0.0         0.0 ...   
         2      147137        2.0        0.0        4.0        0.0         0.0         0.0 ...   
         1      131525        1.1        0.1        3.0        0.0         0.0         0.0 ...   
         1      131523        2.0        0.0        4.0        0.0         0.0         0.0 ...   
         0      124837        0.0        0.0        2.0        0.0         0.0         0.0 ...   
         0       11343        0.0        0.0        4.1        0.0         0.0         0.0 ...   
         2      147135        1.0        0.0        4.1        0.0         0.0         0.0 ...   
         3      138285        1.0        0.0        4.1        0.0         0.0         0.0 ...   
         1      131522        1.0        0.0        4.1        0.0         0.0         0.0 ...   
         0       12533        1.0        0.0        6.0        0.0         0.0         0.0 ...   
         2       13568       20.4        0.4        4.1        0.0         0.0         0.0 ...
    Output from report (continued):
    ... AVG_COL_LRD  AVG_COL_PRD  AVG_DIR_R_RQS AVG_DIR_W_RQS STMT_TEXT                  
    ... ------------ ------------ ------------- ------------- -------------------------- 
    ...          0.0          0.0           0.0           0.0 Select Count(Distinct S_I_I
    ...          0.0          0.0           0.0           0.0 Select Count(Distinct S_I_I
    ...          0.0          0.0           0.0           0.0 Select Count(Distinct S_I_I
    ...          0.0          0.0           0.0           0.0 Select Count(Distinct S_I_I
    ...          0.0          0.0           0.0           0.0 Update STOCK set S_QUANTITY
    ...          0.0          0.0           0.0           0.0 Select S_QUANTITY, S_DIST_0
    ...          0.0          0.0           0.0           0.0 Insert into ORDER_LINE valu
    ...          0.0          0.0           0.0           0.0 Update ORDER_LINE set OL_DE
    ...          0.0          0.0           0.0           0.0 Select S_QUANTITY, S_DIST_0
    ...          0.0          0.0           0.0           0.0 Select S_QUANTITY, S_DIST_0
    ...          0.0          0.0           0.0           0.0 Update STOCK set S_QUANTITY
    ...          0.0          0.0           0.0           0.0 Update STOCK set S_QUANTITY
    ...          0.0          0.0           0.0           0.0 Select S_QUANTITY, S_DIST_0
    ...          0.0          0.0           0.0           0.0 Update STOCK set S_QUANTITY
    ...          0.0          0.0           0.0           0.0 Select I_NAME, I_PRICE, I_D
    ...          0.0          0.0           0.0           0.0 Select sum(OL_AMOUNT) from 
    ...          0.0          0.0           0.0           0.0 Insert into ORDER_LINE valu
    ...          0.0          0.0           0.0           0.0 Insert into ORDER_LINE valu
    ...          0.0          0.0           0.0           0.0 Insert into ORDER_LINE valu
    ...          0.0          0.0           0.0           0.0 Insert into ORDERS values (
    ...          0.0          0.0           0.0           0.0 Update ORDER_LINE set OL_DE
    
      21 record(s) selected.

    I/O information can indicate whether a statement is using an index. AVG_I_LRD stands for average index logical reads per execution. AVG_D_LRD stands for average data logical reads. In the previous output, the first four entries have high AVG_I_LRD values and AVG_D_LRD values of zero. This combination of values is indicative of an index-based plan. If a statement has high data logical reads and low index logical reads, changes to the indexes might improve performance.
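
    For example, you could ask the Design Advisor for index recommendations on one of the heavy statements (substitute the actual statement text):
      db2advis -d MyDatabaseName -s "<statement text>"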

  5. Use the "Database log write times" section to examine log disk performance.

    Statements whose execution times are relatively higher than those of other statements might suggest a poorly optimized access plan. If adding an index does not provide a solution, compare the highest wait times at the database and statement levels to identify the largest contributors to wait time, and then consult the appropriate report sections for more detailed information.

    In db2mon, the individual wait time percentages (such as log disk wait time) are calculated as a percentage of total request time, not as a percentage of total wait time. A breakdown in which the wait times add up to 100% of total wait time can be obtained by using the MONREPORT.DBSUMMARY procedure.
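
    For example, the following call produces a summary for a 30-second interval in which the wait times are broken down as percentages of the total wait time:
      db2 "CALL MONREPORT.DBSUMMARY(30)"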

    A high percentage of log disk wait time, which is reported in the PCT_LG_DSK column in the "Wait times at database level" section (from Example 3), is common for transactional applications.

    ========================== 
     Database log write times  
    ========================== 
    
    select member, num_log_write_io, 
       case when ts_delta > 0 then decimal( double(num_log_write_io) / ts_delta, 10, 4 ) 
       else null end as ... 
       
    MEMBER NUM_LOG_WRITE_IO LOG_WRITE_IO_PER_S LOG_WRITE_MB_PER_S ...
    ------ ---------------- ------------------ ------------------ ...
         0            57237          1788.6562             7.0000 ...
         1            68286          2133.9375             8.3437 ...
         2            69373          2167.9062             8.5312 ...
         3            69286          2165.1875             8.4687 ...
    Output from report (continued):
    ... LOG_WRITE_TIME LOG_WRITE_TIME_PER_IO_MS NUM_LOG_BUFFER_FULL 
    ... -------------- ------------------------ --------------------
    ...          25306                   0.4421                    0
    ...          26345                   0.3858                    0
    ...          26669                   0.3894                    0
    ...          26741                   0.3859                    0
    
      4 record(s) selected.

    In the previous output, the log write time per I/O (LOG_WRITE_TIME_PER_IO_MS) is approximately 0.4 milliseconds, which is optimal. Values vary between systems, but log write times per I/O above 4.0 milliseconds indicate that changes to the storage configuration might be required. Reconfiguration includes changes to cache, RAID, and the number of LUNs per file system.
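
    For example, you can check where the log files currently are and, as a sketch only (the new path is a placeholder for faster storage, and the change takes effect at the next database activation), move them:
      db2 "GET DB CFG FOR MyDatabaseName" | grep -i "Path to log files"
      db2 "UPDATE DB CFG FOR MyDatabaseName USING NEWLOGPATH /fastdisk/db2logs"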

  6. Use the "Bufferpool read statistics" section and the "Disk read and write I/O times" section to examine read times.
    Note: The output in the "Bufferpool read statistics" section includes the following abbreviations:
    • POOL_DATA_L_READS: Buffer pool data page logical reads (base table ORGANIZE BY ROW access)
    • POOL_DATA_P_READS: Buffer pool data page physical reads
    • POOL_INDEX_L_READS: Buffer pool index page logical reads (index access)
    • POOL_INDEX_P_READS: Buffer pool index page physical reads
    • POOL_COL_L_READS: Buffer pool column-organized data page logical reads (base table ORGANIZE BY COLUMN access)
    • POOL_COL_P_READS: Buffer pool column-organized data page physical reads
    ============================ 
     Bufferpool read statistics  
    ============================ 
    
    select member, substr(bp_name,1,20) as bp_name, pool_data_l_reads, pool_data_p_reads, 
       pool_index_l_reads, pool_index_p_reads, pool_col_l_reads, pool_col_p_reads, 
       pool_read_time, case when ...
    
    MEMBER BP_NAME       POOL_DATA_L_READS  POOL_DATA_P_READS  POOL_INDEX_L_READS  ...
    ------ ------------- ------------------ ------------------ ------------------- ...
         2 IBMDEFAULTBP             3037899              44159            45043859 ...
         3 IBMDEFAULTBP             2841993              41020            42586533 ...
         1 IBMDEFAULTBP             2696917              38268            40018451 ...
         0 IBMDEFAULTBP             2518978              38610            37512769 ...
    Output from report (continued):
    ... POOL_INDEX_P_READS  POOL_COL_L_READS  POOL_COL_P_READS  POOL_READ_TIME AVG_READ_TIME
    ... ------------------- ----------------- ----------------- -------------- -------------
    ...               27427                 0                 0          19619          0.27
    ...               34728                 0                 0          21063          0.27
    ...               23941                 0                 0          17030          0.27
    ...               10298                 0                 0          28304          0.57
         
      4 record(s) selected.

    High pool read times are a common issue. High pool read times are reported in the PCT_POOL_RD column in the "Wait time breakdown for top SQL statements by execution time" section (from Example 4). This issue is especially common when the system is reading many pages from disk into the buffer pool. This type of heavy page read activity typically occurs shortly after database activation, or when a new application starts to make connections. Average pool read times (AVG_READ_TIME) are reported at the buffer pool level and the table space level. Similarly, a high percentage of direct I/O time for large objects can be tracked in the AVG_DRCT_READ_TIME and AVG_DRCT_WRITE_TIME columns.
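
    If physical reads remain high after the buffer pool has warmed up, one option (a sketch; IBMDEFAULTBP is the buffer pool from the sample output) is to let the self-tuning memory manager size the buffer pool:
      db2 "ALTER BUFFERPOOL IBMDEFAULTBP SIZE AUTOMATIC"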

    =============================== 
     Disk read and write I/O times  
    =============================== 
    
    select member, substr(tbsp_name,1,20) as tbsp_name, 
       (pool_data_p_reads + pool_index_p_reads+ pool_col_p_reads) as num_reads, 
       case when ((pool_data_p_reads + pool_index_p_reads+ pool ...
    
    MEMBER TBSP_NAME  NUM_READS  AVG_READ_TIME DIRECT_READ_REQS AVG_DRCT_READ_TIME ...
    ------ ---------- ---------- ------------- ---------------- ------------------ ...
         3 TBS_OM          29235          0.27                0                  - ...
         2 TBS_OM          22195          0.26                0                  - ...
         2 TBS_CST         21312          0.27                0                  - ...
         3 TBS_CST         19700          0.28                0                  - ...
         1 TBS_OM          19201          0.26                0                  - ...
         1 TBS_CST         18485          0.27                0                  - ...
         0 TBS_CST         17705          0.56                0                  - ...
         2 TBS_STK         16568          0.27                0                  - ...
         0 TBS_STK         16127          0.59                0                  - ...
         3 TBS_STK         15559          0.27                0                  - ...
    Output from report (continued):
    ... NUM_WRITES  AVG_WRITE_TIME DIRECT_WRITE_REQS  AVG_DRCT_WRITE_TIME
    ... ----------- -------------- ------------------ -------------------
    ...           0              -                  0                   -
    ...           0              -                  0                   -
    ...           0              -                  0                   -
    ...           0              -                  0                   -
    ...           0              -                  0                   -
    ...           0              -                  0                   -
    ...           0              -                  0                   -
    ...           0              -                  0                   -
    ...           0              -                  0                   -
    ...           0              -                  0                   -
    
      10 record(s) selected.
  7. Use the "Round-trip CF" section to determine round-trip time for messages between members and CFs.

    Percentage CF wait time is reported in PCT_CF in "Wait times at database level" (Example 3) and "Wait time breakdown" (Example 4). If the PCT_CF values in your report are high, the "Round-trip CF" section shows performance data on messages that are exchanged between the Db2 member and the CFs.

    =================================================================== 
     Round-trip CF command execution counts and average response times  
    =================================================================== 
    
    select member, id, cast(substr(cf_cmd_name,1,30) as varchar(30)) as cf_cmd_name, 
       total_cf_requests, decimal( doub...
    
    MEMBER ID     CF_CMD_NAME           TOTAL_CF_REQUESTS  AVG_CF_REQUEST_TIME_MICRO
    ------ ------ --------------------- ------------------ -------------------------
         2    128 SetLockState                     1016554                     60.11
         3    128 SetLockState                      955295                     65.82
         1    128 SetLockState                      905310                     62.04
         0    128 SetLockState                      851689                    223.05
         3    128 ReadAndRegister                   682169                     88._8
         2    128 ReadAndRegister                   671832                     83.39
         1    128 ReadAndRegister                   662450                     84.51
         0    128 ReadAndRegister                   627922                    282.94
         2    128 SetLockStateMultiple              413551                     70.50
         3    128 SetLockStateMultiple              387168                     77.93
         1    128 SetLockStateMultiple              367427                     73.74
         0    128 SetLockStateMultiple              345954                    252.11
    
      12 record(s) selected.
  8. Use the "Page reclaim metrics" section to examine page reclaim metrics for index and data pages.

    Reclaim wait time is reported in PCT_RCLM in "Wait times at database level" (Example 3) and "Wait time breakdown" (Example 4). If PCT_RCLM values in your report are above zero, the following section shows which table has the most reclaim activity, and whether the table or an index on the table is reclaimed:

    =============================================== 
     Page reclaim metrics for index and data pages  
    =============================================== 
    select member, substr(tabschema,1,20) as tabschema, substr(tabname,1,40) as tabname, 
       substr(objtype,1,10) as objtype, data_partition_id, iid, (page ...
    
    MEMBER TABSCHEMA  TABNAME         OBJTYPE  DATA_PARTITION_ID IID  PAGE_RECLAIMS  RECLAIM_WAIT_TIME   
    ------ ---------- --------------- -------- ----------------- ---- -------------- ------------------
         0 DTW        ORDERS          INDEX                    -    -           4334               3357
         2 DTW        ORDER_LINE      INDEX                    -    -           3870               3096
         2 DTW        ORDERS          INDEX                    -    -           3652               2732
         3 DTW        ORDERS          INDEX                    -    -           3638               2724
         1 DTW        ORDER_LINE      INDEX                    -    -           3634               2666
         3 DTW        ORDER_LINE      INDEX                    -    -           3577               2391
         1 DTW        ORDERS          INDEX                    -    -           3502               2326
         0 DTW        ORDER_LINE      INDEX                    -    -           4169               2286
         0 DTW        STOCK_501_750   INDEX                    -    -            165                983
         0 DTW        STOCK_751_1000  INDEX                    -    -            199                952
         1 DTW        NEW_ORDER       INDEX                    -    -            441                880
    
      11 record(s) selected.

    The sample output shows the highest reclaim activity on indexes on ORDERS and ORDER_LINE across all the members. However, these RECLAIM_WAIT_TIME values are not high, which can be confirmed by examining the PCT_RCLM (percent reclaim wait time) column in the "Wait times at database level" section. High reclaim activity on index pages is most common, and can be reduced by using RANDOM indexes, CURRENT MEMBER partitioning, or range partitioning; a RANDOM index sketch follows.
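
    For example, a RANDOM index spreads inserts across the index leaf pages, which can reduce reclaim contention (a sketch only; the index and column names are illustrative):
      db2 "CREATE INDEX DTW.ORDERS_RND_IX ON DTW.ORDERS (O_ID RANDOM)"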

  9. Use the "Page reclaim metrics for SMP pages" section to identify reclaim activity of space map pages (SMPs). For details about space map page reclaim metrics, see spacemappage_reclaim_wait_time - Space map page reclaim wait time monitor element.

    The following output shows reclaimed SMP pages, which are often caused by heavy inserts into a table whose table space extent size is too small (a mitigation sketch follows the sample output).

    ==================================== 
     Page reclaim metrics for SMP pages  
    ==================================== 
    
    select member, substr(tabschema,1,20) as tabschema, substr(tabname,1,40) as tabname, 
       substr(objtype,1,10) as objtype, data_partition_id, iid, (spacemapp ...
    
    MEMBER TABSCHEMA TABNAME    OBJTYPE DATA_PARTITION_ID IID SMP_PAGE_RECLAIMS SMP_PAGE_RECLAIM_WAIT_TIME
    ------ --------- ---------- ------- ----------------- --- ----------------- --------------------------
         3 DTW       ORDER_LINE TABLE                   -   -                 9                        254
         2 DTW       ORDER_LINE INDEX                   -   -                57                        241
         0 DTW       ORDER_LINE INDEX                   -   -                90                        192
         0 DTW       ORDER_LINE TABLE                   -   -                 9                        170
         3 DTW       ORDER_LINE INDEX                   -   -                67                        169
         1 DTW       ORDER_LINE INDEX                   -   -                57                         76
         1 DTW       ORDER_LINE TABLE                   -   -                 3                         10
         2 DTW       ORDER_LINE TABLE                   -   -                 4                          3
         3 DTW       NEW_ORDER  TABLE                   -   -                 2                          3
         1 DTW       NEW_ORDER  TABLE                   -   -                 1                          2
    
      10 record(s) selected.
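
    If SMP page reclaim wait is significant, one mitigation (a sketch only; the table space name and extent size are illustrative) is to create a table space with a larger extent size and move the affected table to it, for example with the ADMIN_MOVE_TABLE procedure:
      db2 "CREATE LARGE TABLESPACE TBS_OL_EXT32 EXTENTSIZE 32"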
  10. Use the "Latch wait metrics" section to identify latch waits.

    Latch wait time percentages are reported in PCT_LTCH in "Wait times at database level" (Example 3) and "Wait time breakdown" (Example 4). PCT_LTCH values higher than 15% to 20% are considered high. The following section shows details for latch waits by type:

    ==================== 
     Latch wait metrics  
    ==================== 
    
    select member, substr(latch_name,1,60) as latch_name, 
       total_extended_latch_wait_time as tot_ext_latch_wait_time_ms, 
       total_extended_latch_waits ...
    
    MEMBER LATCH_NAME                            TOT_EXT_LATCH_WAIT_TIME_MS ...
    ------ ------------------------------------- -------------------------- ...
         0 SQLO_LT_SQLB_BPD_bpdLatch_SX                                7182 ...
         3 SQLO_LT_SQLB_BPD_bpdLatch_SX                                4085 ...
         2 SQLO_LT_SQLB_BPD_bpdLatch_SX                                3086 ...
         1 SQLO_LT_SQLB_BPD_bpdLatch_SX                                1764 ...
         0 SQLO_LT_SQLB_BPD_WARLatch                                   1272 ...
         2 SQLO_LT_SQLB_BPD_WARLatch                                   1168 ...
         3 SQLO_LT_SQLB_BPD_WARLatch                                    706 ...
         1 SQLO_LT_SQLB_BPD_WARLatch                                    509 ...
         0 SQLO_LT_SQLP_LHSH_hshlatch                                   263 ...
         3 SQLO_LT_SQLI_INX_PAGE_CACHE_ipcLatch                         240 ...
         2 SQLO_LT_SQLI_INX_PAGE_CACHE_ipcLatch                         221 ...
         1 SQLO_LT_SQLI_INX_PAGE_CACHE_ipcLatch                         193 ...
         0 SQLO_LT_SQLI_INX_PAGE_CACHE_ipcLatch                         175 ...
    Output from report (continued):
    ... TOT_EXT_LATCH_WAITS  TIME_PER_LATCH_WAIT_MS       
    ... -------------------- ----------------------  
    ...                 4454                   1.61  
    ...                 1965                   2.07  
    ...                 2088                   1.47  
    ...                 1724                   1.02  
    ...                 1783                   0.71  
    ...                 2637                   0.44  
    ...                 2110                   0.33  
    ...                 1870                   0.27  
    ...                  758                   0.34  
    ...                   36                   6.66  
    ...                   45                   4.91  
    ...                   21                   9.19  
    ...                   25                   7.00  
    
      13 record(s) selected.    

    Most values for TIME_PER_LATCH_WAIT_MS are well below a few seconds when measured over the 30-second period across all agents that operate on the system. Therefore, this system shows no significant latching issue.

    High latch wait time percentages can result from various causes. Use the following techniques to address scenarios with more significant latch wait:
    • In Db2 pureScale, a strong correlation exists between latch wait and page reclaim time. Even 2% to 5% of reclaim wait time can end up causing 20% to 40% of latch wait time in some scenarios. Examining the "Page reclaim metrics for index and data pages" section (Example 8) might indicate that the table or index needs altering. This is especially true if both latch wait times and reclaim wait times are high on the same SQL statements.
    • Significant improvements in latching were made in Db2 Version 11.1. If a latch with the name SQLO_LT_SQLB_HASH_BUCKET_GROUP_HEADER__groupLatch has the highest latch wait time in the "Latch wait metrics" section, upgrading to Version 11.1 or later might reduce the time that is spent holding the latch.
    • The SQLB_BPD__bpdLatch type (shown as SQLO_LT_SQLB_BPD_bpdLatch_SX in the sample output) is associated with a processing-intensive buffer pool page. Identify the table that is involved by looking at statements with high latch wait time; the query sketch after this list shows one way to find them. One approach to reduce latch contention is to range partition the table and use partitioned indexes.
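
    For example, the following query (a sketch; it assumes that the latch wait metrics are available in the package cache) lists the cached statements with the most extended latch wait time:
      db2 "SELECT TOTAL_EXTENDED_LATCH_WAIT_TIME, CAST(SUBSTR(STMT_TEXT,1,60) AS VARCHAR(60)) AS STMT FROM TABLE(MON_GET_PKG_CACHE_STMT(NULL,NULL,NULL,-2)) ORDER BY TOTAL_EXTENDED_LATCH_WAIT_TIME DESC FETCH FIRST 10 ROWS ONLY"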