Example: Analyzing a service class-related system slowdown

If you notice a system slowdown (for example, some applications take much longer than expected to complete) and are unsure whether the problem is related to the configuration of the service classes, you can use table function data to investigate and, if necessary, correct the problem.

First, obtain a high-level overview of what is occurring in the service classes. This high-level overview should include the average activity lifetime, the number of activities that completed normally rather than abnormally,and the high watermark for concurrent coordinator activities in the system. To obtain this information, you can create a general query with aggregation across service classes and database members by using the data obtained from the table function WLM_GET_SERVICE_SUBCLASS_STATS. Set the first and second arguments to empty strings and the third argument to -2 (a wildcard character) to indicate that data is to be gathered for all service classes on all database members. Your query might resemble the following one:
SELECT SUBSTR(SERVICE_SUPERCLASS_NAME,1,19) AS SUPERCLASS_NAME,
       SUBSTR(SERVICE_SUBCLASS_NAME,1,18) AS SUBCLASS_NAME,
       SUBSTR(CHAR(SUM(COORD_ACT_COMPLETED_TOTAL)),1,13) AS ACTSCOMPLETED,
       SUBSTR(CHAR(SUM(COORD_ACT_ABORTED_TOTAL)),1,11) AS ACTSABORTED,
       SUBSTR(CHAR(MAX(CONCURRENT_ACT_TOP)),1,6) AS ACTSHW,
       CAST(CASE WHEN SUM(COORD_ACT_COMPLETED_TOTAL) = 0 THEN 0
                 ELSE SUM(COORD_ACT_COMPLETED_TOTAL * COORD_ACT_LIFETIME_AVG)
                 / SUM(COORD_ACT_COMPLETED_TOTAL) END / 1000 AS DECIMAL(9,3))
       AS ACTAVGLIFETIME
FROM TABLE(WLM_GET_SERVICE_SUBCLASS_STATS ('', '', -2)) AS SCSTATS
GROUP BY SERVICE_SUPERCLASS_NAME, SERVICE_SUBCLASS_NAME
ORDER BY SERVICE_SUPERCLASS_NAME, SERVICE_SUBCLASS_NAME
Assume that on previous occasions, the query reported the following results:
SUPERCLASS_NAME     SUBCLASS_NAME      ACTSCOMPLETED ACTSABORTED ACTSHW ACTAVGLIFETIME
------------------- ------------------ ------------- ----------- ------ --------------
SYSDEFAULTUSERCLASS SYSDEFAULTSUBCLASS 8             0           1               3.750
BI_APPS             SYSDEFAULTSUBCLASS 4             0           1               5.230
BATCH               SYSDEFAULTSUBCLASS 1             0           1              25.600

The data returned by this query might be sufficient to show that the slowdown is occurring in the BI_APPS service class because its average activity lifetime is significantly higher than usual. This situation could indicate that the available resources for that particular service class are becoming exhausted.

If the averages for the service classes for all database members do not isolate the problem, consider analyzing average values for each member. Aggregating the average for each member into a global average can hide large discrepancies between database members. In this situation, the assumption is that every member is being used as a coordinator member. If this assumption is incorrect, the average lifetime computed at non-coordinator members is zero.
SELECT SUBSTR(SERVICE_SUPERCLASS_NAME,1,19) AS SUPERCLASS_NAME,
       SUBSTR(SERVICE_SUBCLASS_NAME,1,18) AS SUBCLASS_NAME,
       SUBSTR(CHAR(MEMBER),1,4) AS MEM,
       CAST(COORD_ACT_LIFETIME_AVG / 1000 AS DECIMAL(9,3)) AS AVGLIFETIME
FROM TABLE(WLM_GET_SERVICE_SUBCLASS_STATS('', '', -2)) AS SCSTATS
ORDER BY SUPERCLASS_NAME, SUBCLASS_NAME

SUPERCLASS_NAME     SUBCLASS_NAME      MEMB AVGLIFETIME
------------------- ------------------ ---- -----------
SYSDEFAULTUSERCLASS SYSDEFAULTSUBCLASS 0          3.425
SYSDEFAULTUSERCLASS SYSDEFAULTSUBCLASS 1          2.752
SYSDEFAULTUSERCLASS SYSDEFAULTSUBCLASS 2          8.230
SYSDEFAULTUSERCLASS SYSDEFAULTSUBCLASS 3          0.593

In this example, member 2 might be receiving more work than usual because its average activity lifetimes are much higher than those of the other database members.

Many different situations can cause a system slowdown. Use the following principles to make the best use of the information provided by the Db2® workload management table functions:
  • Address large numbers of locking conflicts at the level of the application logic and environment (isolation level and so on).
  • If the service class is running close to its threshold levels (the number of concurrent requests and so on), you might need to increase the thresholds.
  • If the resources allotted to a service class are becoming exhausted and OUTBOUT CORRELATOR is set, the mapping to the operating system service classes might be the cause of the problem (that is, the operating system service class corresponding to the service class is not getting enough processor resources).
  • Higher numbers of activities than expected might be running in the service class, which might be consuming more resources than normal. Check the number of completed activities to determine whether the amount of work being done in the service class is reasonable.
  • Activities might be spending more time in queues if more activities are being submitted than expected and concurrency thresholds are defined. Check whether the average queue time for activities has increased by the same amount as the average lifetime. If they have increased by the same amount, the queues are behaving as expected; however, if the lifetime is unacceptable, consider allocating more resources to the service class and reducing the concurrency threshold.