Workload management monitoring
A good workload management system helps to efficiently meet goals in the environment where work occurs. Workload management has three clearly defined domains:
- Identification of the work entering the data server.
- Management of the work when it is running.
- Monitoring to ensure that the data server is being used efficiently.
The primary purpose of WLM monitoring is to validate the health and efficiency of your system and the individual workloads running on it. Using table functions, you can access real-time operational data, such as a list of running workload occurrences and the activities running in a service class or average response times. Using event monitors, you can capture detailed activity information and aggregate activity statistics for historical analysis.
Looking at aggregate information reported by table functions should usually be the first step when you build a monitoring strategy. Aggregates give a good picture of overall data server activity and are also cheaper because you do not have to collect information on every activity in which you might be interested.
WLM monitoring table functions have names that begin with "WLM_" and can be used to obtain real-time monitoring data, such as information about work currently running on the system, statistics, and metrics for work performed on the system. This information can help you determine usage patterns and resource allocation and identify problem areas.
Compared to monitoring metrics table functions, WLM table functions generally provide computed value data that is more statistical in nature (such as averages, high watermarks, standard deviations, etc.) and provide a much more complex set of raw monitoring data.
Table 5 lists table functions used to monitor workload management features.
Table 5. WLM monitoring table functions
|Table function name||Description|
|WLM_GET_SERVICE_CLASS_WORKLOAD_OCCURRENCES||Returns a list of workload occurrences assigned to a service class, including information about the current state, connection and activity.|
|WLM_GET_WORKLOAD_OCCURRENCE_ACTIVITIES||Returns a list of current activities associated with a workload occurrence, including information about the current state and type of the activity and the time at which the activity started.|
|WLM_GET_SERVICE_CLASS_AGENTS||Returns a list of database agents associated with a service class or an application handle, including information about the state of the agent and the action the agent is performing.|
|WLM_GET_SERVICE_SUPERCLASS_STATS||Shows summary statistics at the service superclass level — namely, high-water marks for concurrent connections to determine peak workload activity.|
|WLM_GET_WORKLOAD_STATS||Shows summary statistics at the workload level, including high-water marks for concurrent workload occurrences and numbers of completed activities.|
|WLM_GET_WORK_ACTION_SET_STATS||Shows summary statistics at the work action set level — namely, the number of activities in each work class that had the corresponding work actions applied to them.|
|WLM_GET_QUEUE_STATS||Shows summary statistics for the queues used for thresholds. Statistics include the current and total numbers of queued activities and total time spent in a queue.|
To list the occurrences of workloads mapped to the default
user WLM service class SYSDEFAULTSUBCLASS, the following query uses
table function to identify the source of activities.
Listing 7. WLM_GET_SERVICE_CLASS_WORKLOAD_OCCURRENCES table function
SELECT APPLICATION_HANDLE, VARCHAR(WORKLOAD_NAME, 30) AS WORKLOAD, VARCHAR(SESSION_AUTH_ID, 16) AS SESSION_AUTH_ID, VARCHAR(APPLICATION_NAME, 10) AS APPL_NAME FROM TABLE(WLM_GET_SERVICE_CLASS_WORKLOAD_OCCURRENCES('SYSDEFAULTUSERCLASS', 'SYSDEFAULTSUBCLASS', -1)) AS T
Listing 8. Sample output of WLM_GET_SERVICE_CLASS_WORKLOAD_OCCURRENCES
APPLICATION_HANDLE WORKLOAD SESSION_AUTH_ID APPL_NAME -------------------- ------------------------------ ---------------- ---------- 7 SYSDEFAULTUSERWORKLOAD DB2INST1 db2bp 28 SYSDEFAULTUSERWORKLOAD DB2INST1 db2bp 2 record(s) selected.
To show the current activities across database members that were
created from a particular application's connection, the
table function can be used. You can use this information for a number
of purposes, including identifying activities that might be causing
problems on the data server. Using the sample output above, to learn
about activities associated with application handle 28 for example,
the following query can be issued.
Listing 9. WLM_GET_WORKLOAD_OCCURRENCES_ACTIVITIES table function
SELECT APPLICATION_HANDLE, LOCAL_START_TIME, UOW_ID, ACTIVITY_ID, ACTIVITY_TYPE FROM TABLE(WLM_GET_WORKLOAD_OCCURRENCE_ACTIVITIES(28, -1)) AS T
Listing 10. Sample output of WLM_GET_WORKLOAD_OCCURRENCE_ACTIVITIES
APPLICATION_HANDLE LOCAL_START_TIME UOW_ID ACTIVITY_ID ACTIVITY_TYPE -------------------- -------------------------- ----------- ----------- -------------- 28 2012-08-18-184.108.40.2065048 25 2 READ_DML 1 record(s) selected.
The output shows that the activity associated with application handle
28 involves reading operations (
SELECT statement). If more details
about the activity are required, you can use the unit of work and
activity IDs to dig down further using other WLM table monitoring
DB2 workload management uses event monitors to capture information that might be of use in the future or for historical analysis. Within the set of event monitors covered above, three event monitors focus on WLM monitoring:
- Activity event monitor — The ACTIVITIES event
monitor captures information about individual activities in a
service class, workload, or work class, or activities that violated
a threshold. Reported activity data can be used as input to tools,
db2advisto acquire design recommendations that can enhance performance.
- Threshold violation event monitor — The THRESHOLD VIOLATIONS event monitor captures information when a threshold is violated. It indicates what threshold was violated, the activity that caused the violation, and what action was taken when it occurred.
- Statistics event monitor — The STATISTICS event monitor serves as a low-overhead alternative to capturing detailed activity information by collecting aggregate data, such as the number of activities completed, average execution time, etc. Aggregate data includes histograms for a number of activity measurements including lifetime, queue time, execution time, and estimated cost.
For example, to determine if there are outliers (large, expensive queries, possibly due to poor optimization of the queries themselves) in the database workload, the statistics event monitor can be used to report the average estimated cost of queries that run on the system, as well as the maximum estimated cost recorded. If the difference in cost is substantial, further investigation may be advisable. To create and enable the event monitor, issue a statement like this:
CREATE EVENT MONITOR DB2STATISTICS FOR STATISTICS WRITE TO TABLE SET EVENT MONITOR DB2STATISTICS STATE 1
To enable activity data collection for the default service class, issue the following statement:
ALTER SERVICE CLASS SYSDEFAULTSUBCLASS UNDER SYSDEFAULTUSERCLASS COLLECT AGGREGATE ACTIVITY DATA EXTENDED
Now with the event monitor activated and metrics collection enabled,
the database workload should be left to run for some time so
results obtained from the event monitor can be accurately indicative
of the workload. At the end of this period, collection of statistics
from memory into the statistics event monitor can be prompted by
WLM_COLLECT_STATS collects and reset
statistics for WLM objects. All statistics tracked for service
classes, workloads, threshold queues, and work action sets are sent to
the active statistics event monitor (if one exists) and reset. If
there is no active statistics event monitor, the statistics are only
reset, but not collected.
Estimated cost statistics of queries are included with the different statistics written to the event monitor tables. To see them, query the service class statistics table SCSTATS_DB2STATISTICS.
Listing 11. Querying the service class statistics table SCSTATS_DB2STATISTICS
SELECT STATISTICS_TIMESTAMP, COORD_ACT_EST_COST_AVG, COST_ESTIMATE_TOP FROM SCSTATS_DB2STATISTICS WHERE SERVICE_SUPERCLASS_NAME = 'SYSDEFAULTUSERCLASS' AND SERVICE_SUBCLASS_NAME = 'SYSDEFAULTSUBCLASS'
Listing 12. Sample output from the SCSTATS_DB2STATISTICS table
STATISTICS_TIMESTAMP COORD_ACT_EST_COST_AVG COST_ESTIMATE_TOP -------------------------- ---------------------- -------------------- 2012-08-18-220.127.116.116302 26 101 1 record(s) selected.
The output shows that the maximum cost estimate is about four times the average cost, which indicates that queries running in this workload have comparable estimated costs. This is an indication that there wasn't significant outlier activity during that period.