Exercise 3: Using thresholds to control rogue activities and using the threshold violation monitor
This exercise demonstrates how you can use thresholds to detect resource misuse or the beginning of system overload by establishing limits over the consumption of a specific resource.
Estimated time: 15-20 minutes
If a threshold is violated, a specified action can be triggered. The supported actions are:
- STOP EXECUTION: Stop processing the activity that caused the threshold to be violated.
- CONTINUE: Continue processing
- Collect information about the activity that violated the threshold. This action can be specified in conjunction with the CONTINUE or STOP EXECUTION action.
Regardless of whether an activity that violates a threshold is stopped or permitted to continue running, a record of the violation is written to an active THRESHOLD VIOLATIONS event monitor (assuming one is defined in advance) each time a threshold is violated. The record contains information such as which threshold was violated, the time of the violation, and the threshold action.
This exercise demonstrates how thresholds can be used to detect or prevent rogue activities from running on your system and using up system resources. A rogue activity is any activity that uses an unexpectedly high amount of resources. For example, a query that runs for an abnormally long time, or returns an unexpectedly large result set.
Step 1: Create a threshold violation event monitor
Create and enable a write-to-table event monitor that will be used to capture the threshold violation information and enable the activity event monitor that was created in Exercise 1.
CREATE EVENT MONITOR threvio FOR THRESHOLD VIOLATIONS WRITE TO TABLE
THRESHOLDVIOLATIONS(IN userspace1),
CONTROL(IN userspace1)
SET EVENT MONITOR threvio STATE 1
SET EVENT MONITOR db2activities STATE 1
Step 2: Create a workload
Create a workload such that all activities run from the workth.db2 script will get mapped to the work1_sc service class.
The work1_sc service class already exists since it was created in Exercise 2.
CREATE WORKLOAD workth_wl
CURRENT CLIENT_APPLNAME('CLP workth.db2')
SERVICE CLASS work1_sc
Step 3: Create thresholds
Create two thresholds, one of which (th_estcost) is an ESTIMATEDSQLCOST threshold and another (th_sqlrows) is a SQLROWSRETURNED threshold and apply them to the service class you wish to control the activities for (in this case, work1_sc service class).
The th_estcost threshold specifies an upper bound (10000 timerons) for the optimizer-estimated cost (in timerons) for an activity running in the work1_sc service class. If any query with an estimated cost greater than 10000 timerons, tries to execute in the work1_sc service class, this threshold is violated and the query is not permitted to run.
The th_sqlrows threshold specifies that any activity running in the work1_sc service class can return at most 30 rows from the data server. If any query tries to return more than 30 rows, this threshold is violated, only 30 rows will be returned to the client and the query will be stopped. In addition, data about the activity that caused the threshold violation will be collected.
In either case, when an activity violates the threshold, a threshold violation record is written to the THRESHOLD VIOLATIONS event monitor as defined in step 1 and the execution of the activity is stopped (because of the STOP EXECUTION action). The application that submitted the activity will receive an SQL4712N error.
CREATE THRESHOLD th_estcost
FOR SERVICE CLASS work1_sc ACTIVITIES
ENFORCEMENT DATABASE
WHEN ESTIMATEDSQLCOST > 10000
STOP EXECUTION
CREATE THRESHOLD th_sqlrows
FOR SERVICE CLASS work1_sc ACTIVITIES
ENFORCEMENT DATABASE
WHEN SQLROWSRETURNED > 30
COLLECT ACTIVITY DATA WITH DETAILS AND VALUES
STOP EXECUTION
Additional information: A threshold can be either predictive or reactive:
- Predictive threshold: The boundaries of a predictive threshold are checked before the activity starts running. To check whether a predictive threshold would be violated, the data server obtains usage estimates from the query compiler. For this example, the th_estcost threshold is a predictive threshold.
- Reactive threshold: The boundaries of a reactive threshold are checked while an activity is executing. Approximate runtime usage estimates of the controlled resource are used to evaluate the boundaries of reactive thresholds. The runtime usage estimates are not obtained continuously but rather at selected predefined checkpoints during the lifetime of the tracked work. For this example, the th_sqlrows is a reactive threshold.
Step 4: Run some activities
Run some activities, some of which violate the threshold upper bounds defined in the previous step.
db2 -o -tvf workth.db2
Note that the statements which violate the thresholds defined previously, fail with an error of SQL4712N/SQLSTATE 5U026.
Step 5: View the threshold violation event monitor
Information about every threshold violation is collected by the THRESHOLD VIOLATIONS event monitor. You can query the threshold violation information by issuing regular SQL statements against the threshold violation monitor table as shown in the following example.
CONNECT TO SAMPLE
SELECT APPL_ID,
UOW_ID,
ACTIVITY_ID,
COORD_PARTITION_NUM AS COORDPART,
THRESHOLD_PREDICATE,
THRESHOLD_ACTION,
TIME_OF_VIOLATION
FROM THRESHOLDVIOLATIONS_THREVIO
ORDER BY THRESHOLD_ACTION, THRESHOLD_PREDICATE, TIME_OF_VIOLATION
The output will look something such as the following:
APPL_ID UOW_ID
ACTIVITY_ID COORDPART THRESHOLD_PREDICATE
THRESHOLD_ACTION TIME_OF_VIOLATION
--------------------------------------------------------------------------- ----
---------------- ----------- -------------------------
------------------------------------------------------- ------------------------
--
*LOCAL.DB2.070821150008 11
1 0 EstimatedSQLCost
Stop 2007-08-21-
11.00.11.000000
*LOCAL.DB2.070821150008 10
1 0 SQLRowsReturned
Stop 2007-08-21-
11.00.10.000000
2 record(s) selected.
Step 6: View information for the activity that violated the threshold
Activity information is collected for any activity that violates a threshold that is defined with a COLLECT clause. Show the detailed information about the activities that violated a threshold using the following query:
SELECT VARCHAR(A.APPL_NAME, 15) as APPL_NAME,
VARCHAR(A.TPMON_CLIENT_APP, 20) AS CLIENT_APP_NAME,
A.ACTIVITY_ID,
A.ACTIVITY_TYPE,
A.WORKLOAD_ID,
T.THRESHOLD_PREDICATE,
A.QUERY_CARD_ESTIMATE,
T.THRESHOLD_MAXVALUE,
T.TIME_OF_VIOLATION,
VARCHAR(AS.STMT_TEXT, 100) AS STMT_TEXT
FROM THRESHOLDVIOLATIONS_THREVIO AS T,
ACTIVITY_DB2ACTIVITIES AS A,
ACTIVITYSTMT_DB2ACTIVITIES AS AS
WHERE T.APPL_ID = A.APPL_ID AND
T.UOW_ID = A.UOW_ID AND
T.ACTIVITY_ID = A.ACTIVITY_ID AND
A.APPL_ID = AS.APPL_ID AND
A.ACTIVITY_ID = AS.ACTIVITY_ID AND
A.UOW_ID = AS.UOW_ID
The output will look something such as the following:
APPL_NAME CLIENT_APP_NAME ACTIVITY_ID ACTIVITY_TYPE
WORKLOAD_ID THRESHOLD_PREDICATE
QUERY_CARD_ESTIMATE THRESHOLD_MAXVALUE
TIME_OF_VIOLATION STMT_TEXT
--------------- -------------------- -------------------- ----------------------
------------------------------------------ ----------- -------------------------
--------------------------------------- -------------------- -------------------
- -------------------------- ---------------------------------------------------
-------------------------------------------------
db2bp CLP workth.db2 3 READ_DML
3 SQLRowsReturned
41 3
0 2007-08-31-09.01.16.000000 SELECT * FROM SALES
Note that the activity that violated the th_estcost (EstimatedSqlCost) threshold is not shown. The reason is that the threshold did not specify the COLLECT ACTIVITY DATA clause, so that no activity data was collected for that activity.
Step 7: Reset for next exercise
Disable the event monitors that were enabled. Also disable and drop the th_estcost and th_sqlrows thresholds that were created.
SET EVENT MONITOR threvio STATE 0
SET EVENT MONITOR db2activities STATE 0
ALTER THRESHOLD th_estcost DISABLE
DROP THRESHOLD th_estcost
ALTER THRESHOLD th_sqlrows DISABLE
DROP THRESHOLD th_sqlrows
Also clean up the activities event monitor tables and the threshold violation table
DELETE from ACTIVITY_DB2ACTIVITIES
DELETE from ACTIVITYSTMT_DB2ACTIVITIES
DELETE from THRESHOLDVIOLATIONS_THREVIO
CALL WLM_COLLECT_STATS()