How to generate email notifications for threshold violations

Using the method described here, you can generate email notifications when a Db2® workload manager (WLM) threshold is violated.

Before you begin

In order to implement this email notification approach, you must have Db2 Version 9.7 or higher installed. The SMTP support used here was provided since Db2 V9.7.

About this task

Upon completion of this task, email notifications are sent if WLM threshold violations occur during the 10 minutes since the threshold notification procedure was last run. The Db2 Administrative Task Scheduler is used to schedule the threshold notification procedure to run every 10 minutes in this example.

Procedure

  1. Update the smtp_server database configuration parameter by issuing the following command:
    UPDATE DB CONFIG USING SMTP_SERVER smtp_server_name
  2. Create a write-to-table event monitor for threshold violations and write violations to the TEST.THRESHOLDVIOLATIONS_T table by issuing the following statement:
    CREATE EVENT MONITOR T FOR THRESHOLD VIOLATIONS WRITE TO TABLE 
    THRESHOLDVIOLATIONS( TABLE TEST.THRESHOLDVIOLATIONS_T )
  3. Activate the write-to-table event monitor T for threshold violations by issuing the following statement:
    SET EVENT MONITOR T STATE 1
  4. Create a control table to track the last threshold for which an alert was generated by issuing the following statement:
    CREATE TABLE TEST.THRESHOLD_NOTIFY_CONTROL( LAST_NOTIFICATION TIMESTAMP )
  5. Create a stored threshold notification procedure to generate threshold violation messages. The following example procedure iterates over the threshold violations table and builds a report listing all threshold violations that have occurred since the last time the procedure was invoked. The report is emailed using the Db2 SMTP procedures.
    CREATE PROCEDURE TEST.NOTIFY_ON_THRESHOLD_VIOLATION()
      LANGUAGE SQL
      BEGIN
         DECLARE NEWEST_VIOLATION    TIMESTAMP;
         DECLARE LAST_VIOLATION_SEEN TIMESTAMP; 
         DECLARE NOT_FOUND           INTEGER DEFAULT 0; 
         DECLARE SENDER              VARCHAR(128);
         DECLARE RECIPIENTS          VARCHAR(128); 
         DECLARE MESSAGE             VARCHAR(8192); 
         DECLARE SUBJECT             VARCHAR(128); 
         DECLARE THRESHOLDID         BIGINT;
         DECLARE APPL_ID             VARCHAR(64); 
         DECLARE THRESHOLD_PREDICATE VARCHAR(64); 
         DECLARE TIME_OF_VIOLATION   TIMESTAMP; 
    
         DECLARE C1 CURSOR FOR SELECT MAX(TIME_OF_VIOLATION) FROM TEST.THRESHOLDVIOLATIONS_T; 
         DECLARE C2 CURSOR FOR SELECT LAST_NOTIFICATION FROM TEST.THRESHOLD_NOTIFY_CONTROL;
         DECLARE C3 CURSOR FOR SELECT THRESHOLD_PREDICATE, THRESHOLDID, TIME_OF_VIOLATION, APPL_ID 
                           FROM TEST.THRESHOLDVIOLATIONS_T
                           WHERE LAST_VIOLATION_SEEN IS NULL OR TIME_OF_VIOLATION > LAST_VIOLATION_SEEN;
      
         DECLARE CONTINUE HANDLER FOR NOT FOUND
            SET NOT_FOUND = 1;
    
         OPEN C1; 
         FETCH C1 INTO NEWEST_VIOLATION;
         CLOSE C1; 
    
         IF ( NOT_FOUND = 0 ) THEN
    
            OPEN C2; 
            FETCH C2 INTO LAST_VIOLATION_SEEN;
            CLOSE C2; 
    
            IF ( NOT_FOUND = 1 ) THEN
               SET LAST_VIOLATION_SEEN = NULL;
            END IF;
    
            IF ( NOT_FOUND = 1 OR NEWEST_VIOLATION > LAST_VIOLATION_SEEN ) THEN
    
               DELETE FROM TEST.THRESHOLD_NOTIFY_CONTROL; 
               INSERT INTO TEST.THRESHOLD_NOTIFY_CONTROL VALUES ( NEWEST_VIOLATION ); 
    
               SET SENDER = '<sender email address>'; 
               SET RECIPIENTS = '<receiver email address>'; 
               SET SUBJECT = 'New WLM Threshold Violations' ; 
    
               SET NOT_FOUND = 0; 
               SET MESSAGE = ''; 
    
               OPEN C3; 
    
               FETCH C3 INTO THRESHOLD_PREDICATE, THRESHOLDID, TIME_OF_VIOLATION, APPL_ID; 
    
               WHILE ( NOT_FOUND = 0 ) DO 
    
                  SET MESSAGE = MESSAGE || 'Violation Timestamp = ' || TIME_OF_VIOLATION || CHAR(X'0A'); 
                  SET MESSAGE = MESSAGE || 'Threshold Predicate = ' || THRESHOLD_PREDICATE || CHAR(X'0A'); 
                  SET MESSAGE = MESSAGE || 'Threshold Id        = ' || THRESHOLDID || CHAR(X'0A'); 
                  SET MESSAGE = MESSAGE || 'Appl Id             = ' || APPL_ID || CHAR(X'0A') || CHAR(X'0A'); 
    
                  FETCH C3 INTO THRESHOLD_PREDICATE, THRESHOLDID, TIME_OF_VIOLATION, APPL_ID; 
               END WHILE; 
    
               CLOSE C3; 
    
               CALL UTL_MAIL.SEND( SENDER, RECIPIENTS, NULL, NULL, SUBJECT, MESSAGE ); 
    
               COMMIT; 
    
            END IF; 
    
        END IF; 
    
    END@
  6. Enable the Db2 Administrative Task Scheduler by running the following command:
    db2set DB2_ATS_ENABLE=YES
  7. Schedule the threshold notification procedure to execute every 10 minutes. To schedule the procedure, you must have execute privileges on the procedure. The following is an example of how this can be done:
    CALL SYSPROC.ADMIN_TASK_ADD(
       'CHECK THRESHOLD VIOLATIONS EVERY 10 MINUTES',
       NULL,
       NULL,
       NULL, 
       '0-59/10 * * * *',
       'TEST',
       'NOTIFY_ON_THRESHOLD_VIOLATION',
       NULL,
       NULL,
       NULL )@

Results

An email is sent whenever a WLM threshold violation occurs (with a latency of at most 10 minutes). The email describes all the WLM threshold violations since the last run of the threshold violation procedure scheduled to run every 10 minutes.

Example

The following output is an example of the content of an email notification message showing the accumulated new threshold violations since the last run of the threshold violation procedure:
Subject: New WLM Threshold Violations

Violation Timestamp = 2010-01-11-10.57.21.000000
Threshold Predicate = CPUTime
Threshold Id        = 1
Appl Id             = *LOCAL.horton.100111154912

Violation Timestamp = 2010-01-11-10.57.28.000000
Threshold Predicate = CPUTime
Threshold Id        = 1
Appl Id             = *LOCAL.horton.100111154912

Violation Timestamp = 2010-01-11-10.57.35.000000
Threshold Predicate = CPUTime
Threshold Id        = 1
Appl Id             = *LOCAL.horton.100111154912