Control of work with thresholds

You can use thresholds to maintain stability in the system by identifying work that behaves abnormally. You can identify abnormal behavior predictively, before the work begins running, based on the projected impact. You can also identify abnormal behavior reactively, as the work is running and consuming resources.

An example of work that can be controlled with thresholds is a query that consumes large amounts of processor time at the expense of all other work running on the system. Such a query can be controlled either before it even begins executing, based on estimated cost, or after it has begun executing and is consuming more than the permitted amount of resources.

Types of thresholds

Connection thresholds
If you want to limit how long a connection can sit idle, use a connection threshold. These thresholds can be used to detect connections that sit idle for too long.
Table 1. Connection thresholds
Threshold Description
CONNECTIONIDLETIME Controls the amount of time that a connection sits idle and is not working on behalf of user requests. Use this threshold to detect inefficient use of data server resources and application wait conditions.
Unit of work thresholds
If you want to limit how long a unit of work executes, use a unit of work threshold. These thresholds limit the maximum amount of time that a unit of work may spend in the Db2® engine, and can be used to detect units of work that take too long to complete.
Table 2. Unit of work thresholds
Threshold Description
UOWTOTALTIME Controls the amount of time that a unit of work takes to execute.
Activity thresholds
If you want to limit the impact that specific activities can have on how the data server is running, activity thresholds provide you with one of the means you can use. Excess execution time, abnormally high volumes of data returned, or abnormally high amounts of resources consumed are all examples of warning flags that potentially troublesome activities could be consuming excessive resources, which you can control with activity thresholds.
Table 3. Activity thresholds
Threshold Description
ACTIVITYTOTALTIME Controls the amount of time that any given activity can spend from submission to completion, for both execution and queue time. Use this threshold to detect jobs that are taking an abnormally long time to complete.
ACTIVITYTOTALRUNTIME Controls the amount of time that an activity can spend while executing. This time does not include queue time. Use this threshold to detect jobs that are taking an abnormally long time to execute.
ACTIVITYTOTALRUNTIMEINALLSC Controls the maximum amount of run time that an activity can spend running in a particular service subclass. Use this threshold to move an activity to the next service class, based on the activity’s execution time.
CPUTIME Controls the maximum amount of combined user and system processor time that an activity may consume on a particular member during the execution of the activity. Use this threshold to detect and control activities which are consuming excessive processor resources.
CPUTIMEINSC Controls the maximum amount of combined user and system processor time that an activity may consume on a particular member while executing in a specific service subclass. Use this threshold to detect and control activities which are consuming excessive processor resources within the current service class.
DATATAGINSC Controls the data that can be touched or not touched by an activity while running a particular service subclass.
ESTIMATEDSQLCOST Controls DML activities that the query optimizer determines to have a large estimated cost. Use this threshold to predict potentially resource-heavy SQL before it starts executing on the system and identifying poorly written SQL.
SORTSHRHEAPUTIL
Attention: This feature is available in Db2 Version 11.5 Mod Pack 2 and later versions.

Controls the amount of shared sort memory resources that may be used by an activity. When WLM admission control is enabled this includes memory that the activity is estimated to use since high memory estimates can result in significant queue time. When WLM admission control is disabled, this includes only memory that is actually used by an activity. Use this threshold to detect and control activities with large sort memory requirements and to prevent activities with large sort memory requirements from causing excessive queuing when WLM admission control is enabled.

SQLROWSREAD Controls the maximum number of rows which can be read on any member by an activity. Use this threshold to detect and control activities which are reading an excessive number of rows.
SQLROWSREADINSC Controls the maximum number of rows which can be read by an activity on a particular member while executing in a specific service subclass. Use this threshold to detect and control activities which are reading an excessive number of rows within the current service class.
SQLROWSRETURNED Controls the number of rows returned when executing SQL. Use this threshold to identify when the amount of data exceeds a reasonable volume.
SQLTEMPSPACE Controls the amount of temporary table space a given activity can consume on a member. Use this threshold to prevent certain SQL statements from using up a disproportionate amount of temporary space, impeding the progress of other work.

The data server considers requests from utilities such as REORGCHK, IMPORT, and EXPORT to be user logic, and thus these requests are subject to any defined thresholds.

Aggregate thresholds
If you want to control the collective impact that certain activities, workloads, or connections can have on your data server, you can define aggregate thresholds. Aggregate thresholds often, but not always, enforce concurrency control in cases where you need to limit the number of certain activities running at the same time. Some aggregate thresholds have a built-in queue and are known as queuing thresholds.
Table 4. Aggregate thresholds
Threshold Description
AGGSQLTEMPSPACE Controls the maximum amount of system temporary table space that can be consumed in total across all activities in the service subclass. Use this threshold to detect and control activities that belong to a service subclass whose activities are consuming too much system temporary table space in aggregate across all of the activities in the service subclass.
CONCURRENTWORKLOADOCCURRENCES Controls the number of active occurrences of a workload that can run on a coordinator member at the same time. Use to control the spread of connections from a specific source.
CONCURRENTWORKLOADACTIVITIES Controls the number of individual activities that can run within a workload occurrence. Use to limit work within an individual workload occurrence.
CONCURRENTDBCOORDACTIVITIES Controls the number of concurrent activities in the domain that the threshold is associated with (database, work action, service superclass, or service subclass).
TOTALMEMBERCONNECTIONS Controls the number of database connections to a given member that can be established at the same time. Use to prevent a given member from becoming overloaded.
TOTALSCMEMBERCONNECTIONS Controls the number of database connections to a given member for work executing within a given service class at the same time. Similar to the total member connections but more granular because the connection is linked to a service class.

For those aggregate thresholds that support it, concurrency control is provided through a system of execution 'tickets.' Each incoming activity must claim a ticket from the applicable concurrency threshold before it can begin executing. Once all tickets are consumed, additional activities are queued until a ticket becomes available or an error is returned, depending on how you defined the threshold. If the concurrency threshold has queuing enabled, then a ticket passes from an activity that has finished execution to another activity that is in the queue. This activity can then leave the queue and begin execution. How many tickets are available per concurrency threshold depends on how you defined the threshold. For example, if you defined a CONCURRENTDBCOORDACTIVITIES threshold to limit the number of concurrently running database activities to 10, then there are 10 execution tickets available.

For stored procedures, activity thresholds are applied to the stored procedure and to its child activities. Concurrency thresholds are only applied to the child activities of the stored procedure and not to the CALL statement itself.

Taking action when thresholds are violated

The action that is taken dynamically when a threshold is violated depends on how you define the threshold.
Stop execution (STOP EXECUTION)
A common action when a threshold is violated is to stop the activity from executing. In this case, an error code is returned to the submitting application indicating that the threshold was violated. Note that for TOTALMEMBERCONNECTIONS and TOTALSCMEMBERCONNECTIONS thresholds, a STOP EXECUTION action prevents a connection from being established. For CONNECTIONIDLETIME thresholds, the connection is closed. For CONCURRENTWORKLOADOCCURRENCES, a new workload occurrence is prevented from being created. For all activity-related thresholds, the activity is stopped from continuing to execute. If a THRESHOLDVIOLATIONS event monitor is active, a record is written to the event monitor indicating that the threshold was violated.
Continue execution (CONTINUE)
In some situations, stopping the execution of an activity is too harsh a response. A preferable response is to permit the activity to continue to run and to collect the relevant data for an administrator to perform future analysis to determine how to prevent this condition from happening again. In this situation, no error code is returned to the submitting application. If the action is to continue, the user receives no indication that the threshold was violated. If a THRESHOLDVIOLATIONS event monitor is active, a record is written to the event monitor. If a CONTINUE threshold action is specified for a queuing threshold, this effectively renders the size of the queue unbounded, regardless of any hard value you include.
Force the application (FORCE APPLICATION)
When the UOWTOTALTIME threshold is violated, you can specify that the local or remote users or applications are forced off the system. This benefits other applications competing for server resources.
Remap the activity (REMAP ACTIVITY TO)
When an activity violates a certain limit, you may simply wish to assign different resource controls to it but to let the activity continue executing otherwise. Such a response permits you to dynamically raise or lower the amount of resources an activity can consume throughout its lifetime. In this case, an already executing activity is permitted to continue with no indication to the user or application that the threshold was violated, although the activity now runs with different resources available to it. Remapping is available with any of the in-service-class thresholds like CPUTIMEINSC, SQLROWSREADINSC, and DATATAGINSC.
Collect data (COLLECT ACTIVITY DATA)
When some thresholds are violated, data is collected. By default the fact that an activity threshold was violated is recorded in an activated threshold violations event monitor. If you want more detailed information about the activity that violated the threshold, you can request that information for the activity be written to the active event monitor for activities when the activity completes execution using the COLLECT ACTIVITY DATA clause