Priority aging of ongoing work

Priority aging is an approach to workload management in which the priority of in-progress activities automatically changes over time.

The longer an activity runs, the lower its priority, and the fewer resources it will receive. You can use priority aging to control longer-running activities, so that throughput for shorter-running activities can be improved. The priority aging approach works when resource control responds to the movement of work between service classes. That is, when work that is already being processed changes service classes, the movement is actually reflected in the resources that are received by that work (as well as other work in the new service class). This approach is best implemented when using the explicit CPU controls that are provided by the CPU dispatcher or by integration with operating system workload management products.

Changing the priority of activities by remapping

System resources are allocated and controlled by using service classes. With priority aging, the priority of an activity can be changed by moving the activity from one service class to another service class. The priority increases if the new service class has more resources, and the priority decreases if the new service class has fewer resources. Activities are moved when a threshold with a REMAP ACTIVITY action is violated, based upon predetermined maximum usage of a specific resource such as processor time or rows read. After an activity is mapped to a new service class, it continues to run with the new resource constraints applied.

A simple approach that you can use to help short queries to run faster is to define a series of service classes with successively lower levels of resource priority and threshold actions that move activities between the service subclasses. Using this setup, you can decrease, or age, the priority of longer-running work over time and perhaps improve response times for shorter-running work without having detailed knowledge of the activities running on your data server.

Figure 1. A simple tiered setup that shows three service classes with successively lower priority

You can create this setup by assigning a high priority for all applicable resources to one service class, medium priority to a second service class, and low priority to a third service class. As work enters the system, it is automatically placed into the first service class and begins running using the high-priority settings of this service class. If you also define thresholds for each of the service classes that limit the time or resources used during execution, work is dynamically reassigned to the next-lower service class when it violates the threshold of the service class in which it is currently running. This dynamic resource control is repeatedly applied until the work is completed or is in the lowest-priority class, where it remains until it is completed or you force it to stop running.

In-service-class and in-all-service-class thresholds

Remapping of activities is available with any of the in-service-class thresholds and in-all-service-class thresholds.

In-service-class thresholds control the amount of a resource that can be used while an activity is running in a particular service subclass. Examples of resources are:

The amount of processor time used per activity per member; this resource is controlled by the CPUTIMEINSC threshold.
The number of rows read by an application per activity per member; this resource is controlled by the SQLROWSREADINSC threshold.

The in-service-class thresholds provide controls similar to Db2® Governor rules, which act on processor time and rows read monitor elements. These thresholds differ from other activity thresholds, which control resources used throughout the entire lifetime of an activity.

In-all-service-class thresholds control the amount of a resource that is used, across all service classes where an activity runs. An example of a resource is time, controlled by the ACTIVITYTOTALRUNTIMEINALLSC threshold. These thresholds behave similarly to other activity thresholds, which control resources used throughout the entire lifetime of an activity.

Because of the control that in-service-class thresholds and in-all-service-class thresholds provide over service subclasses, you can define in-service-class thresholds and in-all-service-class thresholds only on a service subclass domain.

When an in-service-class threshold or an in-all-service-class threshold is associated with a REMAP ACTIVITY action, agents working for the activity periodically check whether the threshold has been violated. If an agent detects a threshold violation, the agent triggers the REMAP ACTIVITY action for the activity and then remaps itself to the target service subclass. All other agents working for the activity on the same member remap themselves to the target service subclass when they detect that the activity has been remapped. Only one agent detects the threshold violation and remaps the activity; the activity is considered remapped as soon as that agent has performed the remapping.

Three monitor elements provide information about activity remapping within service subclasses. The act_remapped_in monitor element counts how many activities were remapped into a service subclass; it is incremented for the target service subclass each time an activity is remapped to it. Similarly, the act_remapped_out monitor element is incremented for the source service subclass each time an activity is remapped out of it. The third monitor element, num_remaps, counts the total number of times that an activity has been remapped between service subclasses.
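The bookkeeping behind these counters can be pictured with a small model. The following Python sketch is illustrative only: the RemapCounters class, its record_remap method, and the subclass and activity names are hypothetical and are not part of any Db2 interface. It assumes that each remap increments exactly one "out" counter, one "in" counter, and the per-activity total.

```python
from collections import Counter


class RemapCounters:
    """Toy bookkeeping that mirrors the monitor elements described above.

    Hypothetical helper for illustration; not a Db2 API.
    """

    def __init__(self):
        self.act_remapped_in = Counter()   # keyed by target service subclass
        self.act_remapped_out = Counter()  # keyed by source service subclass
        self.num_remaps = Counter()        # keyed by activity identifier

    def record_remap(self, activity_id, source, target):
        # One remap touches the source subclass, the target subclass,
        # and the activity's own running total.
        self.act_remapped_out[source] += 1
        self.act_remapped_in[target] += 1
        self.num_remaps[activity_id] += 1


# Two activities age through hypothetical subclasses HIGH, MEDIUM, and LOW.
counters = RemapCounters()
counters.record_remap("activity-1", "HIGH", "MEDIUM")
counters.record_remap("activity-1", "MEDIUM", "LOW")
counters.record_remap("activity-2", "HIGH", "MEDIUM")
```

In this model, MEDIUM records two activities remapped in and one remapped out, and activity-1 has been remapped twice in total.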

An activity can be remapped multiple times to different service subclasses, and an activity can return to its original service subclass after being remapped to another service subclass.

The CPUTIMEINSC and SQLROWSREADINSC in-service-class thresholds are evaluated separately for an activity on each member, without coordination. Because there is no coordination between members, when an activity is remapped on one member, it is possible for the same activity to be in different service subclasses on different members simultaneously.

The ACTIVITYTOTALRUNTIMEINALLSC threshold is evaluated on the coordinator member only. When an activity is remapped, it changes service classes on all members where it is executing.

When subagent work for an activity is completed on a remote member and further work for the same activity is sent to the same member later, the activity restarts in the same service subclass as the agent that sent the request to the member. If you defined an in-service-class threshold for this service subclass, the timer or counter for the activity on the remote member restarts at zero.

Where activities are nested, parent and child activities are tracked separately. Therefore, if a child activity is using an excessive amount of resources, only this activity, not its parent or sibling activities, violates a threshold.

Comparison of in-service-class thresholds and in-all-service-class thresholds

There are two differences between in-service-class thresholds (such as CPUTIMEINSC and SQLROWSREADINSC) and in-all-service-class thresholds (such as ACTIVITYTOTALRUNTIMEINALLSC).

The first difference pertains to scope.

In a partitioned database environment, when an activity is remapped by an in-service-class threshold, the remapping is separate on each member. In other words, in-service-class thresholds are evaluated separately for an activity on each member, without coordination across members. When an activity is remapped on one member, it is possible for the same activity to be in different service subclasses on other members.
In a partitioned database environment, when an activity is remapped by an in-all-service-class threshold, the remapping is global. In other words, when an in-all-service-class threshold causes an activity to remap to a different service class, the activity is remapped to the new service class on all members. The service class switch is not limited to the member where the threshold violation occurs.
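The difference in scope can be sketched as two small functions. This Python sketch is illustrative only: the function names and the dictionary that maps member numbers to service subclasses are hypothetical and are not part of any Db2 interface.

```python
def remap_in_service_class(member_subclasses, violating_member, target):
    """In-service-class scope: only the member that detected the
    violation moves the activity to the target subclass."""
    updated = dict(member_subclasses)
    updated[violating_member] = target
    return updated


def remap_in_all_service_class(member_subclasses, target):
    """In-all-service-class scope: the activity moves to the target
    subclass on every member where it is running."""
    return {member: target for member in member_subclasses}


# An activity running in subclass A on members 0, 1, and 2.
before = {0: "A", 1: "A", 2: "A"}
per_member = remap_in_service_class(before, violating_member=1, target="B")
global_remap = remap_in_all_service_class(before, target="B")
```

In the first case the activity ends up in subclass B on member 1 and remains in subclass A elsewhere; in the second case it is in subclass B on all three members.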

The second difference pertains to the measurement of the quantity (time or resource) enforced by the threshold.

When an in-service-class threshold is violated with a remap action, the measurement for the resource that is controlled by the threshold resets to 0. In other words, the resource is measured in the context of that service class. If the threshold is violated, the activity is moved to the next service class, with the measurement restarting at 0.
When an in-all-service-class threshold is violated with a remap action, the measurement for time or resource usage does not reset to 0. In other words, the resource is measured across all service classes.

Example using in-service-class thresholds

This example uses service classes A, B and C:

The CPUTIMEINSC threshold on service class A remaps the activity to service class B if CPU usage exceeds 10 seconds
The CPUTIMEINSC threshold on service class B remaps the activity to service class C if CPU usage exceeds 18 seconds

In this example, an activity starts in service class A, uses 10 seconds of CPU time, then is remapped to service class B. The resource usage counter is reset to 0. The activity will not move to service class C until it uses 18 seconds of CPU while in service class B.
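The arithmetic in this example can be checked with a small simulation. The following Python sketch is illustrative only: the IN_SC_LIMITS table and the subclass_after_cpu function are hypothetical and are not part of any Db2 interface. It models only the counter reset and assumes a single member with violations detected immediately, ignoring check intervals.

```python
# Hypothetical model of in-service-class remapping; not a Db2 API.
# Each entry: service class -> (CPU seconds allowed in that class, remap target).
IN_SC_LIMITS = {
    "A": (10, "B"),
    "B": (18, "C"),
}


def subclass_after_cpu(total_cpu_seconds, start="A"):
    """Return (service class, CPU counter within that class) after the
    activity has consumed total_cpu_seconds of CPU time overall."""
    current = start
    remaining = total_cpu_seconds
    while current in IN_SC_LIMITS:
        limit, target = IN_SC_LIMITS[current]
        if remaining <= limit:
            break  # threshold not exceeded; the activity stays here
        # Threshold violated: remap, and the counter restarts at zero
        # in the target class.
        remaining -= limit
        current = target
    return current, remaining
```

Under this model, an activity needs more than 28 seconds of CPU time overall (10 in service class A plus 18 in service class B) before it reaches service class C.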

Example using in-all-service-class thresholds

This example uses service classes X, Y and Z:

The ACTIVITYTOTALRUNTIMEINALLSC threshold on service class X remaps the activity to service class Y if the total run time exceeds 10 minutes
The ACTIVITYTOTALRUNTIMEINALLSC threshold on service class Y remaps the activity to service class Z if the total run time exceeds 18 minutes

In this example, an activity starts in service class X, runs for 10 minutes, then is remapped to service class Y. The run time measurement is not reset. The activity moves to service class Z after it runs for 8 more minutes in service class Y, for a total of 18 minutes.
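The same kind of simulation shows the effect of a measurement that is not reset. The following Python sketch is illustrative only: the IN_ALL_SC_LIMITS table and the subclass_after_runtime function are hypothetical and are not part of any Db2 interface. It assumes that violations are detected immediately.

```python
# Hypothetical model of in-all-service-class remapping; not a Db2 API.
# Each entry: service class -> (total minutes allowed, remap target).
IN_ALL_SC_LIMITS = {
    "X": (10, "Y"),
    "Y": (18, "Z"),
}


def subclass_after_runtime(total_minutes, start="X"):
    """Return the service class of an activity whose measurement across
    all service classes has reached total_minutes."""
    current = start
    while current in IN_ALL_SC_LIMITS:
        limit, target = IN_ALL_SC_LIMITS[current]
        if total_minutes <= limit:
            break  # threshold not exceeded; the activity stays here
        # The measurement is not reset: the same running total is
        # compared against the threshold of the next class.
        current = target
    return current
```

Under this model, the activity reaches service class Z once the total exceeds 18 minutes, which is only 8 minutes after it entered service class Y.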

Using the in-service-class thresholds

On data servers where the primary resource that activities compete for is processor time, use the CPUTIMEINSC threshold as your first measure of control. On data servers where queries that read many table rows result primarily in I/O contention, use SQLROWSREADINSC. On systems that see a combination of heavy processor and I/O activity, use a combination of the CPUTIMEINSC and SQLROWSREADINSC thresholds.

How much of a given resource you permit activities to consume in a service subclass before remapping them to a different service subclass depends largely on your particular environment. To find the best value for each threshold condition, you need to monitor how activities are being processed on your data server. If the maximum amount of processor time that can be used or the maximum number of rows that can be read in a service class is set too high, activities will inappropriately start and finish in the same service subclass regardless of how much resource each activity requires. If the maximum processor time or rows read is set too low, no activity will finish in the service class it is originally mapped to, and every activity will end up being remapped to another service class regardless of business priority. In either case, your tiered configuration will not benefit the overall throughput on your data server, and activities will not be treated according to their business priority.

In addition to determining how much of a given resource an activity can consume, some thresholds let you define a check interval that sets how often the data server checks for threshold violations. Check intervals are provided for thresholds where checking each time a unit of the controlled resource is consumed would be too expensive, and the interval determines the latency with which violations of these thresholds are detected. The CPUTIME and SQLROWSREAD thresholds and their in-service-class counterparts, CPUTIMEINSC and SQLROWSREADINSC, support check intervals. On serial database instances, the check interval equals the amount of real time that you want to elapse between checks for a threshold violation. In multimember database environments or on SMP instances, set the check interval to a value that is less than the amount of real time elapsed, because more than one agent can accumulate processor time simultaneously for the activity. To calculate the approximate check interval in multimember database environments or on SMP instances, divide the amount of real time that you want to elapse between checks by the degree of parallelism for the activity, and use the resulting value for the CHECKING EVERY clause.

For example, in a single-member database, if you want a CPUTIMEINSC threshold to trigger a REMAP ACTIVITY action after 30 seconds of processor time have been consumed, you can set the check interval to 30 seconds and be certain that the threshold action will be triggered after no more than 30 seconds of processor time have been consumed (processor time used cannot outstrip real time elapsed). In a multimember database environment, if you define a CPUTIMEINSC threshold that is set at 5 seconds with a check interval of 5 seconds, and an activity has 1 coordinator agent and 4 subagents working on its behalf, it is possible for the activity to consume 5 seconds of CPU time in just 1 second of real time, because 5 agents simultaneously accumulate 1 second of processor time each. To prevent the activity from consuming a multiple of 5 seconds of processor time before the violation is detected, set the check interval to 1 second in this case.
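The check interval calculation can be written as a short helper. The following Python sketch is illustrative only: both function names are hypothetical and are not part of any Db2 interface. The second function rests on a simplifying assumption, labeled in the code, that every agent is busy for the whole interval.

```python
def approximate_check_interval(real_seconds_between_checks, degree_of_parallelism):
    """Divide the desired real time between checks by the degree of
    parallelism, as described above, to get a CHECKING EVERY value."""
    if degree_of_parallelism < 1:
        raise ValueError("degree_of_parallelism must be at least 1")
    return real_seconds_between_checks / degree_of_parallelism


def max_cpu_between_checks(check_interval_seconds, agents):
    """Upper bound on processor time accumulated between two checks.

    Assumption for illustration: all agents accumulate processor time
    for the entire interval, and checks occur once per interval of
    real time.
    """
    return check_interval_seconds * agents


# Single-member case from the text: 30 seconds, one agent.
serial_interval = approximate_check_interval(30, 1)

# Multimember case from the text: 1 coordinator agent plus 4 subagents.
parallel_interval = approximate_check_interval(5, 5)
```

With these inputs, the serial interval is 30 seconds and the parallel interval is 1 second. Under the stated assumption, 5 agents checked every 5 seconds could accumulate up to 25 seconds of processor time between checks, whereas checking every 1 second bounds that at 5 seconds.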

For additional information on how to use the thresholds, see the sample tiering scripts and priority aging scenarios.

Effect of remapping on thresholds

Which thresholds continue to apply after remapping through a REMAP ACTIVITY action depends on whether the thresholds apply only to a specific service subclass or throughout the lifetime of an activity.

When you remap an activity to a new service subclass, only the in-service-class thresholds, such as CPUTIMEINSC and SQLROWSREADINSC, change. These in-service-class thresholds no longer affect an activity after it leaves the source service subclass, and they are replaced with the corresponding thresholds for the target subclass, if you defined those thresholds. All other activity thresholds from the service subclass to which the activity was originally mapped remain unchanged, and applicable threshold timers and counters are not reset. The activity is not re-evaluated against any other thresholds that you defined for the target service subclass.

For example, assume that two service subclasses with thresholds are defined as follows:

Service subclass A with the following thresholds:
- An ACTIVITYTOTALTIME lifetime threshold TH1 with a STOP EXECUTION action after 30 minutes have elapsed
- An SQLROWSREADINSC in-service-class threshold TH2 with a REMAP ACTIVITY action to service subclass B after more than 2000 rows have been read
Service subclass B with the following thresholds:
- An ACTIVITYTOTALTIME lifetime threshold TH3 with a STOP EXECUTION action after 5 minutes have elapsed
- An SQLROWSREADINSC threshold TH4 with a STOP EXECUTION action after more than 1000 rows have been read

When an activity enters the system in service subclass A, both thresholds TH1 and TH2 apply to the activity. If the activity reads more than 2000 rows during query evaluation, it is dynamically remapped to service subclass B. Because of the remapping of the activity to subclass B, the applicable in-service-class thresholds change, and TH4 rather than TH2 now applies to the activity. The rows-read count does not carry over: even though the activity has read more than 2000 rows in the original service subclass, the counter for TH4 starts at zero, and the activity must read more than 1000 rows while running in service subclass B before threshold TH4 is violated. Threshold TH1, which applies throughout the lifetime of the activity, continues to apply, and its timer is not reset, even though the activity is now running in a different subclass. Threshold TH3 does not exercise any control over the remapped activity at all, because it did not apply to the first service subclass that the activity entered when it began running.
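The rule in this example can be summarized as follows: lifetime thresholds come from the service subclass that the activity first entered, and in-service-class thresholds come from the subclass in which it is currently running. The following Python sketch is illustrative only: the THRESHOLDS table and the applicable_thresholds function are hypothetical and are not part of any Db2 interface.

```python
# Hypothetical description of the thresholds in the example; not a Db2 API.
THRESHOLDS = {
    "A": {"lifetime": ["TH1"], "in_service_class": ["TH2"]},
    "B": {"lifetime": ["TH3"], "in_service_class": ["TH4"]},
}


def applicable_thresholds(original_subclass, current_subclass):
    """Return the names of the thresholds that apply to an activity.

    Lifetime thresholds follow the subclass the activity first entered;
    in-service-class thresholds follow the subclass it is running in now.
    """
    lifetime = THRESHOLDS[original_subclass]["lifetime"]
    in_sc = THRESHOLDS[current_subclass]["in_service_class"]
    return sorted(lifetime + in_sc)


before_remap = applicable_thresholds("A", "A")
after_remap = applicable_thresholds("A", "B")
```

Before the remap, TH1 and TH2 apply; after the remap, TH1 and TH4 apply, and TH3 never appears because the activity did not start in service subclass B.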