Resource reclaim

Resource reclaim, which is part of IBM® Spectrum Symphony's borrowing, lending, and sharing capabilities, ensures that consumers can take back their deserved shared or lent resources as needed to meet workload demand. It applies to all platforms that IBM Spectrum Symphony supports. It does not apply to IBM Spectrum Symphony Developer Edition, which does not support resource lending, borrowing, or reclaiming.

Purpose of reclaiming resources

Reclaiming resources provides a way for the system to reallocate borrowed or shared resources to a consumer when the consumer has workload demand under any of the following conditions:

A lending consumer has workload demand that requires slots that are owned by the lending consumer.
Share ratios are configured, and an under-allocated consumer (a consumer that is not currently using its deserved number of shared slots) has workload demand that requires the use of more slots.
A time-based resource plan has time intervals that change the number of owned resources, share ratios and limits, borrowing and lending policies, and borrowing and lending limits for one or more consumers.

The system does not always return the same resource that the consumer originally lent. If workload is running on a borrowed resource, the system might reclaim a different physical resource (that meets the resource requirements) from the borrower and allocate that resource to the lending consumer in place of the original resource.
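The following minimal Python sketch shows that the returned resource does not have to be the original one. It assumes that a reclaimed slot only has to match an operating system and a minimum amount of free memory. The `Slot` fields, the host names, and the first-match selection order are hypothetical simplifications. They do not describe how the system actually ranks candidates.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Slot:
    host: str          # hypothetical host name
    os: str            # operating system of the host
    free_mem_mb: int   # available memory on the host, in MB

def pick_substitute(borrower_slots: list[Slot], os_required: str, min_mem_mb: int) -> Optional[Slot]:
    """Return the first slot held by the borrower that meets the lender's requirements.

    The slot does not have to be the physical resource that was originally lent.
    """
    for slot in borrower_slots:
        if slot.os == os_required and slot.free_mem_mb >= min_mem_mb:
            return slot
    return None  # no slot held by the borrower meets the requirement

borrowed = [
    Slot("hostC", "linux", 8192),    # wrong OS, never a candidate
    Slot("hostB", "windows", 4096),  # meets the requirement
    Slot("hostA", "windows", 4096),  # the slot that was originally lent
]
print(pick_substitute(borrowed, "windows", 4000).host)  # hostB
```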

The following sections illustrate how resource reclaim works when borrowing, lending, or sharing is enabled.

Important: Resource reclaim is enabled by default whenever borrowing and lending are enabled. You cannot disable resource reclaim for borrowed or lent resources.

Behavior with reclaim for borrowing and lending

You can choose to enable borrowing and lending for owned resources. When you enable borrowing and lending, resource reclaim is always enabled.

Flow of how resource reclaim works for borrowing and lending

Behavior without resource reclaim for sharing

In this example, the share ratio is 3:1. Consumer A deserves three times as many slots as Consumer B.

Flow of behavior without resource reclaim for sharing
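As a worked example of the 3:1 ratio, the following Python sketch splits a hypothetical pool of shared slots in proportion to each consumer's share ratio. The pool size and the function name are assumptions for illustration. The sketch keeps fractional results instead of guessing how the system rounds partial slots.

```python
from fractions import Fraction

def deserved_slots(total_slots: int, share_ratios: dict[str, int]) -> dict[str, Fraction]:
    """Split a pool of shared slots in proportion to each consumer's share ratio."""
    ratio_sum = sum(share_ratios.values())
    return {name: Fraction(total_slots * ratio, ratio_sum) for name, ratio in share_ratios.items()}

# Share ratio 3:1 over a hypothetical pool of 12 shared slots:
# Consumer A deserves 9 slots and Consumer B deserves 3.
for name, slots in deserved_slots(12, {"ConsumerA": 3, "ConsumerB": 1}).items():
    print(name, slots)
```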

Behavior with resource reclaim for sharing

In this example, the share ratio is 3:1. Consumer A deserves three times as many slots as Consumer B.

Flow of behavior with resource reclaim for sharing
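The difference between the two behaviors can be sketched as simple arithmetic. This hypothetical Python model uses a 3:1 ratio over 8 shared slots. It also assumes that idle slots are handed out before anything is reclaimed. It illustrates the idea only and is not the scheduler's algorithm.

```python
from dataclasses import dataclass

@dataclass
class Consumer:
    name: str
    deserved: int  # deserved shared slots under the share ratio
    in_use: int    # shared slots currently running this consumer's workload
    demand: int    # additional slots that this consumer's pending workload needs

def slots_reclaimed(under: Consumer, over: Consumer, idle_slots: int, reclaim_enabled: bool) -> int:
    """Slots taken back from `over` right away so that `under` can reach its deserved share."""
    # Assumption: idle slots are allocated first; reclaim covers only the remainder.
    wanted = min(under.demand, under.deserved - under.in_use) - idle_slots
    if not reclaim_enabled or wanted <= 0:
        return 0  # without reclaim, `under` waits for `over`'s tasks to finish
    surplus = over.in_use - over.deserved  # slots `over` holds beyond its deserved share
    return max(0, min(wanted, surplus))

# Share ratio 3:1 over 8 shared slots: A deserves 6, B deserves 2.
# B took all 8 slots while A was idle; now A has workload for 6 slots.
a = Consumer("ConsumerA", deserved=6, in_use=0, demand=6)
b = Consumer("ConsumerB", deserved=2, in_use=8, demand=0)
print(slots_reclaimed(a, b, idle_slots=0, reclaim_enabled=True))   # 6
print(slots_reclaimed(a, b, idle_slots=0, reclaim_enabled=False))  # 0
```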

Service instance workflow and interruption handling

Resource reclaim is enabled whenever you enable lending or borrowing for leaf consumers that own resources. By default, the system performs the following actions:

Immediately sends an interrupt event to the service to notify it of the pending reclaim.
Allows the service the number of seconds specified in the reclaim grace period to complete processing before it terminates the service instance. Tasks that were running on the service instance before it is terminated are requeued to their respective sessions. The default grace period is 0 seconds.
After the reclaim grace period expires, the resource orchestrator allows a 120-second leeway for the return of any reclaimed resources. The leeway accounts for network overhead and other considerations.
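The following minimal sketch shows the resulting timeline, measured in seconds from the moment the interrupt event is sent. It assumes that the 120-second leeway starts exactly when the grace period expires. The function and key names are hypothetical.

```python
RECLAIM_LEEWAY_SECONDS = 120  # leeway that the resource orchestrator allows

def reclaim_timeline(grace_period_s: int) -> dict[str, int]:
    """Seconds after the interrupt event at which each reclaim step happens."""
    return {
        "interrupt_sent": 0,                                      # sent immediately
        "instance_terminated": grace_period_s,                    # if still running when the grace period expires
        "resource_due": grace_period_s + RECLAIM_LEEWAY_SECONDS,  # latest expected return of the resource
    }

print(reclaim_timeline(0))    # default grace period of 0 seconds
print(reclaim_timeline(300))  # hypothetical 5-minute grace period
```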

The onServiceInterrupt service handler method provides the most effective way to manage an interruption that is caused by resource reclaim. Use of this method ensures that the service instance receives immediate notification of a pending interruption.

During a reclaim, the service interrupt indicates how much time the service instance has to complete the currently running service method and to clean up. If the service method and cleanup do not complete within that time, the instance is terminated. If the service method completes before the timeout expires, cleanup starts as soon as the method completes.

If a task is running and the Invoke method completes during the applied reclaim grace period, the result of that method is treated as it would be treated under normal circumstances.

If a task is running and the Invoke method does not complete before the applied reclaim grace period expires, the service instance on which the task is running is terminated and the task is requeued.
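The two cases above reduce to a single comparison. In this hypothetical sketch, `invoke_duration_s` is the time that the Invoke method still needs when the interrupt arrives. Counting completion exactly at the end of the grace period as "within" the grace period is an assumption.

```python
def task_outcome(invoke_duration_s: float, grace_period_s: float) -> str:
    """Outcome for a task whose Invoke method still needs `invoke_duration_s`
    seconds when the interrupt event arrives."""
    if invoke_duration_s <= grace_period_s:
        return "result handled normally"  # Invoke finished within the grace period
    return "instance terminated, task requeued"  # grace period expired first

print(task_outcome(20, 60))  # result handled normally
print(task_outcome(90, 60))  # instance terminated, task requeued
print(task_outcome(5, 0))    # instance terminated, task requeued (default grace period is 0)
```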

A less effective alternative is for the service instance to call the getLastInterruptEvent method periodically to check for interrupt events. Because the service instance polls, it does not detect the interrupt immediately. While it waits for its next poll, the reclaim grace period keeps running, so the service instance has less time to return a result or shut down gracefully.
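To see why polling is less effective, consider how much of the grace period remains when a polling instance first notices the interrupt. This Python sketch assumes a fixed polling interval and uses hypothetical parameter names. With onServiceInterrupt, the detection delay is treated as zero.

```python
def grace_left_when_detected(grace_period_s: float, poll_interval_s: float,
                             since_last_poll_s: float) -> float:
    """Grace time remaining when a polling service instance first sees the interrupt.

    since_last_poll_s: how long after the previous poll the interrupt arrived
    (0 <= since_last_poll_s < poll_interval_s).
    """
    detection_delay = poll_interval_s - since_last_poll_s  # wait until the next poll
    return max(0.0, float(grace_period_s - detection_delay))

# With onServiceInterrupt the delay is effectively zero, so all 60 s would remain.
print(grace_left_when_detected(60, 30, 5))  # 35.0
print(grace_left_when_detected(60, 30, 0))  # 30.0, worst case: interrupt right after a poll
print(grace_left_when_detected(20, 30, 0))  # 0.0, the grace period expires before the next poll
```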

Resource reclaim behavior (consumer level)

Consumers reclaim resources in the following order, regardless of a consumer's history of resource usage:

Type of resource the system reclaims	What determines the reclaim order	Example
Borrowed resources	Resource requirements, which are determined by the resource group that is associated with the consumer.	If the lending consumer needs a Windows slot with a certain amount of available memory, the system looks first for an analogous resource to reclaim.
Shared resources	Relative consumer rank, which is configured in the Resource Plan. Consumer rank is an optional setting. A rank of 0 is the highest rank, and larger numbers indicate lower ranks. The system reclaims resources from the lowest-ranking consumer first. By default, the system enforces share ratios at the level of the leaf (child) consumers. If your system is configured to enforce share ratios at the parent level, the system reclaims resources from the parent consumer.	The system first reclaims resources from a consumer with rank 50, and then reclaims resources from a consumer with rank 25. Consumer A is a child consumer of Parent A. Parent A and Parent B are siblings. With share ratios enforced at the parent level, Parent A shares 10 slots with Parent B. Parent B is running workload on 5 slots that are obtained from Parent A's share. If Consumer A has unsatisfied demand for 2 slots and all of Parent A's slots are allocated, the system reclaims 2 slots from Parent B to allocate to Parent A.
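The rank rule for shared resources amounts to a sort in descending rank number. This sketch covers only consumers that have a rank configured, because rank is optional and the source does not say where unranked consumers fall. The function name is hypothetical.

```python
def shared_reclaim_order(consumer_ranks: dict[str, int]) -> list[str]:
    """Order ranked consumers for shared-resource reclaim, lowest-ranking first.

    Rank 0 is the highest rank, so a larger number means a lower rank.
    """
    return sorted(consumer_ranks, key=consumer_ranks.get, reverse=True)

print(shared_reclaim_order({"ConsumerX": 25, "ConsumerY": 50, "ConsumerZ": 0}))
# ['ConsumerY', 'ConsumerX', 'ConsumerZ']
```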

Resource reclaim behavior (resource level)

Session importance (preemption rank or session priority) and preemption criteria greatly influence resource reclaim. However, the selective reclaim configuration is the most important parameter, because it determines whether the other parameters can influence host selection.

Note: Selective reclaim can be enabled only if reclaim optimization is set to Optimized for application specified conditions (the default) on the System & Services > Cluster > Properties page of the cluster management console. If selective reclaim is disabled, the system still selects the best slot on a host. However, resource selection might appear random because the system makes no effort to select the best host among multiple candidates.

The system chooses the resource by using the following logic.

Consider the selective reclaim configuration:
1. If selective reclaim is disabled (default), the system reclaims resources as quickly as possible, with minimum overhead.
  For example, if multiple hosts in the consumer meet the resource requirement, the system selects any one of them at random.
2. If selective reclaim is enabled, the system reclaims resources from the less important sessions first. This option has greater overhead.
  For example, if multiple hosts in the consumer meet the resource requirement, the system considers all of the candidate hosts.
Consider session importance: for proportional or minimum services scheduling, consider preemption rank. For priority scheduling, consider session priority instead of preemption rank.
1. With proportional or minimum services scheduling:
  From the selected host or hosts, select the least important session, according to preemption rank.
  
  If multiple sessions share the same lowest rank, keep all of them as candidate sessions.
  
  If the resource requirement is for an exclusive host, treat all sessions on a host as if they had the same rank as the most important session that uses the host.
2. With priority scheduling:
  From the selected host or hosts, select the least important session, according to session priority.
  
  If multiple sessions share the same lowest priority, select the most recently started session.
  
  If the resource requirement is for an exclusive host, treat all sessions on a host as if they had the same priority as the most important session that uses the host.
Consider the preemption criteria:
1. If the preemption criteria is set to MostRecentTask, the system reclaims resources from the most recently submitted tasks first.
  For example, from one or more candidate sessions, the system selects the most recently started task and reclaims the resource that it is using.
  
  If multiple tasks have the same run time, the system selects any one of them at random.
  
  If multiple tasks run on a slot, the system considers the cumulative run time of all tasks that use the slot.
2. If the preemption criteria is set to PolicyDefault (default), the behavior depends on the scheduling policy:
  - With proportional or minimum services scheduling:
    The system reclaims resources from the most over-allocated sessions first. This option has minimum overhead.
    
    For example, from multiple sessions, the system selects the most over-allocated session and reclaims a resource that it is using (task selection is random).
    
    If multiple sessions are equally over-allocated, the system selects any one of them at random.
    
    If no session is over-allocated, the system selects the least under-allocated session instead.
  - With priority scheduling, the system by default selects a task from the session with the lowest priority, and then tasks from the most recently started session. This option has minimum overhead.
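The three steps can be modeled in Python as a simplified sketch. This is not the system's implementation, and it rests on several assumptions:
- The `Host`, `Session`, and `Task` classes are hypothetical.
- Larger `preemption_rank` and `priority` values mean more important sessions.
- The exclusive-host rule and the cumulative-run-time rule for shared slots are omitted.
- At least one candidate host is running tasks.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Task:
    started_at: float  # larger value = started more recently

@dataclass
class Session:
    name: str
    started_at: float
    preemption_rank: int = 0   # larger value = more important
    priority: int = 1          # larger value = more important
    over_allocation: int = 0   # allocated minus deserved; negative = under-allocated
    tasks: list[Task] = field(default_factory=list)

@dataclass
class Host:
    name: str
    sessions: list[Session]

def pick_victim(hosts, selective_reclaim, priority_scheduling, most_recent_task, rng=random):
    """Return the (session, task) whose slot is reclaimed, following the three steps."""
    # Step 1: selective reclaim decides whether one random host or all candidate hosts are examined.
    candidates = hosts if selective_reclaim else [rng.choice(hosts)]
    sessions = [s for h in candidates for s in h.sessions if s.tasks]

    # Step 2: keep the least important sessions.
    if priority_scheduling:
        lowest = min(s.priority for s in sessions)
        tied = [s for s in sessions if s.priority == lowest]
        sessions = [max(tied, key=lambda s: s.started_at)]  # most recently started session
    else:
        lowest = min(s.preemption_rank for s in sessions)
        sessions = [s for s in sessions if s.preemption_rank == lowest]  # keep all ties

    # Step 3: preemption criteria.
    if most_recent_task:
        pairs = [(s, t) for s in sessions for t in s.tasks]
        newest = max(t.started_at for _, t in pairs)
        return rng.choice([(s, t) for s, t in pairs if t.started_at == newest])
    if not priority_scheduling:
        # PolicyDefault: most over-allocated (or least under-allocated) session first.
        top = max(s.over_allocation for s in sessions)
        sessions = [s for s in sessions if s.over_allocation == top]
    session = rng.choice(sessions)
    return session, rng.choice(session.tasks)  # task selection is random

h1 = Host("host1", [Session("s1", 1, preemption_rank=10, tasks=[Task(5)]),
                    Session("s2", 2, preemption_rank=20, tasks=[Task(9)])])
h2 = Host("host2", [Session("s3", 3, preemption_rank=10, tasks=[Task(7), Task(8)])])
session, task = pick_victim([h1, h2], selective_reclaim=True,
                            priority_scheduling=False, most_recent_task=True)
print(session.name, task.started_at)  # s3 8
```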

Selective reclaim

Selective reclaim can be enabled for slot-based scheduling and multidimensional scheduling. An application might be a candidate for selective reclaim when it needs to borrow slots from other consumers and has critical or long running tasks that you do not want to be interrupted.

Important: Selective reclaim does not take effect if Reclaim optimization is configured as Optimized for standby service in the cluster management console.

Note the following considerations when you are using selective reclaim:

Consider the priority or preemptionRank attribute (defined in the SessionTypes section of your application profile). How these attributes are used depends on the scheduling policy (defined by the policy attribute in the Consumer section of the application profile).
Here is an excerpt from an example application profile with priorities and preemption ranks configured:
```xml
<SessionTypes>
  ...
  <Type name="type1" … priority="1" preemptionRank="10"/>
  <Type name="type2" … priority="2" preemptionRank="20"/>
</SessionTypes>
```
- For the priority scheduling policy (policy="R_PriorityScheduling"), use a higher priority value to protect longer-running tasks. A value of 1 is the lowest priority and 10000 is the highest priority.
- For all scheduling policies other than policy="R_PriorityScheduling", use preemptionRank values to rank the importance of sessions:
  - For workload that you do not want preempted by a slot reclaim, set the preemptionRank value to the highest level (for example, preemptionRank="20").
  - For normal workload that can be preempted by a slot reclaim without consequence, set the preemptionRank value to the lowest level (for example, preemptionRank="10").
If there are long-running tasks, set the preemption criteria to MostRecentTask so that long-running tasks do not lose the CPU time that they have already accumulated.
If all the tasks are short-running, set the preemption criteria to the default (PolicyDefault) for better SSM performance.
Selective reclaim is disabled when exclusive allocation (consumer level) is enabled for multidimensional scheduling.
If multidimensional scheduling standby services are enabled, the standby service requirement is satisfied before selective reclaim.
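To check how a profile orders its session types, this Python sketch reads the `SessionTypes` excerpt with the standard `xml.etree.ElementTree` module. It then reports which type a reclaim would interrupt first. The sketch assumes a simplified profile with the `…` placeholders removed. It compares only these two attributes and ignores every other factor described in this topic.

```python
import xml.etree.ElementTree as ET

# The profile excerpt above with its "…" placeholders removed so that it parses.
PROFILE = """
<SessionTypes>
  <Type name="type1" priority="1" preemptionRank="10"/>
  <Type name="type2" priority="2" preemptionRank="20"/>
</SessionTypes>
"""

def first_preempted_type(profile_xml: str, policy: str) -> str:
    """Name of the session type that a reclaim interrupts first under `policy`."""
    types = ET.fromstring(profile_xml.strip()).findall("Type")
    # Priority scheduling compares priority; every other policy compares preemptionRank.
    attr = "priority" if policy == "R_PriorityScheduling" else "preemptionRank"
    return min(types, key=lambda t: int(t.get(attr))).get("name")

print(first_preempted_type(PROFILE, "R_PriorityScheduling"))  # type1
```

In this profile both attributes rank type2 above type1, so type1 is interrupted first under any policy.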

For more information on enabling selective reclaim, see Enabling selective reclaim.

Consumer demands

Consumers with workload demand can have lent resources reclaimed for them. When a resource is reclaimed, the system interrupts the borrower's tasks that are running on the reclaimed resource. The reclaim grace period allows time for a task that is running on a borrowed slot to complete before the resource returns to its owner. To avoid being requeued, tasks must exit within the reclaim grace period.

Tip: By default, when a reclaim happens between leaf consumers that share an ancestor (such as the same parent, grandparent, or great-grandparent consumer), EGO uses the grace period of the leaf consumer whose resources are reclaimed. To make EGO use the grace period of the two leaf consumers' closest common ancestor instead, set EGO_USE_ANCESTOR_GRACE_PERIOD_FOR_RECLAIM=Y in the ego.conf file on the primary and primary candidate hosts. If the ancestor consumer's reclaim grace period is long enough, any reclaim between leaf consumers under that ancestor uses that long grace period. The running tasks of the reclaimed leaf consumer can then run to completion without interruption. Refer to the ego.conf reference for details on the EGO_USE_ANCESTOR_GRACE_PERIOD_FOR_RECLAIM setting.
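The effect of this setting can be sketched as a closest-common-ancestor lookup. This Python model assumes that consumers are identified by slash-separated paths (for example, `/Dept/TeamA`). The consumer names, grace period values, and function names are hypothetical.

```python
def closest_common_ancestor(leaf_a: str, leaf_b: str) -> str:
    """Closest common ancestor of two slash-separated consumer paths ("/" if none)."""
    common = []
    for x, y in zip(leaf_a.strip("/").split("/"), leaf_b.strip("/").split("/")):
        if x != y:
            break
        common.append(x)
    return "/" + "/".join(common)

def reclaim_grace_period(reclaimed_leaf: str, receiving_leaf: str,
                         grace_periods: dict[str, int], use_ancestor: bool) -> int:
    """Grace period applied when `receiving_leaf` reclaims from `reclaimed_leaf`."""
    ancestor = closest_common_ancestor(reclaimed_leaf, receiving_leaf)
    if use_ancestor and ancestor != "/":
        return grace_periods[ancestor]  # EGO_USE_ANCESTOR_GRACE_PERIOD_FOR_RECLAIM=Y
    return grace_periods[reclaimed_leaf]  # default: the reclaimed leaf's own grace period

grace = {"/Dept": 3600, "/Dept/TeamA": 0, "/Dept/TeamB": 0}  # hypothetical values, in seconds
print(reclaim_grace_period("/Dept/TeamA", "/Dept/TeamB", grace, use_ancestor=False))  # 0
print(reclaim_grace_period("/Dept/TeamA", "/Dept/TeamB", grace, use_ancestor=True))   # 3600
```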

By default, the system reclaims owned resources only after it attempts to satisfy demand by borrowing resources from other lending consumers or from the share pool. You can change this behavior so that the system reclaims owned resources before it allocates borrowed or shared resources.
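The two orderings can be sketched as follows. The source does not say whether lending consumers or the share pool is tried first, so that relative order is an assumption. The function and parameter names are also assumptions. The cluster property Reclaim lent resources before borrowing, listed later in this topic, appears to control this behavior.

```python
def plan_allocation(demand: int, borrowable: int, shared: int, lent_out: int,
                    reclaim_lent_first: bool = False) -> list[tuple[str, int]]:
    """Sources used to satisfy `demand`, in order, with the slots taken from each."""
    outside = [("borrow from lending consumers", borrowable), ("share pool", shared)]
    reclaim = ("reclaim own lent slots", lent_out)
    order = [reclaim] + outside if reclaim_lent_first else outside + [reclaim]
    plan = []
    for source, available in order:
        take = min(demand, available)
        if take > 0:
            plan.append((source, take))
            demand -= take
    return plan

print(plan_allocation(6, borrowable=2, shared=1, lent_out=5))
# [('borrow from lending consumers', 2), ('share pool', 1), ('reclaim own lent slots', 3)]
print(plan_allocation(6, borrowable=2, shared=1, lent_out=5, reclaim_lent_first=True))
# [('reclaim own lent slots', 5), ('borrow from lending consumers', 1)]
```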

Time interval transitions

With a time-based resource plan that specifies different values for ownership, lend and borrow limits, share ratios and limits, or total slots in the share pool, a transition from one time interval to the next can trigger resource reclaim. By default, the system enforces ownership and limits when the new time interval takes effect. The following examples illustrate how time interval changes trigger resource reclaim:

Scenario	Behavior	Example
A consumer's ownership increases for the new time interval. Lending and borrowing are not configured, and another consumer is using more than its deserved resources.	The system reclaims slots whether or not the consumer's demand is unsatisfied.	Consumer A owns 10 slots between 8:00 AM and 5:00 PM and 25 slots between 5:01 PM and 11:49 PM. At 5:01 PM, Consumer B is using more than its deserved slots, so the system reclaims 15 slots to allocate to Consumer A.
A consumer's ownership decreases for the new time interval, and lending and borrowing are not configured.	The system reclaims the number of slots that are required to conform to the ownership values configured for the new time interval, whether or not another consumer's demand is unsatisfied.	Consumer A owns 10 slots between 8:00 AM and 5:00 PM and 5 slots between 5:01 PM and 11:49 PM. Consumer B owns 5 slots between 8:00 AM and 5:00 PM and 10 slots between 5:01 PM and 11:49 PM. At 5:01 PM, the system reclaims 5 slots from Consumer A, even if Consumer A's demand is unsatisfied, and allocates 5 slots to Consumer B.
A consumer's ownership decreases for the new time interval. Borrowing and lending for the consumer are configured, and a lending consumer has slots available.	The system reclaims the number of slots that are required to conform to the ownership values configured for the new time interval, and then the consumer borrows available resources; the resource status changes from owned to borrowed.	Consumer A owns 10 slots between 8:00 AM and 5:00 PM and 5 slots between 5:01 PM and 11:49 PM. Consumer B owns 5 slots between 8:00 AM and 5:00 PM and 10 slots between 5:01 PM and 11:49 PM. At 5:00 PM, Consumer A has workload that runs on 10 slots and Consumer B has workload that runs on 5 slots. At 5:01 PM, the system reclaims 5 slots from Consumer A, even if Consumer A's demand is unsatisfied, and allocates 5 slots to Consumer B. Consumer A is configured to borrow from Consumer B, and Consumer B is configured to lend to Consumer A. Consumer B has no demand for the 5 reclaimed slots, so Consumer A borrows 5 slots from Consumer B.
A consumer's lend limit decreases for the new time interval.	The system reclaims the number of slots that are required to conform to the new lend limit, whether or not the consumer's demand is unsatisfied.	Consumer A has a lend limit of 10 slots between 8:00 AM and 5:00 PM and 5 slots between 5:01 PM and 11:49 PM. Consumer B borrows 10 slots from Consumer A. At 5:01 PM, the system reclaims 5 slots from Consumer B and allocates them to Consumer A.
A consumer's borrow limit decreases for the new time interval.	The system reclaims the number of slots that are required to conform to the new borrow limit, whether or not the lending consumer's demand is unsatisfied.	Consumer A has a borrow limit of 10 slots between 8:00 AM and 5:00 PM and 5 slots between 5:01 PM and 11:49 PM. Consumer A borrows 10 slots from Consumer B. At 5:01 PM, the system reclaims 5 slots from Consumer A to return to Consumer B.
A consumer's share limit decreases for the new time interval.	The system reclaims the number of slots that are required to conform to the new share limit, whether or not a competing consumer's demand is unsatisfied.	Consumer A has a share limit of 10 slots between 8:00 AM and 5:00 PM and 5 slots between 5:01 PM and 11:49 PM. A share pool is configured for the consumer branch (the parent consumer and its children). At 5:01 PM, the system reclaims 5 slots from Consumer A to return to the share pool.
The total number of slots in the share pool decreases for the new time interval.	The system reclaims the number of slots that are required to maintain share ratios, whether or not a competing consumer's demand is unsatisfied.	Consumers A and B each have a share ratio of 1. The consumer branch owns 10 slots between 8:00 AM and 5:00 PM and 4 slots between 5:01 PM and 11:49 PM. At 5:00 PM, Consumer A runs workload on 5 slots, and Consumer B runs workload on 5 slots. At 5:01 PM, Consumers A and B each return 3 slots to the share pool. During the new time interval, Consumer A runs workload on 2 slots and Consumer B runs workload on 2 slots.
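The limit-driven rows of the table reduce to the same arithmetic: reclaim whatever usage exceeds the new limit. This Python sketch reproduces the borrow-limit and share-pool examples. The function names are hypothetical, and rounding shares down to whole slots is an assumption.

```python
def slots_to_reclaim(in_use: int, new_limit: int) -> int:
    """Slots that must be reclaimed so that usage fits the limit of the new time interval."""
    return max(0, in_use - new_limit)

def share_pool_returns(in_use: dict[str, int], ratios: dict[str, int],
                       pool_slots: int) -> dict[str, int]:
    """Slots each consumer returns when the share pool shrinks (whole slots, rounded down)."""
    ratio_sum = sum(ratios.values())
    return {c: slots_to_reclaim(in_use[c], pool_slots * ratios[c] // ratio_sum) for c in in_use}

# Borrow limit drops from 10 to 5 while Consumer A borrows 10 slots.
print(slots_to_reclaim(in_use=10, new_limit=5))  # 5
# Share pool drops from 10 to 4 slots; A and B (ratio 1 each) run on 5 slots each.
print(share_pool_returns({"A": 5, "B": 5}, {"A": 1, "B": 1}, pool_slots=4))  # {'A': 3, 'B': 3}
```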

Resource reclaim interface

To monitor resource reclaim from the cluster management console, click Resources > Monitor Resource Allocation. A list of consumers is displayed, along with each consumer's current allocation of owned, shared, and borrowed slots, and the consumer's current demand.

After you configure the borrowing, lending, and sharing for your cluster, you cannot directly control or release reclaimed resources. When you modify the resource plan and click Apply, changes take effect immediately, which might trigger resource reclaim.

User	Interface	Behavior
Cluster administrator (EGO)	From the command line, run `egosh resource close -reclaim resource_name`.	Closes a resource, preventing further allocation. The system reclaims the host before it closes; running workload units are requeued after the configured grace period.
Application developer	Use the onServiceInterrupt API.	Notifies the service instance immediately of a pending interruption that is caused by resource reclaim, so that it can complete processing or clean up before the reclaim grace period expires.

The following table shows how to display resource reclaim configuration settings and details.

User	Interface	Behavior
Cluster administrator, Consumer administrator	From the cluster management console, click Consumers > Consumers & Plans > `consumer_name` > Consumer Properties > Reclaim behavior.	Displays the settings for Reclaim grace period and Rebalance when resource plan changes or time interval changes.
Cluster administrator, Consumer administrator	From the cluster management console, click Consumers > Consumers & Plans > Resource Plan > Show Advanced Settings > Expand All.	Displays the ownership, rank, lend, borrow, and share settings for all consumers.
Cluster administrator	From the cluster management console, click Clusters > Summary > Cluster Properties > Specify resource allocation behavior.	Displays the settings for Reclaim shared resources and Reclaim lent resources before borrowing.
Cluster administrator, Consumer administrator	From the cluster management console, click Symphony Workload > Monitor Workload > Application Properties. Or, from the command line, run: soamview app `app_name` -l	Displays the setting for Selective Reclaim.
Cluster administrator, Consumer administrator	From the cluster management console dashboard, click Symphony Workload > Monitor Workload > Application Properties. Or, from the command line, run: soamview app `app_name` -l	Displays the setting for Preemption Criteria.
Cluster administrator, Consumer administrator	From the cluster management console, click Symphony Workload > Monitor Workload > `application_name` > `Session ID` > Session Properties. Or, from the command line, run: soamview session `application_name:session_ID` -l	Displays the setting for Preemption Rank.