Resource reclaim
Resource reclaim, which is part of IBM® Spectrum Symphony's borrowing, lending, and sharing capabilities, ensures that consumers can take back their deserved shared or lent resources as needed to meet workload demand. Resource reclaim applies to all platforms that IBM Spectrum Symphony supports. It does not apply to IBM Spectrum Symphony Developer Edition, which does not support resource lending, borrowing, or reclaiming.
Purpose of reclaiming resources
The system reclaims resources in the following situations:
- A lending consumer has workload demand that requires slots that are owned by the lending consumer.
- Share ratios are configured, and an under-allocated consumer (a consumer that is not currently using its deserved number of shared slots) has workload demand that requires the use of more slots.
- A time-based resource plan has time intervals that change the number of owned resources, share ratios and limits, borrowing and lending policies, and borrowing and lending limits for one or more consumers.
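To make "deserved number of shared slots" concrete, the sketch below assumes a proportional split: each consumer's deserved share is the share pool size multiplied by its share ratio and divided by the sum of all ratios. The helper names are hypothetical, and the sketch ignores ownership, limits, and rounding, which real scheduling also takes into account.

```python
from fractions import Fraction


def deserved_shared_slots(pool_size, share_ratios):
    """Split the share pool among consumers in proportion to their integer share ratios.

    Results stay exact Fractions because this section does not say how
    the scheduler rounds partial slots.
    """
    total = sum(share_ratios.values())
    return {name: Fraction(pool_size * ratio, total) for name, ratio in share_ratios.items()}


def reclaim_candidates(pool_size, share_ratios, in_use, demand):
    """Consumers below their deserved share that still have unmet demand;
    sharing can trigger a reclaim on their behalf."""
    deserved = deserved_shared_slots(pool_size, share_ratios)
    return sorted(
        name for name in share_ratios
        if in_use.get(name, 0) < deserved[name] and demand.get(name, 0) > 0
    )


# A 30-slot share pool split 1:2 between consumers A and B.
ratios = {"A": 1, "B": 2}
print(deserved_shared_slots(30, ratios))
print(reclaim_candidates(30, ratios, in_use={"A": 4, "B": 26}, demand={"A": 3}))
```

Here A deserves 10 slots but uses 4 and has unmet demand, so A is the consumer on whose behalf slots would be reclaimed from B.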
The system does not always return the same resource that the consumer originally lent. If workload is running on a borrowed resource, the system might reclaim a different physical resource (that meets the resource requirements) from the borrower and allocate that resource to the lending consumer in place of the original resource.
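The substitution rule can be sketched as a search over the slots that the borrower holds. The `Slot` fields and the first-match rule are assumptions for illustration only. How the system chooses among several matching resources follows the resource-level logic described later in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    host: str
    os: str
    free_mem_mb: int


def pick_slot_to_reclaim(borrower_slots, os_required, min_free_mem_mb):
    """Return the first slot held by the borrower that satisfies the lender's
    requirements; it does not have to be the slot that was originally lent."""
    for slot in borrower_slots:
        if slot.os == os_required and slot.free_mem_mb >= min_free_mem_mb:
            return slot
    return None  # no equivalent resource on the borrower's side


# The lender originally lent hostA, but any matching slot can be returned.
held_by_borrower = [
    Slot("hostC", "linux", 16384),
    Slot("hostB", "windows", 8192),
    Slot("hostA", "windows", 8192),
]
print(pick_slot_to_reclaim(held_by_borrower, "windows", 4096))
```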
[Figure: Behavior with reclaim for borrowing and lending]
[Figure: Behavior without resource reclaim for sharing]
[Figure: Behavior with resource reclaim for sharing]
Service instance workflow and interruption handling
When a resource on which a service instance is running is reclaimed, the system:
- Immediately sends an interrupt event to the service to notify it of the pending reclaim.
- Gives the service the number of seconds specified by the reclaim grace period to complete processing before it terminates the service instance. Tasks that are still running on the service instance when it is terminated are requeued to their respective sessions. The default grace period is 0 seconds.
- After the reclaim grace period expires, allows a 120-second leeway for the return of any reclaimed resources. The resource orchestrator applies this leeway to account for network overhead and other considerations.
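The timeline in the list above reduces to simple arithmetic. This sketch is illustrative only. `reclaim_timeline` and its keys are hypothetical names, not part of any IBM Spectrum Symphony API, and all times are offsets in seconds from the moment the reclaim starts.

```python
# Leeway that the resource orchestrator allows after the grace period expires.
LEEWAY_S = 120


def reclaim_timeline(grace_period_s=0):
    """Offsets, in seconds from the start of a reclaim, of each documented step.

    grace_period_s defaults to 0, the documented default grace period.
    """
    return {
        "interrupt_sent": 0,                   # the interrupt event goes out immediately
        "latest_termination": grace_period_s,  # unfinished tasks are requeued
        "resource_due_back": grace_period_s + LEEWAY_S,
    }


print(reclaim_timeline())    # default grace period
print(reclaim_timeline(30))  # hypothetical 30-second grace period
```

With the default grace period of 0 seconds, a running task is requeued immediately and the resource is expected back within 120 seconds. A 30-second grace period pushes that deadline to 150 seconds.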
The onServiceInterrupt service handler method provides the most effective way to manage an interruption that is caused by resource reclaim. Use of this method ensures that the service instance receives immediate notification of a pending interruption.
During a reclaim, the interrupt event indicates how much time the service instance has to complete the currently running service method and then clean up. If the service method and cleanup do not complete within that time, the service instance is terminated. Cleanup starts after the currently running service method completes, provided that the timeout has not yet expired.
If a task is running and the Invoke method completes during the applied reclaim grace period, the result of that method is treated as it would be treated under normal circumstances.
If a task is running and the Invoke method does not complete before the applied reclaim grace period expires, the service instance on which the task is running is terminated and the task is requeued.
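The rules above can be modeled as a pure function. This is a hedged sketch that assumes a single task per service instance and known durations. `reclaim_outcome` and `ReclaimOutcome` are hypothetical names and do not come from the Symphony service API.

```python
from typing import NamedTuple


class ReclaimOutcome(NamedTuple):
    result_kept: bool     # Invoke finished inside the grace period
    task_requeued: bool   # Invoke was cut off, so the task returns to its session
    clean_shutdown: bool  # Invoke and cleanup both finished in time


def reclaim_outcome(invoke_remaining_s, cleanup_s, grace_period_s):
    """Model one running task on an interrupted service instance.

    Cleanup starts only after the running Invoke call returns, so both
    must fit in the grace period for the instance to exit cleanly.
    """
    invoke_done = invoke_remaining_s <= grace_period_s
    cleanup_done = invoke_done and invoke_remaining_s + cleanup_s <= grace_period_s
    return ReclaimOutcome(invoke_done, not invoke_done, cleanup_done)


print(reclaim_outcome(8, 5, 10))
```

With a 10-second grace period, an Invoke call that needs 8 more seconds returns its result normally, but 5 seconds of cleanup does not fit, so the instance is still terminated.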
A less effective alternative is for the service instance to call the getLastInterruptEvent method periodically to check for interrupt events. Because the service instance polls, it does not detect the interrupt immediately. The reclaim grace period keeps running while the service instance waits for its next poll, so the service instance has less time to return a result or shut down gracefully.
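The cost of polling can be quantified with a small model. This sketch assumes that the service calls getLastInterruptEvent on a fixed schedule starting at time 0. The function name and parameters are hypothetical. The model measures only how much of the grace period remains when the service first learns about the interrupt.

```python
import math


def time_left_when_detected(grace_period_s, interrupt_at_s, poll_interval_s=None):
    """Seconds of grace period left once the service notices the interrupt.

    poll_interval_s=None models the onServiceInterrupt callback (immediate
    notification); a number models polling getLastInterruptEvent at
    times 0, interval, 2 * interval, and so on.
    """
    if poll_interval_s is None:
        delay = 0.0
    else:
        next_poll = math.ceil(interrupt_at_s / poll_interval_s) * poll_interval_s
        delay = next_poll - interrupt_at_s
    return max(0.0, grace_period_s - delay)


print(time_left_when_detected(30, 101))      # callback
print(time_left_when_detected(30, 101, 20))  # polling every 20 seconds
```

With a 30-second grace period and a 20-second polling interval, an interrupt at second 101 is seen at second 120, which leaves 11 seconds. The callback leaves all 30.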
Resource reclaim behavior (consumer level)
| Type of resource the system reclaims | Order that reclaim occurs | Example |
|---|---|---|
| Borrowed resources | Resource requirements, which are determined by the resource group that is associated with the consumer. | If the lending consumer needs a Windows slot with a certain amount of available memory, the system looks first for an analogous resource to reclaim. |
| Shared resources | Relative consumer rank, which is configured in the Resource Plan. Consumer rank is an optional setting. A rank of 0 is the highest rank, and larger numbers indicate lower ranks. The system reclaims resources from the lowest-ranking consumer first. By default, the system enforces share ratios at the level of the leaf (child) consumers. If your system is configured to enforce share ratios at the parent level, the system reclaims resources from the parent consumer. | The system first reclaims resources from a consumer with rank 50, and then reclaims resources from a consumer with rank 25. For parent-level enforcement: Consumer A is a child consumer of Parent A, and Parent A and Parent B are siblings. With share ratios enforced at the parent level, Parent A shares 10 slots with Parent B, and Parent B is running workload on 5 slots that it obtained from Parent A's share. If Consumer A has unsatisfied demand for 2 slots and all of Parent A's slots are allocated, the system reclaims 2 slots from Parent B and allocates them to Parent A. |
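The rank rule for shared resources can be sketched as a greedy loop. This sketch assumes that every consumer has a rank configured and that the number of reclaimable shared slots per consumer is already known. Rank is optional, and this section does not say how unranked consumers are ordered. `plan_shared_reclaim` is a hypothetical helper.

```python
def plan_shared_reclaim(consumers, slots_needed):
    """Reclaim shared slots from the lowest-ranking consumers first.

    consumers maps a consumer name to (rank, reclaimable_slots), where
    reclaimable_slots is a hypothetical count of shared slots that the
    consumer could give back. Rank 0 is the highest rank, so the largest
    rank numbers are reclaimed first.
    """
    plan = []
    by_rank = sorted(consumers.items(), key=lambda item: item[1][0], reverse=True)
    for name, (_rank, reclaimable) in by_rank:
        if slots_needed <= 0:
            break
        take = min(reclaimable, slots_needed)
        if take > 0:
            plan.append((name, take))
            slots_needed -= take
    return plan


print(plan_shared_reclaim({"rank25": (25, 4), "rank50": (50, 3), "rank0": (0, 5)}, 5))
```

With 5 slots needed, the plan takes all 3 slots it can from the rank-50 consumer before it takes 2 from the rank-25 consumer. The rank-0 consumer is left alone, which matches the table's example.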
Resource reclaim behavior (resource level)
The system chooses the resource by using the following logic.
- Consider the selective reclaim configuration:
  - If selective reclaim is disabled (default), the system reclaims resources as quickly as possible, with minimum overhead. For example, if multiple hosts in the consumer meet the resource requirement, the system selects any one at random.
  - If selective reclaim is enabled, the system reclaims resources from the less important sessions first. This option has greater overhead. For example, if multiple hosts in the consumer meet the resource requirement, the system selects all candidate hosts.
- For proportional or minimum services scheduling, consider preemption rank. For priority scheduling, consider session priority instead of preemption rank.
  - With proportional or minimum services scheduling: from the selected host or hosts, select the least important session according to preemption rank. If multiple sessions share the same lowest rank, select all candidate sessions. If the resource requirement is for an exclusive host, treat all sessions on a host as if they had the same rank as the most important session that uses the host.
  - With priority scheduling: from the selected host or hosts, select the least important session according to session priority. If multiple sessions share the same lowest priority, select the most recently started session. If the resource requirement is for an exclusive host, treat all sessions on a host as if they had the same priority as the most important session that uses the host.
- Consider the preemption criteria:
  - If the preemption criteria setting is MostRecentTask, the system reclaims resources from the most recently submitted tasks first. For example, from one or more sessions, the system selects the most recently started task and reclaims the resource that it is using. If multiple tasks have the same run time, the system selects any one at random. If multiple tasks run on a slot, the system considers the cumulative run time of all tasks that use the slot.
  - If the preemption criteria setting is PolicyDefault (default), the behavior depends on the scheduling policy:
    - With proportional or minimum services scheduling, the system reclaims resources from the most over-allocated sessions first. This option has minimum overhead. For example, from multiple sessions, the system selects the most over-allocated session and reclaims a resource that it is using (task selection is random). If multiple sessions are equally over-allocated, the system selects any one at random. If no session is over-allocated, the system selects the least under-allocated session instead.
    - With priority scheduling, the system by default selects a task from the session with the lowest priority, followed by tasks from the most recently started session. This option has minimum overhead.
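Several of the selection rules above can be expressed as small functions. This is a simplified sketch, not the session manager's implementation. Host selection and the priority-scheduling branches are not modeled, the `Session` fields are hypothetical, and over-allocation is assumed to equal allocated slots minus deserved slots.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Session:
    name: str
    preemption_rank: int  # higher value = more important, reclaimed later
    allocated: int = 0    # slots the session currently holds
    deserved: int = 0     # slots the session is entitled to (assumed known)
    # Cumulative run time, in seconds, of all tasks on each slot, keyed by slot.
    slot_runtimes: dict = field(default_factory=dict)


def least_important_sessions(sessions):
    """Proportional/minimum services: keep all sessions tied at the lowest rank."""
    lowest = min(s.preemption_rank for s in sessions)
    return [s for s in sessions if s.preemption_rank == lowest]


def exclusive_host_rank(sessions_on_host):
    """Exclusive host: every session counts as the most important one on it."""
    return max(s.preemption_rank for s in sessions_on_host)


def pick_slot_most_recent_task(sessions, rng=random):
    """MostRecentTask: reclaim the slot with the shortest cumulative run time,
    choosing at random among ties."""
    slots = [(rt, s.name, slot)
             for s in sessions for slot, rt in s.slot_runtimes.items()]
    shortest = min(rt for rt, _, _ in slots)
    _, name, slot = rng.choice([c for c in slots if c[0] == shortest])
    return name, slot


def pick_session_policy_default(sessions, rng=random):
    """PolicyDefault, proportional scheduling: the most over-allocated session,
    else the least under-allocated one; both have the largest surplus."""
    surplus = {s.name: s.allocated - s.deserved for s in sessions}
    best = max(surplus.values())
    return rng.choice([n for n, v in surplus.items() if v == best])


a = Session("a", 10, allocated=6, deserved=4, slot_runtimes={"s1": 300, "s2": 40})
b = Session("b", 20, allocated=2, deserved=4, slot_runtimes={"s3": 5})
print([s.name for s in least_important_sessions([a, b])])
print(pick_slot_most_recent_task([a, b]))
print(pick_session_policy_default([a, b]))
```

Because the largest allocated-minus-deserved value identifies the most over-allocated session and, when none is over-allocated, the least under-allocated one, `pick_session_policy_default` uses a single maximum. Treating a session with exactly zero surplus as "least under-allocated" is an assumption.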
Selective reclaim
- Consider the session priority or preemptionRank attributes, which are defined in the SessionTypes section of your application profile. How these attributes are used depends on the scheduling policy, which is defined in the policy attribute within the Consumer section of the application profile. Here is an example application profile with priority and preemption ranks configured:

      <SessionTypes>
        ...
        <Type name="type1" … priority="1" preemptionRank="10"/>
        <Type name="type2" … priority="2" preemptionRank="20"/>
      </SessionTypes>

  - For the priority scheduling policy (policy="R_PriorityScheduling"), use a higher priority value to protect longer-running tasks. A value of 1 is the lowest priority and 10000 is the highest priority.
  - For all other scheduling policies, use preemptionRank values to rank the importance of sessions:
    - For workload that you do not want preempted by a slot reclaim, set the preemptionRank value to the highest level (for example, preemptionRank="20").
    - For normal workload that can be preempted by a slot reclaim without consequence, set the preemptionRank value to the lowest level (for example, preemptionRank="10").
- If there are long-running tasks, set the preemption criteria to MostRecentTask to prevent the loss of CPU time for long-running tasks.
- If all the tasks are short-running, set the preemption criteria to the default (PolicyDefault) for better SSM performance.
- Selective reclaim is disabled when exclusive allocation (consumer level) is enabled for multidimensional scheduling.
- If multidimensional scheduling standby services are enabled, the standby service requirement is satisfied before selective reclaim.
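To see how the scheduling policy decides which attribute matters, the sketch below parses a trimmed profile fragment with Python's standard `xml.etree.ElementTree` module. The elided attributes are dropped, `type3` is a hypothetical third session type added so that the two orderings differ, and `reclaim_order` is an illustrative helper, not a Symphony tool.

```python
import xml.etree.ElementTree as ET

# Trimmed profile fragment: type3 is hypothetical, and the elided
# attributes from the documented example are omitted.
PROFILE = """
<SessionTypes>
  <Type name="type1" priority="1" preemptionRank="10"/>
  <Type name="type2" priority="2" preemptionRank="20"/>
  <Type name="type3" priority="3" preemptionRank="5"/>
</SessionTypes>
"""


def reclaim_order(profile_xml, policy):
    """Return session type names from first-to-reclaim to most protected.

    With policy="R_PriorityScheduling" the priority attribute decides
    (1 is the lowest priority); with any other policy, preemptionRank
    decides (a lower rank is reclaimed first).
    """
    root = ET.fromstring(profile_xml.strip())
    attr = "priority" if policy == "R_PriorityScheduling" else "preemptionRank"
    types = sorted(root.findall("Type"), key=lambda t: int(t.get(attr)))
    return [t.get("name") for t in types]


print(reclaim_order(PROFILE, "R_PriorityScheduling"))
print(reclaim_order(PROFILE, "some_other_policy"))  # hypothetical policy value
```

Under R_PriorityScheduling the order is type1, type2, type3. Under any other policy, type3 (preemptionRank 5) becomes the first candidate for reclaim.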
For more information on enabling selective reclaim, see Enabling selective reclaim.
Consumer demands
By default, the system reclaims owned resources only after it attempts to satisfy demand by borrowing resources from other lending consumers or from the share pool. You can change this behavior so that the system reclaims owned resources before it allocates borrowed or shared resources.
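The two orderings can be sketched as follows. The `reclaim_before_borrowing` flag is a hypothetical stand-in for the Reclaim lent resources before borrowing setting listed later in this section. The function treats borrowing from lending consumers and taking from the share pool as a single source because this section does not specify an order between them.

```python
def plan_allocation(demand, borrowable_or_shared, lent_out, reclaim_before_borrowing=False):
    """Return how many slots come from each source, in the order tried.

    borrowable_or_shared: slots available from lending consumers or the share pool.
    lent_out: owned slots currently lent to other consumers, which can be reclaimed.
    """
    sources = [("borrow_or_share", borrowable_or_shared), ("reclaim_owned", lent_out)]
    if reclaim_before_borrowing:
        sources.reverse()  # reclaim owned slots first, then borrow or share
    plan = {}
    for source, available in sources:
        take = min(demand, available)
        if take > 0:
            plan[source] = take
            demand -= take
    return plan


print(plan_allocation(5, borrowable_or_shared=3, lent_out=4))
print(plan_allocation(5, borrowable_or_shared=3, lent_out=4, reclaim_before_borrowing=True))
```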
Time interval transitions
| Scenario | Behavior |
|---|---|
| A consumer's ownership increases for the new time interval. Lending and borrowing are not configured, and another consumer is using more than its deserved resources. | The system reclaims slots whether or not the consumer's demand is unsatisfied. |
| A consumer's ownership decreases for the new time interval, and lending and borrowing are not configured. | The system reclaims the number of slots that are required to conform to the ownership values configured for the new time interval, whether or not other consumers' demand is unsatisfied. |
| A consumer's ownership decreases for the new time interval. Borrowing and lending for the consumer are configured, and a lending consumer has slots available. | The system reclaims the number of slots that are required to conform to the ownership values configured for the new time interval, and then the consumer borrows available resources; the resource status changes from owned to borrowed. |
| A consumer's lend limit decreases for the new time interval. | The system reclaims the number of slots that are required to conform to the new lend limit, whether or not the consumer's demand is unsatisfied. |
| A consumer's borrow limit decreases for the new time interval. | The system reclaims the number of slots that are required to conform to the new borrow limit, whether or not the lending consumer's demand is unsatisfied. |
| A consumer's share limit decreases. | The system reclaims the number of slots that are required to conform to the new share limit, whether or not a competing consumer's demand is unsatisfied. |
| The total number of slots in the share pool decreases. | The system reclaims the number of slots that are required to maintain share ratios, whether or not a competing consumer's demand is unsatisfied. |
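The rows in the table above where a value decreases follow the same arithmetic: compare current usage with the new value and reclaim the difference, without considering demand. The sketch below assumes that usage and limits are plain slot counts. The dictionary keys are hypothetical labels.

```python
def reclaim_for_new_interval(usage, new_limits):
    """Slots to reclaim, per limit, when a new time interval starts.

    usage and new_limits map the same hypothetical keys (for example
    "owned", "lent", "borrowed", "shared") to slot counts. Demand is not an
    input: these reclaims happen whether or not anyone's demand is unsatisfied.
    """
    return {key: max(0, usage[key] - limit) for key, limit in new_limits.items()}


usage = {"lent": 8, "borrowed": 3, "shared": 5}
print(reclaim_for_new_interval(usage, {"lent": 6, "borrowed": 4, "shared": 2}))
```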
Resource reclaim interface
To monitor resource reclaim from the cluster management console, click . A list of consumers is displayed, along with each consumer's current allocation of owned, shared, and borrowed slots, and the consumer's current demand.
After you configure the borrowing, lending, and sharing for your cluster, you cannot directly control or release reclaimed resources. When you modify the resource plan and click Apply, changes take effect immediately, which might trigger resource reclaim.
| User | Interface | Behavior |
|---|---|---|
| Cluster administrator (EGO) | From the command line, run egosh resource close -reclaim resource_name. | Closes a resource, preventing further allocation. The system reclaims the host before it closes; running workload units are requeued after the configured grace period. |
| Application developer | Use the onServiceInterrupt API. | Notifies the service instance immediately of a pending interruption that is caused by resource reclaim, so that it can complete the current service method and clean up before the reclaim grace period expires. |
| User | Command | Behavior |
|---|---|---|
| | From the cluster management console, click | Displays the settings for Reclaim grace period and Rebalance when resource plan changes or time interval changes. |
| | From the cluster management console, click | Displays the ownership, rank, lend, borrow, and share settings for all consumers. |
| Cluster administrator | From the cluster management console, click | Displays the settings for Reclaim shared resources and Reclaim lent resources before borrowing. |
| | From the cluster management console, click. From the command line, run soamview app app_name -l | |
| | From the cluster management console Dashboard. From the command line, run soamview app app_name -l | |
| | From the cluster management console, click. From the command line, run soamview session application_name:session_ID -l | Displays the setting for Preemption Rank. |