Enable session failover by modifying the workload placement policy file on the multicluster
primary host, or by setting environment variables on the client host. All client-side settings take
precedence (that is, they override values in the primary host policy file). If a setting is not
specified on the client, the system uses the value in the policy file instead. This topic shows how
to complete this task by modifying the workload placement policy file on the multicluster primary
host, and lists the equivalent client-side environment variables.
Procedure
- To enable multicluster session-level failover, configure either the multicluster primary host or
the client host:
- Modify the workload placement policy file settings on your multicluster primary host:
- workloadRedirection: Sets the level at which workload is redirected. Specify
session (which is also the default value).
- workloadRedirectionFailover: Enables or disables failover for workload
redirection. When the workloadRedirection parameter is set to
session, this setting also enables or disables task recovery. Specify
enabled (the default is disabled).
- topNClusterForTaskRedirection: Defines the number (N) of highest-ranking clusters across which
tasks from a session are distributed for task-level redirection. All tasks are submitted only
to these N clusters. Valid values are 1 to 20. The default is 3.
- topNClusterShareValues: Defines the share values for the first N clusters. The system uses these
share values in a smooth weighted round-robin configuration to distribute the tasks that are
submitted to the clusters. Specify one value from 1 to 100 for each cluster, separated by commas
(for example, 3,2,1). By default, all the highest-ranking clusters have the same share value.
<?xml version="1.0" encoding="UTF-8"?>
<Policy name="SessionFailover" description="Session Failover Policy" owner=""
        xmlns="http://www.ibm.com/Symphony/schema"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.ibm.com/Symphony/schema ../7.2.0/schema/SmcPlacementPolicy.xsd">
  <Clusters>
    <PrimaryGroup>
    </PrimaryGroup>
    <OverflowGroup enableOverflow="false">
    </OverflowGroup>
  </Clusters>
  <Application
      workloadRedirection="session"
      workloadRedirectionFailover="enabled"/>
</Policy>
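The share values described in this step rely on smooth weighted round-robin distribution. The following bash sketch shows the general technique only, so that the effect of a value such as 3,2,1 is easier to picture; it is not the IBM Spectrum Symphony implementation, and the function name swrr_schedule and the cluster names are hypothetical.

```shell
#!/usr/bin/env bash
# Generic smooth weighted round-robin sketch (illustrative, not product code).
# swrr_schedule and the cluster names are hypothetical.

swrr_schedule() {
  local count=$1; shift            # number of tasks to place
  local -a names=() weights=() current=()
  local pair total=0 i best
  for pair in "$@"; do             # each argument is name:share
    names+=("${pair%%:*}")
    weights+=("${pair##*:}")
    current+=(0)
    (( total += ${pair##*:} ))
  done
  while (( count-- > 0 )); do
    best=0
    for i in "${!names[@]}"; do
      # Every cluster gains its share value on each round.
      (( current[i] += weights[i] ))
      # The cluster with the highest running value wins; ties go to the first.
      (( current[i] > current[best] )) && best=$i
    done
    # The winner is pushed back by the sum of all share values.
    (( current[best] -= total ))
    printf '%s\n' "${names[best]}"
  done
}

# Six tasks across three clusters with share values 3,2,1.
swrr_schedule 6 clusterA:3 clusterB:2 clusterC:1
# prints clusterA clusterB clusterA clusterC clusterB clusterA (one per line)
```

Over any six consecutive tasks, the three clusters receive three, two, and one task, and the tasks for the highest-share cluster are interleaved rather than sent in a burst.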
- Override the workload placement settings defined in the policy file by configuring
the SMC_WORKLOAD_REDIRECTION and SMC_WORKLOAD_REDIRECTION_FAILOVER environment variables on the
client host. By default, these variables are set as SMC_WORKLOAD_REDIRECTION=session and
SMC_WORKLOAD_REDIRECTION_FAILOVER=disabled. When session-level redirection is enabled
(SMC_WORKLOAD_REDIRECTION=session), set SMC_WORKLOAD_REDIRECTION_FAILOVER to enabled to enable
session failover and task recovery. For example, for bash:
export SMC_WORKLOAD_REDIRECTION_FAILOVER=enabled
To enable session failover without resubmitting the session's unfinished tasks to other sessions,
set SMC_WORKLOAD_REDIRECTION_FAILOVER to session_failover_without_task_recovery. For example, for
bash:
export SMC_WORKLOAD_REDIRECTION_FAILOVER=session_failover_without_task_recovery
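Because client-side variables override the policy file, it can help to confirm which value takes effect before you start a client. The following bash sketch is a hypothetical pre-flight check, not part of the product: the helper names and the policy_failover value are assumptions for illustration, and the accepted values are only the ones named in this topic.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for a client host (not part of the product).

# Print the value that takes effect: the client variable if it is set,
# otherwise the value from the policy file (supplied by the caller).
effective_setting() {
  local client_value=$1 policy_value=$2
  printf '%s\n' "${client_value:-$policy_value}"
}

# Succeed only for the failover values named in this topic.
is_valid_failover_value() {
  case $1 in
    enabled|disabled|session_failover_without_task_recovery) return 0 ;;
    *) return 1 ;;
  esac
}

policy_failover=disabled   # assumed policy file value, for illustration only
failover=$(effective_setting "${SMC_WORKLOAD_REDIRECTION_FAILOVER:-}" "$policy_failover")
if is_valid_failover_value "$failover"; then
  echo "effective workloadRedirectionFailover: $failover"
else
  echo "unrecognized failover value: $failover" >&2
fi
```

The check reads the policy value from a variable because this topic does not describe a command for querying the policy file from the client host.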
- Optional:
If you have sessions running on a cluster, and IBM® Spectrum Symphony detects zero resources
available (that is, resource starvation for the cluster), it closes the current
session (and cancels active tasks), creates a new session on another cluster, and fails over to that
cluster. To enable IBM Spectrum Symphony to check for resource starvation, configure either the
multicluster primary host or the client host:
- Modify the workload placement policy file to include the
resubmitOnZeroResourcesTimeoutMinutes setting on your multicluster primary host.
For example, to specify that if a cluster has zero allocation for an application for 5 minutes, the
policy should fail active tasks and resubmit them to an overflow cluster, specify:
<Application resubmitOnZeroResourcesTimeoutMinutes="5"
    topNClusterForTaskRedirection="2"
    topNClusterShareValues="3,2"
    workloadRedirection="task"
    workloadRedirectionFailover="enabled"/>
- Override the workload placement settings defined in the policy file by configuring the
SMC_RESUBMIT_ON_ZERO_RESOURCES_TIMEOUT_MINUTES environment variable on the
client host.
By default, SMC_RESUBMIT_ON_ZERO_RESOURCES_TIMEOUT_MINUTES is disabled. All client host
multicluster variables override the policy file on the multicluster primary host; therefore, if
SMC_RESUBMIT_ON_ZERO_RESOURCES_TIMEOUT_MINUTES is not specified, the system
uses the resubmitOnZeroResourcesTimeoutMinutes value from the policy file.
For example, for bash, to specify that if a cluster has zero allocation for an
application for 5 minutes, the policy should fail active tasks and resubmit them to an overflow
cluster, set SMC_RESUBMIT_ON_ZERO_RESOURCES_TIMEOUT_MINUTES to 5:
export SMC_RESUBMIT_ON_ZERO_RESOURCES_TIMEOUT_MINUTES=5
Note that if both the resubmitOnZeroResourcesTimeoutMinutes (or
SMC_RESUBMIT_ON_ZERO_RESOURCES_TIMEOUT_MINUTES) and the workloadRedirectionFailover (or
SMC_WORKLOAD_REDIRECTION_FAILOVER) parameters are set, the system closes
the current session, cancels active tasks, and resubmits the tasks to another cluster. If
workloadRedirectionFailover (or SMC_WORKLOAD_REDIRECTION_FAILOVER) is not enabled, the policy only
fails the active tasks and does not resubmit them to other clusters.
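The interaction between the two settings in this step can be summarized as a small decision sketch. The bash function below only restates the note above in code form; starvation_action is a hypothetical helper name, and treating every failover value other than enabled as "not enabled" is an assumption, because this topic does not describe how session_failover_without_task_recovery behaves in this case.

```shell
#!/usr/bin/env bash
# Decision sketch for resource starvation handling (illustrative only).
# starvation_action is a hypothetical helper, not a product command.

starvation_action() {
  local timeout_minutes=$1   # empty means the timeout setting is not set
  local failover=$2          # workloadRedirectionFailover value
  if [[ -z $timeout_minutes ]]; then
    echo "no resource starvation check"
  elif [[ $failover == enabled ]]; then
    echo "close session, cancel active tasks, resubmit tasks to another cluster"
  else
    # Assumption: any value other than "enabled" counts as not enabled.
    echo "fail active tasks only"
  fi
}

starvation_action 5 enabled
starvation_action 5 disabled
starvation_action "" enabled
```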
- Optional: To
allow migrated sessions to keep the original creation times from their original clusters, and so
maintain their relative position in the execution queue, enable the
useInitialSessionCreationTime parameter in your multicluster workload placement
policy (or set the SMC_USE_INITIAL_SESSION_CREATION_TIME environment variable
on the client host):
- Modify the workload placement policy file to enable the
useInitialSessionCreationTime setting on your multicluster primary host.
For example, to enable this setting:
<Application rerankIntervalMinutes="0"
    resubmitOnZeroResourcesTimeoutMinutes="1"
    useInitialSessionCreationTime="enabled"
    topNClusterForTaskRedirection="3"
    topNClusterShareValues="1,1,1"
    workloadRedirection="session"
    workloadRedirectionFailover="enabled"/>
- Override the workload placement settings defined in the policy file by configuring the
SMC_USE_INITIAL_SESSION_CREATION_TIME environment variable on the client host.
For example, for bash, to enable this setting:
export SMC_USE_INITIAL_SESSION_CREATION_TIME=enabled
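To picture what keeping the original creation time means for queue position, consider the following bash sketch. It assumes, purely for illustration, that the execution queue is ordered by session creation time alone; real scheduling can depend on other factors. The session names, the timestamps, and the queue_order helper are all hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative ordering sketch (not product code).
# Input lines: <session> <original creation time> <creation time on current cluster>

queue_order() {
  local use_initial=$1   # "enabled" or "disabled"
  local key=3            # default: order by creation time on the current cluster
  [[ $use_initial == enabled ]] && key=2   # enabled: order by original creation time
  sort -n -k"$key","$key" | awk '{print $1}'
}

# s2 was created at time 110 on its original cluster and re-created at 130
# after it migrated; s1 and s3 never migrated.
sessions='s1 100 100
s2 110 130
s3 120 120'

printf '%s\n' "$sessions" | queue_order enabled    # prints s1, s2, s3
printf '%s\n' "$sessions" | queue_order disabled   # prints s1, s3, s2
```

With the setting enabled, the migrated session s2 stays ahead of s3; without it, s2 moves behind s3 because only its new creation time is considered.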
Results
Multicluster session failover is enabled. Additionally, if you also configured either of the
following settings:
- resubmitOnZeroResourcesTimeoutMinutes or SMC_RESUBMIT_ON_ZERO_RESOURCES_TIMEOUT_MINUTES: The
policy also checks for resource starvation and, if it is detected, fails active tasks
and resubmits them to an overflow cluster.
- useInitialSessionCreationTime or SMC_USE_INITIAL_SESSION_CREATION_TIME: Migrated sessions keep
the original creation time from their original cluster, which maintains their relative order in
the execution queue.
What to do next
To check whether multicluster session failover is enabled, follow these steps:
- From the multicluster management console, select .
- Expand Application Settings to reveal the applications and policies table.
- Under Bins, select the information icon next to a workload bin to check
whether workload redirection applies to sessions, and whether failover for
session-level workload redirection is enabled for that bin.