You can deploy an IBM Spectrum
Conductor cluster and workloads as LSF® jobs within an IBM Spectrum LSF (LSF) cluster.
About this task
To deploy IBM Spectrum Conductor within LSF, you must create an LSF batch queue, application profile, and host group for the IBM Spectrum Conductor management cluster.
Note: The LSF application profile, queue, and host group are for use only by the IBM Spectrum Conductor management cluster to acquire additional management hosts, and cannot be used for workloads. The host group must include at least one host. The hosts in this host group must be different from the primary host of IBM Spectrum Conductor and any primary candidate hosts (if IBM Spectrum Conductor is configured with primary host high availability and failover).
For workloads, there must be at least one additional host available beyond the hosts specified previously.
Procedure
- Install IBM Spectrum LSF or use an existing installation. IBM Spectrum LSF 10.1.0.6 or later is supported.
- Create an LSF
cluster, or use an existing cluster.
- Install IBM Spectrum
Conductor as
a shared installation. See Installing IBM Spectrum Conductor to a shared environment.
When you install to a shared environment, you install IBM Spectrum
Conductor once on a shared file system
that every host in the cluster shares and to which every host has access. These hosts can be any
subset of the hosts that are included in the LSF cluster. These hosts must be of
the types that are supported by IBM Spectrum
Conductor.
If your LSF cluster has root job submission disabled, apply the root squash setting (including the ROOT_SQUASH_INSTALL environment variable) when installing IBM Spectrum Conductor (see step 4 in Installing IBM Spectrum Conductor to a shared environment). In this case, the IBM Spectrum Conductor cluster administrator user must be a non-root user.
- Configure the IBM Spectrum
Conductor cluster.
- Log in to the host that is selected as the IBM Spectrum
Conductor
primary host and, as the IBM Spectrum
Conductor cluster administrator, run the
following commands:
- Source the environment.
where install_directory is the installation directory.
- Add the primary
host.
egoconfig join primary_host_name
where primary_host_name is the primary host name.
- Set
entitlement.
egoconfig setentitlement entitlement_file_path
where
entitlement_file_path is the full path of the entitlement file.
- To support IBM Spectrum Conductor primary host high availability and failover, apply the following steps:
- Designate one host to be the IBM Spectrum Conductor primary host, at least one additional host to be an IBM Spectrum Conductor primary candidate, and at least one additional host to run instance group management services. In this configuration, instance group management services do not run on the primary host and primary candidate hosts.
- Log in to the host that is designated as the IBM Spectrum
Conductor cluster primary host and to each of the additional
hosts designated as the IBM Spectrum
Conductor
primary candidates, as the IBM Spectrum
Conductor cluster administrator, and run
the following commands on these hosts:
- Source the environment.
where installation_directory is the installation directory.
- Configure the shared directory for the primary host and the primary candidate
hosts.
egoconfig mghost shared_directory
where
shared_directory is a directory in the shared file system that is accessible to
the primary host and the primary candidate hosts and will be used
for high availability data. Important: Do not run the egoconfig mghost
shared_directory command on the hosts designated to run instance group management services.
For additional information on setting up primary host high availability and
failover, see Setting up primary host failover.
- Enable secure shell between all the following hosts: primary host, primary candidate hosts, and the hosts for
running instance group management
services. For details on how to enable secure shell between these hosts, see Enabling secure shell.
- Log in as the LSF administrator on any host in the LSF cluster.
- Create an LSF batch queue that is used for
acquiring hosts, in an exclusive mode, for the IBM Spectrum
Conductor management
cluster.
- Open the lsb.queues configuration file for your LSF
cluster. By default, the file is in the
$LSF_ENVDIR/lsbatch/cluster_name/configdir directory. For
more information on the lsb.queues configuration file, see lsb.queues.
- Add a queue for the IBM Spectrum
Conductor management cluster. The queue
definition must include the parameter EXCLUSIVE = Y.
For
example:
Begin Queue
QUEUE_NAME = conductor_management_queue
EXCLUSIVE = Y
End Queue
- Save your changes.
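The EXCLUSIVE = Y requirement can be checked before reconfiguring LSF. The following is a minimal sketch, assuming a plain KEY = VALUE stanza layout like the example above; queue_is_exclusive is a hypothetical helper written for illustration, not an LSF command, and it only recognizes the literal value Y.

```shell
# Sketch only: queue_is_exclusive is a hypothetical helper, not part of LSF.
# It succeeds when the named queue stanza in lsb.queues sets EXCLUSIVE = Y.
queue_is_exclusive() {
    # $1 = path to lsb.queues, $2 = queue name
    awk -v want="$2" '
        /^[ \t]*Begin[ \t]+Queue/ { in_q = 1; name = ""; excl = 0; next }
        /^[ \t]*End[ \t]+Queue/   {
            if (in_q && name == want && excl) found = 1
            in_q = 0
            next
        }
        in_q {
            line = $0
            sub(/#.*/, "", line)            # drop trailing comments
            n = split(line, kv, "=")
            if (n < 2) next                 # not a KEY = VALUE line
            key = kv[1]; val = kv[2]
            gsub(/[ \t]/, "", key)
            gsub(/[ \t]/, "", val)
            if (key == "QUEUE_NAME") name = val
            if (key == "EXCLUSIVE" && val == "Y") excl = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}
```

For example, `queue_is_exclusive "$LSF_ENVDIR/lsbatch/cluster_name/configdir/lsb.queues" conductor_management_queue` returns a zero exit status only if that queue is defined with EXCLUSIVE = Y.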
- Create an LSF
application profile for the IBM Spectrum
Conductor management cluster. The
application profile references the cluster termination script residing in
$EGO_TOP/conductorspark/2.5.0/lsf/ConductorSparkCtrlTerm.sh.
Note: The cluster termination script referenced by the LSF application profile for the
IBM Spectrum
Conductor management cluster
must reside in the directory $EGO_TOP/conductorspark/2.5.0/lsf and cannot be
copied to another directory.
- Open the lsb.applications configuration file for your LSF cluster. By default, the file is
in the $LSF_ENVDIR/lsbatch/cluster_name/configdir. For more
information on the lsb.applications configuration file, see lsb.applications.
- Add a new application profile definition for the IBM Spectrum
Conductor management cluster. The
application profile definition must include the parameter TERMINATE_CONTROL
that references ConductorSparkCtrlTerm.sh and must include the parameter
RESIZABLE_JOBS=Y. For example:
Begin Application
Name=conductor_management_app
RESIZABLE_JOBS=Y
TERMINATE_CONTROL=$EGO_TOP/conductorspark/2.5.0/lsf/ConductorSparkCtrlTerm.sh
End Application
- Save the changes.
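Because the termination script must stay in its installed location, it can help to confirm that the file is present there before LSF is reconfigured. The sketch below is illustrative only: check_term_script is a hypothetical helper, and the execute-permission test is an assumption that is not stated in this procedure.

```shell
# Sketch only: check_term_script is a hypothetical helper, not part of
# IBM Spectrum Conductor or LSF.
check_term_script() {
    # $1 = IBM Spectrum Conductor top directory (the value of $EGO_TOP)
    script="$1/conductorspark/2.5.0/lsf/ConductorSparkCtrlTerm.sh"
    if [ ! -f "$script" ]; then
        echo "termination script not found: $script" >&2
        return 1
    fi
    # Assumption: the script needs execute permission to be run by LSF.
    if [ ! -x "$script" ]; then
        echo "termination script is not executable: $script" >&2
        return 1
    fi
    # Print the path that the TERMINATE_CONTROL parameter should reference.
    printf '%s\n' "$script"
}
```

Running `check_term_script "$EGO_TOP"` prints the script path on success and reports the problem on standard error otherwise.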
- Create an LSF host group for the IBM Spectrum
Conductor management
cluster:
- Open the lsb.hosts configuration file for your
LSF cluster. By default,
the file is in the
$LSF_ENVDIR/lsbatch/cluster_name/configdir. For more
information on the lsb.hosts configuration file, see lsb.hosts.
- Add a host group that includes all the hosts in the LSF cluster that can be potentially
included in the IBM Spectrum
Conductor
management cluster to run instance group management services.
Note: In this host group, do not include the LSF hosts designated as the IBM Spectrum
Conductor primary host, the IBM Spectrum
Conductor primary candidate hosts (in
the case of high availability), and any host that may be used for running workloads. This host group
must include at least one host.
Enter a group name under the GROUP_NAME column, and enter a space-separated list of hosts, enclosed in parentheses, that will be used to acquire hosts for the IBM Spectrum Conductor management cluster.
For example:
Begin HostGroup
GROUP_NAME GROUP_MEMBER #GROUP_ADMIN # Key words
conductor_management_hostgroup (hostname1 hostname2)
End HostGroup
where
conductor_management_hostgroup is the host group
used for the
IBM Spectrum
Conductor
cluster.
- Save your changes.
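The exclusion rule for this host group (no primary host, no primary candidates) can be checked mechanically. The following is a rough sketch that assumes the simple one-line form shown in the example, with members listed in parentheses; it does not handle nested groups or other lsb.hosts member syntax. Both function names are hypothetical.

```shell
# Sketch only: hypothetical helpers, not LSF commands.
# Prints the members of a host group, one per line.
hostgroup_members() {
    # $1 = path to lsb.hosts, $2 = host group name
    awk -v want="$2" '
        /^[ \t]*Begin[ \t]+HostGroup/ { in_g = 1; next }
        /^[ \t]*End[ \t]+HostGroup/   { in_g = 0; next }
        in_g && $1 == want {
            line = $0
            sub(/#.*/, "", line)                 # drop trailing comments
            if (match(line, /\([^)]*\)/)) {
                members = substr(line, RSTART + 1, RLENGTH - 2)
                n = split(members, m, /[ \t]+/)
                for (i = 1; i <= n; i++) if (m[i] != "") print m[i]
            }
        }
    ' "$1"
}

# Fails if any of the listed hosts is a member of the host group.
hostgroup_excludes() {
    # $1 = path to lsb.hosts, $2 = host group name,
    # remaining arguments = hosts that must not be members
    file=$1; group=$2; shift 2
    members=$(hostgroup_members "$file" "$group")
    for h in "$@"; do
        if printf '%s\n' "$members" | grep -qxF "$h"; then
            echo "host $h must not be in host group $group" >&2
            return 1
        fi
    done
    return 0
}
```

For example, `hostgroup_excludes lsb.hosts conductor_management_hostgroup conductor_primary_host conductor_primary_candidate_1` fails if either host appears in the group.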
- Edit the lsf.conf configuration file for your LSF cluster. By default, the file is in the $LSF_ENVDIR directory. For more information on the lsf.conf configuration file, see lsf.conf.
Enable allocation and usage of GPU resources by IBM Spectrum Conductor workload by ensuring that the following settings exist in the lsf.conf configuration file of your LSF cluster, or by applying them if they do not:
- Open the lsf.conf configuration file for your
LSF
cluster.
- Enter the following parameters and values or ensure that these parameters and values
already exist in the file: LSB_GPU_NEW_SYNTAX=extend and
LSF_GPU_AUTOCONFIG=Y. These parameters are set for enabling GPU functionality
in your LSF cluster. To learn more about enabling GPU functionality, see Enabling GPU features in LSF.
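A quick way to confirm both GPU parameters is to search lsf.conf for them. This is a minimal sketch, assuming each parameter appears on its own NAME=VALUE line; lsf_conf_has and check_gpu_settings are hypothetical helper names, and the check does not account for a parameter that is defined more than once.

```shell
# Sketch only: hypothetical helpers, not LSF commands.
# Succeeds when lsf.conf contains an uncommented NAME=VALUE line.
lsf_conf_has() {
    # $1 = path to lsf.conf, $2 = parameter name, $3 = expected value
    grep -Eq "^[[:space:]]*$2[[:space:]]*=[[:space:]]*\"?$3\"?[[:space:]]*(#.*)?\$" "$1"
}

# Succeeds only when both GPU settings named in this step are present.
check_gpu_settings() {
    # $1 = path to lsf.conf
    lsf_conf_has "$1" LSB_GPU_NEW_SYNTAX extend &&
    lsf_conf_has "$1" LSF_GPU_AUTOCONFIG Y
}
```

For example, `check_gpu_settings "$LSF_ENVDIR/lsf.conf"` returns a non-zero exit status if either setting is missing or commented out.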
- Verify and apply the LSF configuration.
- Check the LSF
configuration using the lsadmin command, as
follows:
lsadmin ckconfig
If any errors are reported, fix the
problem and check the configuration again.
- Apply the new LSF configuration, as follows:
- Check the LSF
configuration using the badmin command, as
follows:
badmin ckconfig
If any errors are reported, fix the
problem and check the configuration again.
- Apply the new LSF configuration, as follows:
- Ensure that the LSF cluster is running and ensure
that the following conditions are satisfied:
- The clocks of all the LSF hosts to be used for the IBM Spectrum
Conductor
management cluster and used as
compute hosts for IBM Spectrum
Conductor workload must be synchronized to
an identical time.
- The IBM Spectrum Conductor installation top directory, the directories used for instance group deployment, and any further directories on which the IBM Spectrum Conductor workload relies all reside in shared storage that is accessible to all the LSF hosts that will be used as IBM Spectrum Conductor management cluster hosts and compute hosts.
- If IBM Spectrum
Conductor
is not configured with primary host
high availability and failover, ensure that the LSF host designated as the IBM Spectrum
Conductor
primary host is available to run an
exclusive LSF job that will
start the IBM Spectrum
Conductor management cluster. This host cannot run any other LSF jobs.
- If IBM Spectrum
Conductor
is configured with primary host high
availability and failover, ensure that all the LSF hosts designated as the IBM Spectrum
Conductor
primary host, the IBM Spectrum
Conductor
primary candidates, and the IBM Spectrum
Conductor management host
for instance groups are all available
to run exclusive LSF jobs
that will start the IBM Spectrum
Conductor management cluster. These hosts cannot run any other LSF jobs.
- Start the IBM Spectrum
Conductor management cluster using the
LSF
bsub command,
or using an independent script.
Note: The scripts used for starting the IBM Spectrum
Conductor management cluster, either
referenced by the LSF
bsub command or invoked
independently, must reside in the directory $EGO_TOP/conductorspark/2.5.0/lsf and cannot be
copied to another directory.
- To start the IBM Spectrum
Conductor
management cluster using the LSF
bsub command:
- If IBM Spectrum
Conductor is not
configured with primary host high
availability and
failover:
bsub -x -n 1 -R "span[ptile=1]" -m "conductor_primary_host" -q conductor_management_queue -app conductor_management_app -J name_of_conductor_management_cluster_main_job -o "path_and_file_for_job_output" $EGO_TOP/conductorspark/2.5.0/lsf/ConductorSparkCtrlJob.sh $EGO_TOP/ admin_user admin_password conductor_management_hostgroup N keep_log_indication
- If IBM Spectrum
Conductor is configured
with primary host high availability
and
failover:
bsub -x -n number_of_primary_and_candidate_hosts_plus_one -R "span[ptile=1]" -m "conductor_primary_host! conductor_primary_candidate_1 … conductor_primary_candidate_k conductor_first_management_host" -q conductor_management_queue -app conductor_management_app -J name_of_conductor_management_cluster_main_job -o "path_and_file_for_job_output" $EGO_TOP/conductorspark/2.5.0/lsf/ConductorSparkCtrlJob.sh $EGO_TOP/ admin_user admin_password conductor_management_hostgroup Y keep_log_indication
- To start the IBM Spectrum Conductor management cluster using an independent start script, log in to the host that is selected as the IBM Spectrum Conductor cluster primary host and, as the IBM Spectrum Conductor cluster administrator, run the following command:
- If IBM Spectrum
Conductor is not
configured with primary host high
availability and
failover:
$EGO_TOP/conductorspark/2.5.0/lsf/ConductorSparkLsfStart.sh $EGO_TOP admin_user admin_password $LSF_ENVDIR conductor_management_hostgroup conductor_management_queue path_for_logging N acquire_primary_hosts_indication conductor_primary_host
- If IBM Spectrum
Conductor is configured
with primary high availability and
failover:
$EGO_TOP/conductorspark/2.5.0/lsf/ConductorSparkLsfStart.sh $EGO_TOP admin_user admin_password $LSF_ENVDIR conductor_management_hostgroup conductor_management_queue path_for_logging Y acquire_primary_hosts_indication conductor_primary_host conductor_primary_candidate_1 … conductor_primary_candidate_k conductor_first_management_host
where:
- number_of_primary_and_candidate_hosts_plus_one is the total number of hosts: the IBM Spectrum Conductor primary host, plus the IBM Spectrum Conductor primary candidates, plus one (the additional one accounts for the first management host that is used for running instance group services).
- conductor_primary_host is the IBM Spectrum
Conductor
primary host.
- conductor_primary_candidate_1 … conductor_primary_candidate_k is a space-separated list of the host names of the IBM Spectrum Conductor primary candidate hosts for a high availability configuration. This list must include one or more hosts.
- conductor_first_management_host is a first host assigned as an IBM Spectrum
Conductor management host to run
instance group services.
- conductor_management_queue is the LSF queue name for the IBM Spectrum
Conductor management cluster.
- conductor_management_app is the LSF application profile name for the
IBM Spectrum
Conductor management
cluster.
- name_of_conductor_management_cluster_main_job is the name of the IBM Spectrum
Conductor management cluster main
job.
- path_and_file_for_job_output is the path and file name of the LSF job output.
- $EGO_TOP/conductorspark/2.5.0/lsf/ConductorSparkCtrlJob.sh is the script that is submitted as a long-running LSF job. The script controls the startup and termination of the IBM Spectrum Conductor cluster on the hosts allocated from LSF. The script receives the parameters described next, which must be provided in the order shown in the commands.
- $EGO_TOP is the top directory of the IBM Spectrum
Conductor installation.
- $LSF_ENVDIR is the LSF configuration directory. This is the /conf directory under the LSF top installation directory, and the value of the environment variable $LSF_ENVDIR that is defined after sourcing the LSF environment file.
- admin_user and admin_password are the IBM Spectrum Conductor cluster administrator user name and password, used to log in to the IBM Spectrum Conductor resource orchestrator and to start and terminate the IBM Spectrum Conductor management cluster. The default cluster administrator name and password are Admin / Admin; any other defined IBM Spectrum Conductor cluster administrator user name and password can be used.
- conductor_management_hostgroup is the host group configured in LSF to include all the hosts in the
LSF cluster that can be
potentially included in the IBM Spectrum
Conductor management cluster to run
instance group management
services.
- The N | Y parameter, which follows the conductor_management_hostgroup parameter in the bsub method or the path_for_logging parameter in the independent script method, is the Conductor high availability mode parameter, where Y indicates that high availability is configured and N indicates that high availability is not configured.
- acquire_primary_hosts_indication
indicates whether to acquire the primary hosts exclusively from LSF. Values are
Y or N, where Y indicates to
acquire the primary hosts
exclusively from LSF, and
N indicates not to acquire these hosts from LSF. These hosts are acquired
exclusively from LSF using
a special job. When terminating the IBM Spectrum
Conductor cluster, these hosts are
released. Use the Y option if the primary hosts are included in your
LSF cluster, and use the
N option if the primary hosts are not included in your
LSF cluster.
- path_for_logging is
either a path or the value N. A path indicates to activate logging and
specifies the directory into which the log file of the script will be written.
N indicates to disable the logging of the script.
- keep_log_indication indicates whether to keep the log file generated by the
script after the IBM Spectrum
Conductor
cluster is terminated. Values are Y or N, where
Y indicates that the generated log file is saved, and
N or any other value (including a missing value), indicates that the log file
is deleted.
For the bsub method of starting a cluster, the log name is
ConductorCtrlLog.job_number, and it resides in the output
directory (if provided) designated for the IBM Spectrum
Conductor management cluster main job,
or the current working directory, or the /tmp/ directory on the host running
the IBM Spectrum
Conductor management
cluster main job.
For the independent
script method of starting a cluster, the log name is
ConductorLsfStartLog.pid, and it resides in the directory
specified by the path_for_logging parameter value.
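For the high availability form of the bsub command, the -n value and the -m host string follow directly from the list of primary candidates. The sketch below shows that arithmetic and string assembly only; it does not submit anything, and ha_slot_count and ha_host_list are hypothetical helper names.

```shell
# Sketch only: hypothetical helpers that compute bsub arguments; they do not
# call bsub.
# Prints the -n value: one primary host, k primary candidates, and one first
# management host.
ha_slot_count() {
    # $1 = number of primary candidate hosts (k, at least 1)
    echo $(( 1 + $1 + 1 ))
}

# Prints the -m host string in the order used by the bsub command:
# "primary! candidate_1 ... candidate_k first_management_host".
ha_host_list() {
    # $1 = primary host, $2 = first management host,
    # remaining arguments = primary candidate hosts
    primary=$1; first_mgmt=$2; shift 2
    if [ "$#" -lt 1 ]; then
        echo "at least one primary candidate host is required" >&2
        return 1
    fi
    echo "${primary}! $* ${first_mgmt}"
}
```

With two primary candidates, `ha_slot_count 2` gives the value for -n, and `ha_host_list primary mgmt1 cand1 cand2` gives the quoted value for -m.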
Note:
- The integration supports a single IBM Spectrum
Conductor cluster running within an
LSF cluster.
- For the minimum number of hosts in the IBM Spectrum
Conductor cluster:
- If IBM Spectrum
Conductor is not
configured with primary host high
availability and failover, the minimum number of hosts in the cluster is three. This includes a
primary host, a management host, and
a compute host. The primary host is
specified in either the bsub command or
the ConductorSparkLsfStart.sh script. The management host is included in the
LSF host group
conductor_management_hostgroup.
- If IBM Spectrum
Conductor is configured
with primary host high availability
and failover, the minimum number of hosts in the cluster is five. This includes a primary host, a primary candidate host, a first management
host, an additional management host, and a compute host. The primary host, primary candidate host, and the first
management host are specified in either the bsub command
or the
ConductorSparkLsfStart.sh script. The additional management host is included in
the LSF host group
conductor_management_hostgroup.
- Create a deployment folder for instance groups, and make sure that the LSF users to be used for IBM Spectrum Conductor instance groups have read, write, and execute permissions for the deployment folder.
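One possible way to create the deployment folder is sketched below. It assumes that the LSF users for the instance groups either own the folder or share a UNIX group with its owner, which this procedure does not state; create_deploy_dir, the group argument, and the 770 mode are illustrative choices, not requirements.

```shell
# Sketch only: create_deploy_dir is a hypothetical helper.
# Creates the folder and grants read, write, and execute permissions to the
# owner and group.
create_deploy_dir() {
    # $1 = deployment folder on shared storage
    # $2 = optional UNIX group shared by the instance group users (assumption)
    dir=$1; group=${2:-}
    mkdir -p "$dir" || return 1
    if [ -n "$group" ]; then
        chgrp "$group" "$dir" || return 1
    fi
    chmod 770 "$dir"
}
```

For example, `create_deploy_dir /shared/conductor_deploy lsfusers` (both names are placeholders) creates the folder and opens it to members of that group.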
What to do next
Once you have deployed your IBM Spectrum
Conductor cluster within an IBM Spectrum LSF cluster, you can complete any number
of the following tasks:
- Add
or remove potential hosts that the IBM Spectrum
Conductor management cluster can use for
running instance group management
services. See Add or remove potential hosts from the IBM Spectrum Conductor cluster running within an LSF cluster.
- Add
or remove Conductor primary
candidate hosts. See Add or remove IBM Spectrum Conductor primary candidate hosts.
- Create an instance group. See Creating an instance group for an IBM Spectrum Conductor cluster running within an IBM Spectrum LSF cluster.
- Terminate an IBM Spectrum
Conductor cluster that is running within
an LSF cluster by either
running the bkill command
or using an independent script. Use the
same termination method as you did to start the cluster:
- If you
used the bsub method to start the IBM Spectrum
Conductor cluster, run the
bkill command to terminate it:
bkill Conductor_cluster_main_job_number
For example:
bkill 50, which terminates the IBM Spectrum
Conductor management cluster that has a
job number of 50.
If, after starting the IBM Spectrum Conductor management cluster, you changed the password of the admin user that was specified in the bsub command, you cannot use the bkill command alone to stop the cluster. Instead, you must log in to the IBM Spectrum Conductor cluster as the admin user to stop all IBM Spectrum Conductor services, then shut down the cluster, and then issue a bkill command. Run the following commands in order:
source $EGO_TOP/profile.platform
egoshutdown.sh
source $LSF_ENVDIR/profile.lsf
bkill Conductor_cluster_main_job_number
- If you
used the ConductorSparkLsfStart.sh script to start the IBM Spectrum
Conductor cluster, run the following
command to terminate
it:
$EGO_TOP/conductorspark/2.5.0/lsf/ConductorSparkLsfTerm.sh $EGO_TOP admin_user admin_password $LSF_ENVDIR path_for_logging
where the parameters are defined in step 12. The log name of the ConductorSparkLsfTerm.sh script is ConductorLsfTermLog.pid, and it resides in the directory specified by the path_for_logging parameter value.
- Revert IBM Spectrum Conductor from LSF mode to standard mode (in standard mode, the IBM Spectrum Conductor cluster is not connected to an LSF cluster):
- Terminate the IBM Spectrum
Conductor
cluster using either the bkill command (if the cluster was started using the
bsub command) or the ConductorSparkLsfTerm.sh script (if the
cluster was started using the ConductorSparkLsfStart.sh script).
- Run the following command to revert the IBM Spectrum
Conductor cluster to standard mode:
where the parameters are defined in step 12.
The log name of the ConductorSparkRevert.sh script is
ConductorRevertLog.pid, and it resides in the directory
specified by the path_for_logging parameter value.
- Start the IBM Spectrum
Conductor
cluster:
egosh ego start
- Create instance groups in IBM Spectrum
Conductor standard mode.
Note: When reverting
IBM Spectrum
Conductor from
LSF mode to standard mode,
instance groups and configuration created in
LSF mode are not supported
in
IBM Spectrum
Conductor standard mode and
must be removed. You must manually port any data stored in these
instance groups.
When switching IBM Spectrum
Conductor from standard mode to LSF mode, instance groups and configuration created in
the standard mode are not supported in LSF mode and must be removed. You
must manually port any data stored in these instance groups.