Deploying an IBM Spectrum Conductor cluster within an IBM Spectrum LSF cluster

You can deploy an IBM Spectrum Conductor cluster and workloads as LSF® jobs within an IBM Spectrum LSF (LSF) cluster.

About this task

To deploy IBM Spectrum Conductor within LSF, you must create an LSF batch queue, application profile, and host group for the IBM Spectrum Conductor management cluster.
Note: This LSF application profile, queue, and host group are for use only by the IBM Spectrum Conductor management cluster to acquire additional management hosts, and cannot be used for workloads. The host group must include at least one host. The hosts in this host group must be different from the primary host of IBM Spectrum Conductor and any primary candidate hosts (if IBM Spectrum Conductor is configured with primary host high availability and failover). For workloads, there must be at least one additional host available beyond the hosts specified previously.

Procedure

  1. Install IBM Spectrum LSF or use an existing installation. IBM Spectrum LSF 10.1.0.6 or later is supported.
  2. Create an LSF cluster, or use an existing cluster.
  3. Install IBM Spectrum Conductor as a shared installation. See Installing IBM Spectrum Conductor to a shared environment.

    When you install to a shared environment, you install IBM Spectrum Conductor once on a shared file system that every host in the cluster shares and to which every host has access. These hosts can be any subset of the hosts that are included in the LSF cluster. These hosts must be of the types that are supported by IBM Spectrum Conductor.

    If your LSF cluster has root job submission disabled, apply the root squash setting (including the ROOT_SQUASH_INSTALL environment variable) when installing IBM Spectrum Conductor (see step 4 in Installing IBM Spectrum Conductor to a shared environment). In this case, the IBM Spectrum Conductor cluster administrator user must be a non-root user.

  4. Configure the IBM Spectrum Conductor cluster.
    1. Log in to the host that is selected as the IBM Spectrum Conductor primary host and, as the IBM Spectrum Conductor cluster administrator, run the following commands:
      1. Source the environment.
        • If you are using BASH, run:
          source install_directory/profile.platform
        • If you are using CSH, run:
          source install_directory/cshrc.platform
        where install_directory is the installation directory.
      2. Add the primary host.
        egoconfig join primary_host_name
        where primary_host_name is the primary host name.
      3. Set entitlement.
        egoconfig setentitlement entitlement_file_path
        where entitlement_file_path is the full path of the entitlement file.
    2. To support IBM Spectrum Conductor primary host high availability and failover, apply the following steps:
      1. Designate one host to be the IBM Spectrum Conductor primary host, at least one additional host to be the IBM Spectrum Conductor primary candidates, and at least one additional host to run instance group management services. In this configuration, instance group management services do not run on the primary host and primary candidate hosts.
      2. Log in to the host that is designated as the IBM Spectrum Conductor cluster primary host and to each of the additional hosts designated as the IBM Spectrum Conductor primary candidates, as the IBM Spectrum Conductor cluster administrator, and run the following commands on these hosts:
        1. Source the environment.
          • If you are using BASH, run:
            source installation_directory/profile.platform
          • If you are using CSH, run:
            source installation_directory/cshrc.platform
          where installation_directory is the installation directory.
        2. Configure the shared directory for the primary host and the primary candidate hosts.
          egoconfig mghost shared_directory
          where shared_directory is a directory in the shared file system that is accessible to the primary host and the primary candidate hosts and will be used for high availability data.
          Important: Do not run the egoconfig mghost shared_directory command on the hosts designated to run instance group management services.
        For additional information on setting up primary host high availability and failover, see Setting up primary host failover.
      3. Enable secure shell between all the following hosts: primary host, primary candidate hosts, and the hosts for running instance group management services. For details on how to enable secure shell between these hosts, see Enabling secure shell.
  5. Log in as the LSF administrator on any host in the LSF cluster.
  6. Create an LSF batch queue that is used for acquiring hosts, in an exclusive mode, for the IBM Spectrum Conductor management cluster.
    1. Open the lsb.queues configuration file for your LSF cluster. By default, the file is in the $LSF_ENVDIR/lsbatch/cluster_name/configdir directory. For more information on the lsb.queues configuration file, see lsb.queues.
    2. Add a queue for the IBM Spectrum Conductor management cluster. The queue definition must include the parameter EXCLUSIVE = Y.
      For example:
      Begin Queue
      QUEUE_NAME = conductor_management_queue
      EXCLUSIVE  = Y
      End Queue
      
    3. Save your changes.
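The queue definition above can also be added with a small script. The following is a minimal sketch, not a supported tool: the function name add_conductor_queue is hypothetical, the path to lsb.queues is passed in by the caller, and the script assumes that the file is writable and that the queue name from the example is used. It appends the stanza only when no queue with that name is already defined.

```shell
# Hypothetical helper: append the example queue stanza to an lsb.queues file
# unless a queue with the same name is already defined there.
add_conductor_queue() {
    queues_file="$1"                                  # path to lsb.queues
    queue_name="${2:-conductor_management_queue}"     # name from the example
    pattern="^[[:space:]]*QUEUE_NAME[[:space:]]*=[[:space:]]*${queue_name}[[:space:]]*\$"
    if grep -Eq "$pattern" "$queues_file"; then
        echo "queue ${queue_name} already defined"
        return 0
    fi
    # Append the stanza exactly as shown in the example.
    cat >> "$queues_file" <<EOF

Begin Queue
QUEUE_NAME = ${queue_name}
EXCLUSIVE  = Y
End Queue
EOF
    echo "queue ${queue_name} added"
}
```

For example, add_conductor_queue "$LSF_ENVDIR/lsbatch/cluster_name/configdir/lsb.queues", where cluster_name is replaced with the name of your LSF cluster. The change takes effect only after the LSF configuration is verified and applied.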
  7. Create an LSF application profile for the IBM Spectrum Conductor management cluster. The application profile references the cluster termination script residing in $EGO_TOP/conductorspark/2.5.0/lsf/ConductorSparkCtrlTerm.sh.
    Note: The cluster termination script referenced by the LSF application profile for the IBM Spectrum Conductor management cluster must reside in the directory $EGO_TOP/conductorspark/2.5.0/lsf and cannot be copied to another directory.
    1. Open the lsb.applications configuration file for your LSF cluster. By default, the file is in the $LSF_ENVDIR/lsbatch/cluster_name/configdir directory. For more information on the lsb.applications configuration file, see lsb.applications.
    2. Add a new application profile definition for the IBM Spectrum Conductor management cluster. The application profile definition must include the parameter TERMINATE_CONTROL that references ConductorSparkCtrlTerm.sh and must include the parameter RESIZABLE_JOBS=Y. For example:
      Begin Application
      Name=conductor_management_app
      RESIZABLE_JOBS=Y
      TERMINATE_CONTROL=$EGO_TOP/conductorspark/2.5.0/lsf/ConductorSparkCtrlTerm.sh
      End Application
    3. Save the changes.
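Because the application profile must carry both required parameters, it can help to check the file before applying the configuration. The following is a rough sketch under stated assumptions: the function name check_conductor_app_profile is hypothetical, the stanza uses the Begin Application / End Application layout shown in the example, and each parameter is on its own line. It prints ok, incomplete, or missing.

```shell
# Hypothetical check: does the named application profile set RESIZABLE_JOBS=Y
# and a TERMINATE_CONTROL that ends in ConductorSparkCtrlTerm.sh?
check_conductor_app_profile() {
    apps_file="$1"                               # path to lsb.applications
    app_name="${2:-conductor_management_app}"    # name from the example
    awk -v name="$app_name" '
        /^[ \t]*Begin[ \t]+Application/ { in_app = 1; n = ""; r = 0; t = 0; next }
        /^[ \t]*End[ \t]+Application/ {
            if (in_app && n == name) { found = 1; ok = (r && t) }
            in_app = 0; next
        }
        in_app {
            line = $0
            sub(/#.*/, "", line)                 # drop comments
            gsub(/[ \t]/, "", line)              # drop whitespace around "="
            eq = index(line, "=")
            if (eq == 0) next
            key = toupper(substr(line, 1, eq - 1)); val = substr(line, eq + 1)
            if (key == "NAME") n = val
            if (key == "RESIZABLE_JOBS" && toupper(val) == "Y") r = 1
            if (key == "TERMINATE_CONTROL" && val ~ /ConductorSparkCtrlTerm\.sh$/) t = 1
        }
        END {
            if (!found) { print "missing"; exit 2 }
            if (ok) { print "ok" } else { print "incomplete"; exit 1 }
        }
    ' "$apps_file"
}
```

The check only inspects the text of the file. It does not confirm that the termination script exists in $EGO_TOP/conductorspark/2.5.0/lsf.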
  8. Create an LSF host group for the IBM Spectrum Conductor management cluster:
    1. Open the lsb.hosts configuration file for your LSF cluster. By default, the file is in the $LSF_ENVDIR/lsbatch/cluster_name/configdir directory. For more information on the lsb.hosts configuration file, see lsb.hosts.
    2. Add a host group that includes all the hosts in the LSF cluster that can be potentially included in the IBM Spectrum Conductor management cluster to run instance group management services.
      Note: In this host group, do not include the LSF hosts designated as the IBM Spectrum Conductor primary host, the IBM Spectrum Conductor primary candidate hosts (in the case of high availability), and any host that may be used for running workloads. This host group must include at least one host.

      Enter a group name under the GROUP_NAME column and, under the GROUP_MEMBER column, enter a space-separated list of hosts, enclosed in parentheses, that will be used to acquire hosts for the IBM Spectrum Conductor management cluster.

      For example:
      Begin HostGroup
      GROUP_NAME    GROUP_MEMBER #GROUP_ADMIN # Key words
      conductor_management_hostgroup     (hostname1 hostname2)
      End HostGroup
      where conductor_management_hostgroup is the host group used for the IBM Spectrum Conductor cluster.
    3. Save your changes.
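The host group must not contain the primary host or any primary candidate host. A quick way to catch an overlap before applying the configuration is sketched below. The function name hostgroup_overlap and the host names in the usage example are illustrative assumptions, and the host lists are typed in by hand rather than read from lsb.hosts.

```shell
# Hypothetical check: print every host that appears both in the host group
# and in the list of reserved hosts (primary host and primary candidates).
# Prints nothing when the two lists are disjoint.
hostgroup_overlap() {
    group_members="$1"      # space-separated, for example "hostname1 hostname2"
    reserved_hosts="$2"     # space-separated primary and primary candidate hosts
    for h in $group_members; do
        for r in $reserved_hosts; do
            if [ "$h" = "$r" ]; then
                echo "$h"
            fi
        done
    done
    return 0
}
```

For example, hostgroup_overlap "hostname1 hostname2" "primary1 hostname2" prints hostname2, which indicates that hostname2 must be removed from the host group or from the list of primary candidates.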
  9. Edit the lsf.conf configuration file for your LSF cluster. By default, the file is in the $LSF_ENVDIR directory. For more information on the lsf.conf configuration file, see lsf.conf.

    To enable IBM Spectrum Conductor workloads to allocate and use GPU resources, make sure that the following settings are applied in the lsf.conf configuration file of your LSF cluster:

    1. Open the lsf.conf configuration file for your LSF cluster.
    2. Enter the following parameters and values or ensure that these parameters and values already exist in the file: LSB_GPU_NEW_SYNTAX=extend and LSF_GPU_AUTOCONFIG=Y. These parameters are set for enabling GPU functionality in your LSF cluster. To learn more about enabling GPU functionality, see Enabling GPU features in LSF.
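The two GPU parameters can be set idempotently with a short script, so that an existing value is replaced rather than duplicated. This is a minimal sketch: the function name ensure_lsf_conf_setting is hypothetical, and it assumes that each parameter is written as NAME=VALUE at the start of a line and that the caller has write access to lsf.conf.

```shell
# Hypothetical helper: make sure KEY=VALUE is present exactly once in a
# NAME=VALUE style configuration file, replacing any existing value.
ensure_lsf_conf_setting() {
    conf_file="$1"; key="$2"; value="$3"
    if grep -q "^${key}=" "$conf_file"; then
        # Rewrite through a temporary copy so the original file keeps its
        # ownership and permissions.
        tmp_file="$(mktemp)"
        sed "s|^${key}=.*|${key}=${value}|" "$conf_file" > "$tmp_file" &&
            cat "$tmp_file" > "$conf_file"
        rm -f "$tmp_file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$conf_file"
    fi
}

# Example usage (uncomment to apply to your own lsf.conf):
# ensure_lsf_conf_setting "$LSF_ENVDIR/lsf.conf" LSB_GPU_NEW_SYNTAX extend
# ensure_lsf_conf_setting "$LSF_ENVDIR/lsf.conf" LSF_GPU_AUTOCONFIG Y
```

Back up lsf.conf before running a script like this against a production cluster.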
  10. Verify and apply the LSF configuration.
    1. Check the LSF configuration using the lsadmin command, as follows:
      lsadmin ckconfig
      If any errors are reported, fix the problem and check the configuration again.
    2. Apply the new LSF configuration, as follows:
      lsadmin reconfig
    3. Check the LSF configuration using the badmin command, as follows:
      badmin ckconfig
      If any errors are reported, fix the problem and check the configuration again.
    4. Apply the new LSF configuration, as follows:
      badmin mbdrestart
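The four commands in this step are order-sensitive, and each check should pass before the next command runs. The sketch below wraps that order in a function that stops at the first failure. The function name apply_lsf_config and its runner argument are assumptions made for illustration: by default the function only prints the commands (a dry run), and passing command as the argument runs them for real. The real commands may prompt for confirmation, which this sketch does not handle.

```shell
# Hypothetical wrapper: run the verify-and-apply sequence in order and stop
# at the first command that fails.
#   apply_lsf_config            prints the commands only (dry run)
#   apply_lsf_config command    executes the commands
apply_lsf_config() {
    runner="${1:-echo}"
    "$runner" lsadmin ckconfig  || return 1
    "$runner" lsadmin reconfig  || return 1
    "$runner" badmin ckconfig   || return 1
    "$runner" badmin mbdrestart || return 1
}
```

Run the function as the LSF administrator after sourcing the LSF environment, so that lsadmin and badmin are on the path.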
  11. Ensure that the LSF cluster is running and ensure that the following conditions are satisfied:
    • The clocks of all the LSF hosts to be used for the IBM Spectrum Conductor management cluster and used as compute hosts for IBM Spectrum Conductor workload must be synchronized to an identical time.
    • The IBM Spectrum Conductor installation top directory, the directories used for instance group deployment, and any further directories on which the IBM Spectrum Conductor workload relies all reside in shared storage that is accessible to all the LSF hosts that will be used as IBM Spectrum Conductor management cluster hosts and compute hosts.
    • If IBM Spectrum Conductor is not configured with primary host high availability and failover, ensure that the LSF host designated as the IBM Spectrum Conductor primary host is available to run an exclusive LSF job that will start the IBM Spectrum Conductor management cluster. This host cannot run any other LSF jobs.
    • If IBM Spectrum Conductor is configured with primary host high availability and failover, ensure that all the LSF hosts designated as the IBM Spectrum Conductor primary host, the IBM Spectrum Conductor primary candidates, and the IBM Spectrum Conductor management host for instance groups are all available to run exclusive LSF jobs that will start the IBM Spectrum Conductor management cluster. These hosts cannot run any other LSF jobs.
  12. Start the IBM Spectrum Conductor management cluster using the LSF bsub command, or using an independent script.
    Note: The scripts used for starting the IBM Spectrum Conductor management cluster, either referenced by the LSF bsub command or invoked independently, must reside in the directory $EGO_TOP/conductorspark/2.5.0/lsf and cannot be copied to another directory.
    • To start the IBM Spectrum Conductor management cluster using the LSF bsub command:
      • If IBM Spectrum Conductor is not configured with primary host high availability and failover:
        bsub -x -n 1 -R "span[ptile=1]" -m "conductor_primary_host" -q conductor_management_queue -app conductor_management_app -J name_of_conductor_management_cluster_main_job -o "path_and_file_for_job_output" $EGO_TOP/conductorspark/2.5.0/lsf/ConductorSparkCtrlJob.sh $EGO_TOP/ admin_user admin_password conductor_management_hostgroup N keep_log_indication
      • If IBM Spectrum Conductor is configured with primary host high availability and failover:
        bsub -x -n number_of_primary_and_candidate_hosts_plus_one -R "span[ptile=1]" -m "conductor_primary_host! conductor_primary_candidate_1 … conductor_primary_candidate_k conductor_first_management_host" -q conductor_management_queue -app conductor_management_app -J name_of_conductor_management_cluster_main_job -o "path_and_file_for_job_output" $EGO_TOP/conductorspark/2.5.0/lsf/ConductorSparkCtrlJob.sh $EGO_TOP/ admin_user admin_password conductor_management_hostgroup Y keep_log_indication
    • To start the IBM Spectrum Conductor management cluster using an independent start script, log in to the host that is selected as the IBM Spectrum Conductor cluster primary host and, as the IBM Spectrum Conductor cluster administrator, run the following command:
      • If IBM Spectrum Conductor is not configured with primary host high availability and failover:
        $EGO_TOP/conductorspark/2.5.0/lsf/ConductorSparkLsfStart.sh $EGO_TOP admin_user admin_password $LSF_ENVDIR conductor_management_hostgroup conductor_management_queue path_for_logging N acquire_primary_hosts_indication conductor_primary_host
      • If IBM Spectrum Conductor is configured with primary host high availability and failover:
        $EGO_TOP/conductorspark/2.5.0/lsf/ConductorSparkLsfStart.sh $EGO_TOP admin_user admin_password $LSF_ENVDIR conductor_management_hostgroup conductor_management_queue path_for_logging Y acquire_primary_hosts_indication conductor_primary_host conductor_primary_candidate_1 … conductor_primary_candidate_k conductor_first_management_host
    where:
    • number_of_primary_and_candidate_hosts_plus_one is the total number of hosts: the IBM Spectrum Conductor primary host, plus the IBM Spectrum Conductor primary candidates, plus one (the additional one accounts for the first management host that is used for running instance group services).
    • conductor_primary_host is the IBM Spectrum Conductor primary host.
    • conductor_primary_candidate_1 … conductor_primary_candidate_k is a space-separated list of the host names of the IBM Spectrum Conductor primary candidate hosts for a high availability configuration. This list must include one or more hosts.
    • conductor_first_management_host is a first host assigned as an IBM Spectrum Conductor management host to run instance group services.
    • conductor_management_queue is the LSF queue name for the IBM Spectrum Conductor management cluster.
    • conductor_management_app is the LSF application profile name for the IBM Spectrum Conductor management cluster.
    • name_of_conductor_management_cluster_main_job is the name of the IBM Spectrum Conductor management cluster main job.
    • path_and_file_for_job_output is the path and file name of the LSF job output.
    • $EGO_TOP/conductorspark/2.5.0/lsf/ConductorSparkCtrlJob.sh is the script that is submitted as a long-running LSF job. The script controls the startup and termination of the IBM Spectrum Conductor cluster on the hosts allocated from LSF. The script receives the parameters that are described next. The parameters must be provided to the script in the order in which they are described below.
    • $EGO_TOP is the top directory of the IBM Spectrum Conductor installation.
    • $LSF_ENVDIR is the LSF configuration directory. This is the /conf directory under the LSF top installation directory, and is the value of the environment variable $LSF_ENVDIR that is defined after sourcing the LSF environment file.
    • admin_user and admin_password are the IBM Spectrum Conductor cluster administrator user name and password, which are used to log in to the IBM Spectrum Conductor resource orchestrator and to start and terminate the IBM Spectrum Conductor management cluster. The default cluster administrator user name and password are Admin / Admin; any other defined IBM Spectrum Conductor cluster administrator user name and password can also be used.
    • conductor_management_hostgroup is the host group configured in LSF to include all the hosts in the LSF cluster that can be potentially included in the IBM Spectrum Conductor management cluster to run instance group management services.
    • N | Y parameter after the conductor_management_hostgroup parameter in the bsub method or after the path_for_logging parameter in the independent script method, is the Conductor high availability mode parameter, where Y indicates high availability is configured and N indicates that high availability is not configured.
    • acquire_primary_hosts_indication indicates whether to acquire the primary hosts exclusively from LSF. Values are Y or N, where Y indicates to acquire the primary hosts exclusively from LSF, and N indicates not to acquire these hosts from LSF. These hosts are acquired exclusively from LSF using a special job. When terminating the IBM Spectrum Conductor cluster, these hosts are released. Use the Y option if the primary hosts are included in your LSF cluster, and use the N option if the primary hosts are not included in your LSF cluster.
    • path_for_logging is either a path or the value N. A path indicates to activate logging and specifies the directory into which the log file of the script will be written. N indicates to disable the logging of the script.
    • keep_log_indication indicates whether to keep the log file generated by the script after the IBM Spectrum Conductor cluster is terminated. Values are Y or N, where Y indicates that the generated log file is saved, and N or any other value (including a missing value) indicates that the log file is deleted.

      For the bsub method of starting a cluster, the log name is ConductorCtrlLog.job_number, and it resides in the output directory (if provided) designated for the IBM Spectrum Conductor management cluster main job, or the current working directory, or the /tmp/ directory on the host running the IBM Spectrum Conductor management cluster main job.

      For the independent script method of starting a cluster, the log name is ConductorLsfStartLog.pid, and it resides in the directory specified by the path_for_logging parameter value.
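The host-related part of the bsub command depends on the high availability mode: the slot count passed to -n is 1 without high availability, and otherwise the primary host plus the primary candidates plus one first management host. The sketch below derives that value and the -m host list from the host names. It only prints the beginning of the command and does not submit anything; the function name conductor_bsub_prefix and the host names in the example are hypothetical, and the -J, -o, and script arguments still have to be appended as described above.

```shell
# Hypothetical helper: print the host-related prefix of the bsub command.
#   $1      high availability mode, Y or N
#   $2      primary host
#   $3...   (Y only) primary candidate hosts followed by the first management host
conductor_bsub_prefix() {
    ha_mode="$1"
    primary="$2"
    shift 2
    if [ "$ha_mode" = "Y" ]; then
        slots=$(( $# + 1 ))          # candidates + first management host + primary
        hosts="${primary}! $*"       # "!" marks the primary as the first execution host
    else
        slots=1
        hosts="$primary"
    fi
    printf 'bsub -x -n %s -R "span[ptile=1]" -m "%s" -q conductor_management_queue -app conductor_management_app\n' \
        "$slots" "$hosts"
}
```

For example, conductor_bsub_prefix Y primary1 cand1 cand2 mgmt1 prints a command prefix with -n 4 and -m "primary1! cand1 cand2 mgmt1".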

    Note:
    • The integration supports a single IBM Spectrum Conductor cluster running within an LSF cluster.
    • For the minimum number of hosts in the IBM Spectrum Conductor cluster:
      • If IBM Spectrum Conductor is not configured with primary host high availability and failover, the minimum number of hosts in the cluster is three. This includes a primary host, a management host, and a compute host. The primary host is specified in either the bsub command or the ConductorSparkLsfStart.sh script. The management host is included in the LSF host group conductor_management_hostgroup.
      • If IBM Spectrum Conductor is configured with primary host high availability and failover, the minimum number of hosts in the cluster is five. This includes a primary host, a primary candidate host, a first management host, an additional management host, and a compute host. The primary host, primary candidate host, and the first management host are specified in either the bsub command or the ConductorSparkLsfStart.sh script. The additional management host is included in the LSF host group conductor_management_hostgroup.
  13. Create a deployment folder for instance groups, and make sure LSF users to be used for IBM Spectrum Conductor instance groups have read, write, and execute permissions for the deployment folder.
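One way to create the deployment folder is sketched below. The function name create_deployment_folder and the default mode 770 are assumptions: the mode presumes that the LSF users for the instance groups share the group that owns the folder, so adjust the mode or group ownership to match your site policy.

```shell
# Hypothetical helper: create the deployment folder (including parents) and
# set its permissions so that the owner and group can read, write, and execute.
create_deployment_folder() {
    folder="$1"            # a directory on the shared file system
    mode="${2:-770}"       # assumption: instance group users share the owning group
    mkdir -p "$folder" && chmod "$mode" "$folder"
}
```

After creating the folder, confirm from each host that will run workloads that the instance group users can read, write, and list it.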

What to do next

Once you have deployed your IBM Spectrum Conductor cluster within an IBM Spectrum LSF cluster, you can complete any number of the following tasks:
  • Add or remove potential hosts that the IBM Spectrum Conductor management cluster can use for running instance group management services. See Add or remove potential hosts from the IBM Spectrum Conductor cluster running within an LSF cluster.
  • Add or remove Conductor primary candidate hosts. See Add or remove IBM Spectrum Conductor primary candidate hosts.
  • Create an instance group. See Creating an instance group for an IBM Spectrum Conductor cluster running within an IBM Spectrum LSF cluster.
  • Terminate an IBM Spectrum Conductor cluster that is running within an LSF cluster by either running the bkill command or using an independent script. Use the same termination method as you did to start the cluster:
    • If you used the bsub method to start the IBM Spectrum Conductor cluster, run the bkill command to terminate it:
      bkill Conductor_cluster_main_job_number

      For example: bkill 50, which terminates the IBM Spectrum Conductor management cluster that has a job number of 50.

      If, after starting the IBM Spectrum Conductor management cluster, you changed the password of the admin user that was specified in the bsub command, you cannot use the bkill command alone to stop the cluster.

      Instead, you must log in to the IBM Spectrum Conductor cluster as the admin user to stop all IBM Spectrum Conductor services, then shut down the cluster, and then issue a bkill command. Run the following commands in order:
      source $EGO_TOP/profile.platform
      egoshutdown.sh
      source $LSF_ENVDIR/profile.lsf
      bkill Conductor_cluster_main_job_number
    • If you used the ConductorSparkLsfStart.sh script to start the IBM Spectrum Conductor cluster, run the following command to terminate it:
      $EGO_TOP/conductorspark/2.5.0/lsf/ConductorSparkLsfTerm.sh $EGO_TOP admin_user admin_password $LSF_ENVDIR path_for_logging
      where the parameters are defined in step 12.

      The log name of the ConductorSparkLsfTerm.sh script is ConductorLsfTermLog.pid, and it resides in the directory specified by the path_for_logging parameter value.

  • Revert IBM Spectrum Conductor from LSF mode to standard mode (in standard mode, the IBM Spectrum Conductor cluster is not connected to an LSF cluster):
    1. Terminate the IBM Spectrum Conductor cluster using either the bkill command (if the cluster was started using the bsub command) or the ConductorSparkLsfTerm.sh script (if the cluster was started using the ConductorSparkLsfStart.sh script).
    2. Run the following command to revert the IBM Spectrum Conductor cluster to standard mode:
      • If IBM Spectrum Conductor is not configured with primary host high availability and failover:
        $EGO_TOP/conductorspark/2.5.0/lsf/ConductorSparkRevert.sh $EGO_TOP path_for_logging N
      • If IBM Spectrum Conductor is configured with primary host high availability and failover:
        $EGO_TOP/conductorspark/2.5.0/lsf/ConductorSparkRevert.sh $EGO_TOP path_for_logging Y
      where the parameters are defined in step 12.

      The log name of the ConductorSparkRevert.sh script is ConductorRevertLog.pid, and it resides in the directory specified by the path_for_logging parameter value.

    3. Start the IBM Spectrum Conductor cluster:
      egosh ego start
    4. Create instance groups in IBM Spectrum Conductor standard mode.
    Note: When reverting IBM Spectrum Conductor from LSF mode to standard mode, instance groups and configuration created in LSF mode are not supported in IBM Spectrum Conductor standard mode and must be removed. You must manually port any data stored in these instance groups.

    When switching IBM Spectrum Conductor from standard mode to LSF mode, instance groups and configuration created in the standard mode are not supported in LSF mode and must be removed. You must manually port any data stored in these instance groups.