IBM Spectrum LSF overview

IBM Spectrum Conductor integrates with IBM Spectrum LSF (LSF) to deploy an IBM Spectrum Conductor cluster and workloads as LSF jobs within an LSF cluster.

Highlights

With this integration, IBM Spectrum Conductor workloads run as LSF jobs in the LSF cluster, so they share resources at a fine-grained level with the other workloads that run in that cluster.

You can start and terminate an IBM Spectrum Conductor cluster that runs within an IBM Spectrum LSF cluster by using either the LSF bsub and bkill commands, respectively, or the dedicated cluster start and stop scripts.

After you deploy an IBM Spectrum Conductor cluster to run within an LSF cluster, the IBM Spectrum Conductor cluster can be accessed by using the cluster management console, CLI commands, and RESTful APIs.

The key highlights of this integration include:
  • Fine-grained resource sharing across IBM Spectrum Conductor and LSF workloads.
  • Resource planning, management, and monitoring for IBM Spectrum Conductor are done through LSF.
  • IBM Spectrum Conductor workloads run as LSF jobs, under the execution users specified for the IBM Spectrum Conductor workload. The workloads are submitted by using a combination of optional LSF parameters that are specified for the IBM Spectrum Conductor workload: queue, application profile, host group, resource requirement, and project name.
  • LSF job groups are created for IBM Spectrum Conductor applications under their corresponding execution users. Each LSF job group corresponds to an IBM Spectrum Conductor application.
  • The cluster management console for IBM Spectrum Conductor supports the LSF environment.
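
As an illustration of how the optional parameters in the list above map to LSF submission options, the following minimal Python sketch builds a bsub argument list. The options -q, -app, -m, -R, -P, and -g are standard bsub options; the function name, the job group path, and the sample values are assumptions made for illustration and are not taken from the integration itself. The sketch only builds the command and does not run it.

```python
from typing import List, Optional


def build_bsub_args(
    command: List[str],
    queue: Optional[str] = None,
    app_profile: Optional[str] = None,
    host_group: Optional[str] = None,
    res_req: Optional[str] = None,
    project: Optional[str] = None,
    job_group: Optional[str] = None,
) -> List[str]:
    """Return a bsub argument list; parameters left as None are omitted."""
    args = ["bsub"]
    for flag, value in (
        ("-q", queue),          # queue
        ("-app", app_profile),  # application profile
        ("-m", host_group),     # host or host group
        ("-R", res_req),        # resource requirement string
        ("-P", project),        # project name
        ("-g", job_group),      # job group path
    ):
        if value is not None:
            args += [flag, value]
    return args + command


# All values below are hypothetical examples.
example = build_bsub_args(
    ["./run_workload.sh"],
    queue="normal",
    project="conductor_demo",
    job_group="/conductor/app1",
)
print(" ".join(example))
```

Because every parameter is optional, a submission that sets none of them reduces to bsub followed by the command.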

How it works

When you start an IBM Spectrum Conductor cluster in an IBM Spectrum LSF cluster, an IBM Spectrum Conductor management cluster starts within the LSF cluster.

The management cluster runs IBM Spectrum Conductor management services, instance groups, and framework management services. The management cluster supports automatic and dynamic resizing (both growth and reduction) so that it can run more or fewer instance groups and frameworks. In the management cluster, hosts are acquired exclusively from LSF, and the resource orchestrator (EGO) runs the services in this cluster. EGO is activated on each new host that is acquired for this cluster by an LSF job that starts EGO on that host.
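
The activation of EGO on newly acquired hosts, described above, can be pictured with a short Python sketch that builds one bsub command per new host and uses the standard bsub -m option to target that host. The script name start_ego.sh and the host names are hypothetical; this document does not state which command the integration actually submits.

```python
from typing import Iterable, List

# Hypothetical placeholder: the real command that starts EGO is not given here.
EGO_START_COMMAND = "./start_ego.sh"


def ego_start_jobs(new_hosts: Iterable[str]) -> List[List[str]]:
    """Build one bsub argument list per newly acquired host."""
    return [["bsub", "-m", host, EGO_START_COMMAND] for host in new_hosts]


for job in ego_start_jobs(["hostA", "hostB"]):  # hypothetical host names
    print(" ".join(job))
```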

IBM Spectrum Conductor workloads run on LSF compute hosts as LSF jobs. The diagram depicts this as the IBM Spectrum Conductor workload cluster. In this cluster, the compute hosts are shared with other workloads by using fine-grained resource sharing.

Figure: Architecture of how IBM Spectrum LSF integrates with IBM Spectrum Conductor.

Key LSF concepts

Job
A unit of work that is running in the LSF system. A job is a command that is submitted to LSF for execution. LSF schedules, controls, and tracks the job according to configured policies. Jobs can be complex problems, simulation scenarios, extensive calculations, or anything that needs compute power.
LSF bsub command
You can use the LSF bsub command to start an IBM Spectrum Conductor cluster by submitting an IBM Spectrum Conductor cluster controller job, which controls the start and termination of the IBM Spectrum Conductor cluster in the LSF cluster.
LSF bkill command
You can terminate an IBM Spectrum Conductor cluster by using the LSF bkill command and specifying the job ID that LSF assigned to the IBM Spectrum Conductor cluster controller job. This activates the termination procedure in the IBM Spectrum Conductor cluster controller, which terminates the cluster.
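
The link between the two commands is the job ID. As a minimal Python sketch, assuming the usual bsub confirmation message of the form "Job <ID> is submitted to ...", the job ID of the cluster controller job can be extracted and passed to bkill. The helper names and the sample message are illustrative assumptions; the sketch does not call LSF.

```python
import re
from typing import List


def parse_job_id(bsub_output: str) -> str:
    """Extract the numeric job ID from a bsub confirmation message."""
    match = re.search(r"Job <(\d+)>", bsub_output)
    if match is None:
        raise ValueError("no job ID found in bsub output")
    return match.group(1)


def build_bkill_args(job_id: str) -> List[str]:
    """Return the bkill argument list for the given job ID."""
    return ["bkill", job_id]


# Sample message with a made-up job ID and queue name.
sample_output = "Job <1234> is submitted to default queue <normal>."
controller_job_id = parse_job_id(sample_output)
print(" ".join(build_bkill_args(controller_job_id)))
```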
Application profile
Defines common parameters for the same type of jobs, including the execution requirements of the applications, the resources they require, and how they are run and managed. Operates in conjunction with queue and job-level options. In general, you use application profile definitions to refine queue-level settings, or to exclude some jobs from queue-level parameters.
Queue
A cluster-wide container for jobs. All jobs wait in queues until they are scheduled and dispatched to hosts. Queues do not correspond to individual hosts; each queue can use all server hosts in the cluster, or a configured subset of the server hosts. When you submit a job to a queue, you do not need to specify an execution host. LSF dispatches the job to the best available execution host in the cluster to run that job. Queues implement different job scheduling and control policies.
Host group
A host group gathers hosts with similar resources into a single named group (for example, all hosts with big memory). Use host groups to manage dedicated resources for a single organization or to share resources across organizations. You can add limits to host groups, or define host groups in queues to constrain jobs for a scheduling policy that is defined over a specific set of hosts.
Project
A project can be associated with a job. Project names are logged in lsb.acct. You can use the bacct command to gather accounting information on a per-project basis.
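
For per-project accounting, a bacct call can be assembled as in the following Python sketch. The -P option selects a project and -u all includes jobs from all users; the function name and the project name are assumptions for illustration.

```python
from typing import List


def build_bacct_args(project: str, all_users: bool = True) -> List[str]:
    """Return a bacct argument list that reports on one project."""
    args = ["bacct"]
    if all_users:
        args += ["-u", "all"]  # include jobs from all users, not only the caller
    return args + ["-P", project]


print(" ".join(build_bacct_args("conductor_demo")))  # hypothetical project name
```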
Resource requirement string
Most LSF commands accept a -R res_req argument to specify resource requirements. A resource requirement string describes the resources that a job needs. The exact behavior depends on the command. LSF uses resource requirements to select hosts for remote execution and job execution. Resource requirement strings can be simple (applying to the entire job) or compound (applying to the specified number of slots).
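
To make the difference between simple and compound strings concrete, the following Python sketch assembles both forms. It assumes the common section keywords select, order, rusage, and span, and the compound form num*{simple} + num*{simple}; the helper names and sample values are illustrative, and the sketch covers only a subset of the full syntax.

```python
from typing import Dict, List, Tuple

# Sections handled by this sketch, in the order they are emitted.
SECTION_ORDER = ("select", "order", "rusage", "span")


def simple_res_req(sections: Dict[str, str]) -> str:
    """Build a simple resource requirement string that applies to the whole job."""
    unknown = set(sections) - set(SECTION_ORDER)
    if unknown:
        raise ValueError(f"unsupported sections: {sorted(unknown)}")
    return " ".join(
        f"{name}[{sections[name]}]" for name in SECTION_ORDER if name in sections
    )


def compound_res_req(terms: List[Tuple[int, str]]) -> str:
    """Build a compound string; each term applies to the given number of slots."""
    return " + ".join(f"{slots}*{{{simple}}}" for slots, simple in terms)


simple = simple_res_req({"rusage": "mem=4096", "select": "mem>4096"})
compound = compound_res_req([(2, "select[mem>4096]"), (4, "span[hosts=1]")])
print(simple)
print(compound)
```

Either string would then be passed as the value of -R.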
Advance reservation
Advance reservations are used in this implementation to allocate resource slots on hosts in the LSF cluster for IBM Spectrum Conductor workloads. The IBM Spectrum Conductor workloads are started as LSF jobs that reference these advance reservations. The number of resource slots in the advance reservations grows and shrinks dynamically according to workload demand.
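
A job references an advance reservation through the standard bsub -U option. The following Python sketch shows that reference, together with a deliberately simple calculation of how many slots a reservation would need to gain or lose to match demand. The reservation ID, the command, and the sizing rule are assumptions for illustration; this document does not describe the integration's actual sizing policy.

```python
from typing import List


def build_reserved_bsub_args(reservation_id: str, command: List[str]) -> List[str]:
    """Return a bsub argument list that runs a job inside an advance reservation."""
    return ["bsub", "-U", reservation_id] + command


def slot_adjustment(reserved_slots: int, demanded_slots: int) -> int:
    """Positive result: slots to add; negative result: slots to release."""
    if reserved_slots < 0 or demanded_slots < 0:
        raise ValueError("slot counts must be non-negative")
    return demanded_slots - reserved_slots


# Hypothetical reservation ID and workload command.
print(" ".join(build_reserved_bsub_args("user1#0", ["./run_workload.sh"])))
print(slot_adjustment(reserved_slots=8, demanded_slots=12))
```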