Clusters, jobs, and queues
A cluster is a group of computers (hosts) running LSF that work together as a single unit, combining computing power and sharing workload and resources. A cluster provides a single-system image for a network of computing resources.
Hosts can be grouped into clusters in a number of ways. A cluster could contain:
All the hosts in a single administrative group
All the hosts on one file server or sub-network
Hosts that perform similar functions
lshosts — View static resource information about hosts in the cluster
bhosts — View resource and job information about server hosts in the cluster
lsid — View the cluster name
lsclusters — View cluster status and size
Define hosts in your cluster in lsf.cluster.cluster_name.
Tip: The name of your cluster should be unique. It should not be the same as any host or queue.
A job is a unit of work run in the LSF system: a command submitted to LSF for execution. LSF schedules, controls, and tracks the job according to configured policies.
Jobs can be complex problems, simulation scenarios, extensive calculations, or anything else that needs compute power.
bjobs — View jobs in the system
bsub — Submit jobs
A job slot is a bucket into which a single unit of work is assigned in the LSF system. Hosts are configured to have a number of job slots available and queues dispatch jobs to fill job slots.
bhosts — View job slot limits for hosts and host groups
bqueues — View job slot limits for queues
busers — View job slot limits for users and user groups
Define job slot limits in lsb.resources.
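As a rough illustration of the job slot idea described above, the following Python sketch tracks free slots on a host. It is a toy model and not part of LSF; the Host class, the host names, and the slot counts are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Toy model of a server host with a fixed number of job slots."""
    name: str            # hypothetical host name
    max_slots: int       # configured job slot limit (assumed value)
    used_slots: int = 0  # slots currently occupied by jobs

    @property
    def free_slots(self) -> int:
        return self.max_slots - self.used_slots

    def assign(self) -> bool:
        """Occupy one slot if any is free; return whether it succeeded."""
        if self.free_slots <= 0:
            return False
        self.used_slots += 1
        return True


hosts = [Host("hostA", max_slots=2), Host("hostB", max_slots=1)]
# Try to place three units of work on hostA, which has only two slots.
results = [hosts[0].assign() for _ in range(3)]
print(results, hosts[0].free_slots)
```

In a real cluster the limits come from the LSF configuration and are reported by bhosts, bqueues, and busers; the sketch only shows the bookkeeping idea of one unit of work per slot.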
LSF jobs have the following states:
PEND — Waiting in a queue for scheduling and dispatch
RUN — Dispatched to a host and running
DONE — Finished normally with zero exit value
EXIT — Finished with non-zero exit value
PSUSP — Suspended while pending
USUSP — Suspended by user
SSUSP — Suspended by the LSF system
POST_DONE — Post-processing completed without errors
POST_ERR — Post-processing completed with errors
WAIT — Members of a chunk job that are waiting to run
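The state list above can be captured in a small lookup table, which is handy when post-processing job listings in a script. The sketch below is only an illustration: the state codes and descriptions come from the list, but the grouping into categories and the categorize helper are assumptions, not an LSF interface.

```python
# State codes and descriptions as given in the list above.
JOB_STATES = {
    "PEND": "Waiting in a queue for scheduling and dispatch",
    "RUN": "Dispatched to a host and running",
    "DONE": "Finished normally with zero exit value",
    "EXIT": "Finished with non-zero exit value",
    "PSUSP": "Suspended while pending",
    "USUSP": "Suspended by user",
    "SSUSP": "Suspended by the LSF system",
    "POST_DONE": "Post-processing completed without errors",
    "POST_ERR": "Post-processing completed with errors",
    "WAIT": "Members of a chunk job that are waiting to run",
}

# Illustrative grouping (an assumption made for this example).
CATEGORIES = {
    "waiting": {"PEND", "WAIT"},
    "running": {"RUN"},
    "suspended": {"PSUSP", "USUSP", "SSUSP"},
    "finished": {"DONE", "EXIT"},
    "post-processed": {"POST_DONE", "POST_ERR"},
}


def categorize(state: str) -> str:
    """Return the illustrative category for a job state code."""
    for category, members in CATEGORIES.items():
        if state in members:
            return category
    raise ValueError(f"unknown job state: {state!r}")


for code in ("PEND", "SSUSP", "EXIT"):
    print(code, "->", categorize(code), "|", JOB_STATES[code])
```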
A queue is a clusterwide container for jobs. All jobs wait in queues until they are scheduled and dispatched to hosts.
Queues do not correspond to individual hosts; each queue can use all server hosts in the cluster, or a configured subset of the server hosts.
When you submit a job to a queue, you do not need to specify an execution host. LSF dispatches the job to the best available execution host in the cluster to run that job.
Queues implement different job scheduling and control policies.
bqueues — View available queues
bsub -q — Submit a job to a specific queue
bparams — View default queues
Define queues in lsb.queues.
Tip: The names of your queues should be unique. They should not be the same as the cluster name or any host in the cluster.
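The naming tips for clusters and queues can be checked mechanically before editing the configuration files. The following Python sketch is one possible way to do that; the function name find_name_conflicts and the sample cluster, host, and queue names are hypothetical, and LSF itself provides no such helper as far as this text states.

```python
from collections import Counter


def find_name_conflicts(cluster_name, host_names, queue_names):
    """Return a sorted list of names that violate the naming tips."""
    conflicts = set()
    hosts = set(host_names)
    # Queue names should be unique among themselves.
    conflicts.update(n for n, count in Counter(queue_names).items() if count > 1)
    # A queue should not share its name with a host or with the cluster.
    conflicts.update(q for q in queue_names if q in hosts or q == cluster_name)
    # The cluster should not share its name with a host.
    if cluster_name in hosts:
        conflicts.add(cluster_name)
    return sorted(conflicts)


# Hypothetical names: the queue "hostA" collides with a host name.
print(find_name_conflicts("cluster1", ["hostA", "hostB"], ["normal", "hostA", "night"]))
```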
First-come, first-served scheduling (FCFS)
The default type of scheduling in LSF. Jobs are considered for dispatch based on their order in the queue.
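To make the ordering rule concrete, here is a minimal Python sketch of first-come, first-served dispatch under simplifying assumptions. It is not LSF's scheduler: the function dispatch_fcfs, the job and host names, and the rule of picking the first host with a free slot are all invented for the example, whereas LSF chooses the best available execution host according to its configured policies.

```python
from collections import deque


def dispatch_fcfs(pending, free_slots):
    """Dispatch jobs in queue order while job slots remain.

    pending: deque of job names, oldest first (consumed in place).
    free_slots: dict mapping host name to its number of free job slots.
    Returns a list of (job, host) pairs in dispatch order.
    """
    slots = dict(free_slots)  # work on a copy so the caller's dict is untouched
    dispatched = []
    while pending:
        # Simplification: take the first host that still has a free slot.
        host = next((h for h, n in slots.items() if n > 0), None)
        if host is None:
            break  # no free slot anywhere; remaining jobs keep waiting
        job = pending.popleft()  # the oldest job is considered first
        slots[host] -= 1
        dispatched.append((job, host))
    return dispatched


queue = deque(["job1", "job2", "job3"])
print(dispatch_fcfs(queue, {"hostA": 1, "hostB": 1}))
print(list(queue))  # jobs still pending after dispatch
```

With two free slots and three queued jobs, the two oldest jobs are dispatched and the third stays in the queue, which is the behavior FCFS ordering implies.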