Administering IBM Spectrum LSF parallel workload
Learn how to submit, monitor, and control parallel workload in your LSF cluster.
Configure scheduling policies that reserve resources to enable efficient execution of large parallel jobs.
Running parallel jobs
LSF provides a generic interface to parallel programming packages so that any parallel package can be supported by writing shell scripts or wrapper programs.
Advance reservation
Advance reservations ensure access to specific hosts or slots during specified times. While an advance reservation is active, only users or groups associated with the reservation can start new jobs on the reserved hosts or slots.
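The access rule described above can be illustrated with a minimal sketch. It is a simplified model, not LSF code: the `Reservation` class and the `may_start_job` function are hypothetical names, reservations are assumed to cover whole hosts (slot-level reservations are ignored), and groups are reduced to plain sets of user names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Reservation:
    """Hypothetical model of an advance reservation (not an LSF data structure)."""
    hosts: frozenset
    users: frozenset
    start: datetime
    end: datetime

    def is_active(self, now):
        # Assumption: active from start (inclusive) to end (exclusive).
        return self.start <= now < self.end


def may_start_job(user, host, now, reservations):
    """Return True if `user` may start a new job on `host` at time `now`."""
    # Reservations that cover this host and are active right now.
    active = [r for r in reservations if host in r.hosts and r.is_active(now)]
    if not active:
        # No active reservation on the host: no restriction from this rule.
        return True
    # Otherwise the user must be associated with one of the active reservations.
    return any(user in r.users for r in active)


# Example reservation: hostA and hostB are reserved for alice from 06:00 to 08:00.
rsv = Reservation(
    hosts=frozenset({"hostA", "hostB"}),
    users=frozenset({"alice"}),
    start=datetime(2024, 1, 1, 6, 0),
    end=datetime(2024, 1, 1, 8, 0),
)
```

In this model, jobs that are already running are not affected; the check applies only when a new job is about to start.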
Fair share scheduling
Fair share scheduling divides the processing power of the LSF cluster among users and queues to provide fair access to resources, so that no user or queue can monopolize the resources of the cluster and no queue is starved.
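As a rough illustration of dividing capacity by shares, the following sketch splits a fixed number of slots among users in proportion to their assigned shares. It is not the LSF scheduling algorithm, which also takes recent resource usage into account. The function name `divide_slots` and the share values are hypothetical.

```python
def divide_slots(total_slots, shares):
    """Split `total_slots` among users in proportion to their shares.

    `shares` maps a user name to a positive integer share count.
    """
    total_shares = sum(shares.values())
    if total_shares <= 0:
        raise ValueError("at least one user must hold a positive share")

    # First pass: each user gets the floor of the proportional amount.
    alloc = {user: total_slots * s // total_shares for user, s in shares.items()}

    # Second pass: hand out the leftover slots one at a time, largest
    # fractional remainder first (ties broken by user name).
    leftover = total_slots - sum(alloc.values())
    by_remainder = sorted(
        shares,
        key=lambda user: (-(total_slots * shares[user] % total_shares), user),
    )
    for user in by_remainder[:leftover]:
        alloc[user] += 1
    return alloc


print(divide_slots(10, {"alice": 2, "bob": 1, "carol": 1}))
# prints {'alice': 5, 'bob': 3, 'carol': 2}
```

Because every slot is assigned and each user's allocation follows from the share ratio, no single user can take the whole cluster while others hold shares.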
Job checkpoint and restart
Optimize resource usage with job checkpoint and restart to stop jobs and then restart them from the point at which they stopped.
Job migration for checkpoint-able and re-runnable jobs
Use job migration to move checkpoint-able and re-runnable jobs from one host to another. Job migration uses job checkpoint and restart so that a migrated checkpoint-able job restarts on the new host from the point at which the job stopped on the original host.
Re-sizable jobs
Re-sizable jobs can use the number of tasks that are available at any time and can grow or shrink during the job run time by requesting extra tasks when required or releasing tasks that are no longer needed.
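The grow and shrink behavior can be pictured with a small model. This is a sketch under stated assumptions, not LSF's implementation: the `ResizableJob` class is hypothetical, and it assumes that a job has a minimum and a maximum task count and never shrinks below the minimum.

```python
class ResizableJob:
    """Hypothetical model of a job whose task count changes while it runs."""

    def __init__(self, min_tasks, max_tasks):
        if not 1 <= min_tasks <= max_tasks:
            raise ValueError("require 1 <= min_tasks <= max_tasks")
        self.min_tasks = min_tasks
        self.max_tasks = max_tasks
        self.tasks = 0  # tasks currently allocated to the job

    def start(self, available):
        """Start with whatever is available, if it meets the minimum."""
        if available < self.min_tasks:
            return False
        self.tasks = min(available, self.max_tasks)
        return True

    def grow(self, available):
        """Take extra tasks up to the maximum; return how many were added."""
        added = max(0, min(available, self.max_tasks - self.tasks))
        self.tasks += added
        return added

    def release(self, count):
        """Give back tasks, keeping at least the minimum; return how many were released."""
        released = max(0, min(count, self.tasks - self.min_tasks))
        self.tasks -= released
        return released
```

For example, a job created as `ResizableJob(2, 8)` that starts when 4 tasks are free runs with 4 tasks, can later grow to 8 when more become available, and can release tasks down to 2.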