Running parallel jobs
LSF provides a generic interface to parallel programming packages so that any parallel package can be supported by writing shell scripts or wrapper programs.
How LSF runs parallel jobs
Preparing your environment to submit parallel jobs to LSF
Submit a parallel job
Start parallel tasks with LSF utilities
For simple parallel jobs you can use LSF utilities to start parts of the job on other hosts. Because LSF utilities handle signals transparently, LSF can suspend and resume all components of your job without additional programming.
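As a sketch, the lsrun and lsgrun utilities can start job components on other hosts from within a submitted job script (host names and the worker binary here are placeholders):

```shell
#!/bin/sh
# Submit with: bsub -n 3 ./par_job.sh
# lsrun starts a task on one host; lsgrun starts the same task on several hosts.
lsrun -m hostA ./worker arg1 &
lsgrun -m "hostB hostC" ./worker arg2 &
wait   # LSF can suspend and resume all components because the utilities forward signals
```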
Job slot limits for parallel jobs
Specify a minimum and maximum number of tasks
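A hedged example of requesting a task range at submission (the application name is a placeholder):

```shell
# Ask LSF for at least 4 and at most 16 tasks; LSF dispatches the job
# when it can allocate any number of slots in that range.
bsub -n 4,16 ./parallel_app
```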
Restrict job size requested by parallel jobs
Specifying a list of allowed job sizes (number of tasks) in queues or application profiles enables LSF to check the requested job sizes when submitting, modifying, or switching jobs.
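For illustration, a queue might restrict job sizes with the JOB_SIZE_LIST parameter in lsb.queues (the queue name is an assumption):

```
Begin Queue
QUEUE_NAME = parallel
# The first value is the default job size; only these task counts are accepted.
JOB_SIZE_LIST = 8 2 4 16
End Queue
```

A bsub -n request for a size that is not in the list is rejected.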
About specifying a first execution host
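As a sketch, the first execution host can be marked with an exclamation point in the host list (host names are placeholders):

```shell
# hostA must be the first execution host; the remaining tasks
# can run on hostB or hostC.
bsub -n 8 -m "hostA! hostB hostC" ./parallel_app
```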
Compute units
Compute units are similar to host groups, with the added feature of granularity that allows the construction of cluster-wide structures that mimic the network architecture. Job scheduling with compute unit resource requirements optimizes job placement based on the underlying system architecture, minimizing communication bottlenecks. Compute units are especially useful for communication-intensive parallel jobs that span several hosts: by encoding the cluster network topology, they can minimize network latency and take advantage of fast interconnects, for example by placing all job tasks in the same rack instead of making several network hops.
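For example, assuming compute units of type rack are configured for the cluster, a job can ask to be packed into as few racks as possible:

```shell
# Place all 64 tasks within one rack if possible, preferring the
# compute unit with the most available slots.
bsub -n 64 -R "cu[type=rack:pref=maxavail]" ./mpi_app
```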
Control processor allocation across hosts
Run parallel processes on homogeneous hosts
Limit the number of processors allocated
Limit the number of allocated hosts
Reserve processors
Reserve memory for pending parallel jobs
Backfill scheduling
How deadline constraint scheduling works for parallel jobs
Optimized preemption of parallel jobs
Controlling CPU and memory affinity
IBM® Spectrum LSF can schedule affinity-aware jobs, which take advantage of different levels of processing units (NUMA nodes, sockets, cores, and threads). Affinity scheduling is supported only on Linux and on POWER7 and POWER8 hosts, and is available in LSF Standard Edition and LSF Advanced Edition but not in LSF Express Edition.
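A hedged example of an affinity resource requirement (the application name is a placeholder):

```shell
# Bind each of the 4 tasks to its own core, and keep each task's
# memory on the NUMA node where the task runs.
bsub -n 4 -R "affinity[core(1):membind=localonly]" ./numa_app
```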
Processor binding for LSF job processes
Processor binding for LSF job processes takes advantage of multiple processors and multiple cores to provide hard processor binding for sequential LSF jobs and for parallel jobs that run on a single host.
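As an illustration, processor binding can be enabled cluster-wide in lsf.conf (the value shown is one possible binding policy):

```
# lsf.conf
LSF_BIND_JOB=BALANCE
```

Setting BIND_JOB in an application profile (lsb.applications) overrides the cluster-wide lsf.conf setting.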
Running parallel jobs with blaunch
Learn how to configure and use the blaunch command for launching parallel and distributed applications within LSF. A typical LSF parallel job launches its tasks across multiple hosts, and task geometry allows flexibility in how tasks are grouped for execution on system nodes. By default, LSF enforces limits on the total resources used by all the tasks in the job.
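A minimal sketch of a job script that uses blaunch (the task binary is a placeholder):

```shell
#!/bin/sh
# Submit with: bsub -n 8 ./blaunch_job.sh
# blaunch reads the allocated hosts from the job environment
# (LSB_MCPU_HOSTS) and starts one instance of the task per slot.
blaunch ./my_task
```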
Running MPI workload through IBM Parallel Environment Runtime Edition
IBM Spectrum LSF integrates with the IBM Parallel Environment Runtime Edition (IBM PE Runtime Edition) program product, version 1.3 or later, to run PE jobs through the IBM Parallel Operating Environment (POE). The integration enables network-aware scheduling, allowing an LSF job to specify network resource requirements, collect network information, and schedule the job according to the requested network resources.
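For example, a hedged PE submission with network resource requirements (the option values shown are one possible combination, and the application name is a placeholder):

```shell
# Request dedicated adapter windows on all available networks
# for an MPI job run under POE.
bsub -n 16 -network "type=sn_all: usage=dedicated" poe ./mpi_app
```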