Control processor allocation across hosts
Sometimes you need to control how the selected processors for a parallel job are distributed across the hosts in the cluster.
You can control this at the job level or at the queue level. The queue specification is ignored if your job specifies its own locality.
Specify parallel job locality at the job level
By default, LSF allocates the required processors for the job from the available set of processors.
A parallel job may span multiple hosts, with a specifiable number of processes allocated to each host. A job may be scheduled onto a single multiprocessor host to take advantage of its efficient shared memory, or spread out onto multiple hosts to take advantage of their aggregate memory and swap space. Flexible spanning may also be used to achieve parallel I/O.
You can specify “select all the processors for this parallel batch job on the same host” or “do not choose more than n processors on one host” by using the span section in the resource requirement string (bsub -R, or RES_REQ in the queue definition in lsb.queues).
If PARALLEL_SCHED_BY_SLOT=Y in lsb.params, the span string is used to control the number of job slots instead of processors.
Syntax
The span string supports the following syntax:
- span[hosts=1]
Indicates that all the processors allocated to this job must be on the same host.
- span[ptile=value]
Indicates the number of processors on each host that should be allocated to the job, where value is one of the following:
Default ptile value, specified by n processors. In the following example, the job requests 4 processors on each available host, regardless of how many processors the host has:
span[ptile=4]
Predefined ptile value, specified by '!'. The following example uses the predefined maximum job slot limit in lsb.hosts (MXJ per host type/model) as its value:
span[ptile='!']
Tip: If the host or host type/model does not define MXJ, the default predefined ptile value is 1.
Predefined ptile value with optional multiple ptile values, per host type or host model:
For host type, you must specify same[type] in the resource requirement. In the following example, the job requests 8 processors on a host of type HP, 2 processors on a host of type LINUX, and the predefined maximum job slot limit in lsb.hosts (MXJ) for other host types:
span[ptile='!',HP:8,LINUX:2] same[type]
For host model, you must specify same[model] in the resource requirement. In the following example, the job requests 4 processors on hosts of model PC1133, and 2 processors on hosts of model PC233, and the predefined maximum job slot limit in lsb.hosts (MXJ) for other host models:
span[ptile='!',PC1133:4,PC233:2] same[model]
- span[hosts=-1]
Disables span setting in the queue. LSF allocates the required processors for the job from the available set of processors.
For example:
bsub -q super -R "span[hosts=-1]" -n 5 sleep 180
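The precedence between job-level and queue-level span settings can be sketched as a small shell function. This is only an illustration of the rules stated in this topic, not LSF code: effective_span is a hypothetical helper, and the value "none" is an invented label meaning that no span constraint applies and LSF uses its default allocation.

```shell
# Hypothetical helper, not an LSF command: print the span setting that
# applies to a job, following the precedence described in this topic.
effective_span() {
    job_span=$1     # span section from bsub -R, empty if the job gives none
    queue_span=$2   # span section from RES_REQ in lsb.queues, empty if none
    if [ "$job_span" = "span[hosts=-1]" ]; then
        printf '%s\n' "none"            # queue span disabled, default allocation
    elif [ -n "$job_span" ]; then
        printf '%s\n' "$job_span"       # job-level locality is used
    elif [ -n "$queue_span" ]; then
        printf '%s\n' "$queue_span"     # fall back to the queue setting
    else
        printf '%s\n' "none"
    fi
}

effective_span "span[hosts=1]" "span[ptile=2]"    # span[hosts=1]
effective_span "" "span[ptile=2]"                 # span[ptile=2]
effective_span "span[hosts=-1]" "span[ptile=2]"   # none
```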
Specify multiple ptile values
In a span string with multiple ptile values, you must specify a predefined default value (ptile='!') and either host model or host type.
You can specify both type and model in the span section in the resource requirement string, but the ptile values must all be of the same kind: either all host types or all host models.
If you specify same[type:model], you cannot specify a predefined ptile value (!) in the span section.
Under bash 3.0, the shell does not interpret the exclamation mark (!) correctly. To use the predefined ptile value (ptile='!'), use the +H option to disable '!' style history substitution in bash (sh +H).
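As a minimal sketch of the workaround, the following lines turn off history substitution with the bash set +H builtin before building a resource requirement string. The variable name res_req is hypothetical, and the guard on $- is an added assumption so that the lines are harmless in shells or scripts where history substitution is not active.

```shell
# Turn off '!' history substitution only where it is on (interactive bash).
case "$-" in
    *H*) set +H ;;
esac

# With substitution off, the double-quoted string keeps ptile='!' intact.
res_req="same[type] span[ptile='!',LINUX:2,HP:4]"
printf '%s\n' "$res_req"

# On a host with LSF installed, the string would then be passed to bsub:
# bsub -n 4 -R "$res_req" myjob
```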
LINUX and HP are both host types and can appear in the same span string. The following span string is valid:
same[type] span[ptile='!',LINUX:2,HP:4]
PC233 and PC1133 are both host models and can appear in the same span string. The following span string is valid:
same[model] span[ptile='!',PC233:2,PC1133:4]
You cannot mix host model and host type in the same span string. The following span strings are not correct:
span[ptile='!',LINUX:2,PC1133:4] same[model]
span[ptile='!',LINUX:2,PC1133:4] same[type]
The LINUX host type and PC1133 host model cannot appear in the same span string.
Multiple ptile values for a host type
For host type, you must specify same[type] in the resource requirement. For example:
span[ptile='!',HP:8,SOL:8,LINUX:2] same[type]
The job requests 8 processors on a host of type HP or SOL, and 2 processors on a host of type LINUX, and the predefined maximum job slot limit in lsb.hosts (MXJ) for other host types.
Multiple ptile values for a host model
For host model, you must specify same[model] in the resource requirement. For example:
span[ptile='!',PC1133:4,PC233:2] same[model]
The job requests 4 processors on hosts of model PC1133, and 2 processors on hosts of model PC233, and the predefined maximum job slot limit in lsb.hosts (MXJ) for other host models.
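The way a list of ptile values maps to a single host can be sketched in shell. The function below is a hypothetical illustration, not part of LSF: resolve_ptile takes the ptile value list, the type or model of a candidate host, and optionally the MXJ value that lsb.hosts defines for that host. It assumes the fallback to MXJ, and to 1 when MXJ is undefined, works as described above.

```shell
# Hypothetical helper, not an LSF command: resolve the ptile value for one
# host type (or model) from a span[ptile=...] value list.
resolve_ptile() {
    spec=$1      # for example: '!',HP:8,SOL:8,LINUX:2
    name=$2      # host type or host model of the candidate host
    mxj=${3:-1}  # MXJ for that host, 1 if lsb.hosts defines none
    old_ifs=$IFS
    IFS=,
    set -- $spec          # split the list on commas
    IFS=$old_ifs
    for entry in "$@"; do
        case "$entry" in
            "$name":*) printf '%s\n' "${entry#*:}"; return 0 ;;
        esac
    done
    printf '%s\n' "$mxj"  # no explicit entry: use the predefined value
}

resolve_ptile "'!',HP:8,SOL:8,LINUX:2" HP      # 8
resolve_ptile "'!',HP:8,SOL:8,LINUX:2" LINUX   # 2
resolve_ptile "'!',HP:8,SOL:8,LINUX:2" AIX 6   # 6 (assumed MXJ of 6)
resolve_ptile "'!',HP:8,SOL:8,LINUX:2" AIX     # 1 (MXJ not defined)
```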
Examples
bsub -n 4 -R "span[hosts=1]" myjob
Runs the job on a host that has at least 4 processors currently eligible to run the 4 components of this job.
bsub -n 4 -R "span[ptile=2]" myjob
Runs the job on 2 hosts, using 2 processors on each host. Each host may have more than 2 processors available.
bsub -n 4 -R "span[ptile=3]" myjob
Runs the job on 2 hosts, using 3 processors on the first host and 1 processor on the second host.
bsub -n 4 -R "span[ptile=1]" myjob
Runs the job on 4 hosts, even though some of the 4 hosts may have more than one processor currently available.
bsub -n 4 -R "type==any same[type] span[ptile='!',LINUX:2,HP:4]" myjob
Submits myjob to request 4 processors, running on 2 hosts of type LINUX (2 processors per host), or on a single host of type HP, or, for other host types, distributed according to the predefined maximum job slot limit in lsb.hosts (MXJ).
bsub -n 16 -R "type==any same[type] span[ptile='!',HP:8,SOL:8,LINUX:2]" myjob
Submits myjob to request 16 processors on 2 hosts of type HP or SOL (8 processors per host), or on 8 hosts of type LINUX (2 processors per host), or the predefined maximum job slot limit in lsb.hosts (MXJ) for other host types.
bsub -n 4 -R "same[model] span[ptile='!',PC1133:4,PC233:2]" myjob
Submits myjob to request a single host of model PC1133 (4 processors), or 2 hosts of model PC233 (2 processors per host), or the predefined maximum job slot limit in lsb.hosts (MXJ) for other host models.
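The host counts in the single-value ptile examples above follow from simple arithmetic, which can be sketched in shell. The function ptile_layout is a hypothetical helper, not an LSF command. It assumes that every selected host can supply the full ptile value and that hosts are filled in order, so only the last host can receive fewer processors.

```shell
# Hypothetical helper, not an LSF command: print the number of processors
# taken from each host for "bsub -n N" with span[ptile=P].
ptile_layout() {
    n=$1
    ptile=$2
    [ "$ptile" -ge 1 ] || return 1   # reject values that would never finish
    layout=""
    while [ "$n" -gt 0 ]; do
        if [ "$n" -ge "$ptile" ]; then
            take=$ptile
        else
            take=$n                  # last host gets the remainder
        fi
        layout="${layout:+$layout }$take"
        n=$((n - take))
    done
    printf '%s\n' "$layout"
}

ptile_layout 4 2   # 2 2
ptile_layout 4 3   # 3 1
ptile_layout 4 1   # 1 1 1 1
```

Each number in the output stands for one host, so the count of numbers is the number of hosts the job spans.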
Specify parallel job locality at the queue level
The queue may also define the locality for parallel jobs using the RES_REQ parameter.