lsb.queues

The lsb.queues file defines batch queues. Numerous controls are available at the queue level to allow cluster administrators to customize site policies.

This file is optional. If no queues are configured, LSF creates a queue that is named default, with all parameters set to default values.

This file is installed by default in LSB_CONFDIR/cluster_name/configdir.

Changing lsb.queues configuration

After you change lsb.queues, run badmin reconfig to reconfigure mbatchd.

Some parameters, such as run window and runtime limit, do not take effect immediately for running jobs unless you run mbatchd restart or sbatchd restart on the job execution host.

lsb.queues structure

Each queue definition begins with the line Begin Queue and ends with the line End Queue. The queue name must be specified; all other parameters are optional.

#INCLUDE

Syntax

#INCLUDE "path-to-file"

Description

Inserts a configuration setting from another file to the current location. Use this directive to dedicate control of a portion of the configuration to other users or user groups by providing write access for the included file to specific users or user groups, and to ensure consistency of configuration file settings in different clusters (if you are using the LSF multicluster capability).

For more information, see Shared configuration file content.

#INCLUDE can be inserted anywhere in the local configuration file.

Default

Not defined.

ADMINISTRATORS

Specifies a space-separated list of queue administrators. Queue administrators can operate on any user job in the queue, and on the queue itself.

Syntax

ADMINISTRATORS=user_name | user_group ...

Description

To specify a Windows user account or user group, include the domain name in uppercase letters (DOMAIN_NAME\user_name or DOMAIN_NAME\user_group).

Default

Not defined. You must be a cluster administrator to operate on this queue.

APS_PRIORITY

Specifies calculation factors for absolute priority scheduling (APS). Pending jobs in the queue are ordered according to the calculated APS value.

Syntax

APS_PRIORITY=WEIGHT[[factor, value] [subfactor, value]...]...] LIMIT[[factor, value] [subfactor, value]...]...] GRACE_PERIOD[[factor, value] [subfactor, value]...]...]

Description

If the weight of a subfactor is defined but the weight of its parent factor is not, the parent factor weight is set to 1.
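The parent-weight rule can be illustrated with a short sketch. The subfactor-to-parent mapping follows the factor table in this section; the function name and the dictionary representation of weights are assumptions made for illustration and are not part of LSF.

```python
# Subfactor-to-parent mapping, taken from the factor table in this section.
PARENT_FACTOR = {
    "PROC": "RSRC",
    "MEM": "RSRC",
    "SWAP": "RSRC",
    "JPRIORITY": "WORK",
    "QPRIORITY": "WORK",
}


def fill_parent_weights(weights):
    """Return a copy of weights in which the parent of every weighted
    subfactor has a weight, using 1 when none was configured."""
    filled = dict(weights)
    for name in weights:
        parent = PARENT_FACTOR.get(name)
        if parent is not None and parent not in filled:
            filled[parent] = 1.0
    return filled


# MEM has a weight but RSRC does not, so RSRC is filled in with 1.
print(fill_parent_weights({"MEM": 10.0, "JPRIORITY": 2.0, "WORK": 5.0}))
```

An explicitly configured parent weight (WORK in the call above) is left unchanged.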

The WEIGHT and LIMIT factors are floating-point values. Specify a value for GRACE_PERIOD in seconds (value followed by s), minutes (value followed by m), or hours (value followed by h).

The default unit for grace period is hours.

The following are the names of the factors and subfactors to specify:

Factor: FS (user-based fairshare factor)
  Subfactors: The existing fairshare feature tunes the dynamic user priority.
  Metric: The fairshare factor automatically adjusts the APS value based on dynamic user priority.

  The FAIRSHARE parameter must be defined in the queue. The FS factor is ignored for non-fairshare queues.

  The FS factor is influenced by the following fairshare parameters that are defined in the lsb.queues or lsb.params file:

    • CPU_TIME_FACTOR
    • FWD_JOB_FACTOR
    • RUN_TIME_FACTOR
    • RUN_JOB_FACTOR
    • HIST_HOURS

Factor: RSRC (resource factors)
  Subfactor: PROC
  Metric: Requested tasks are the maximum of bsub -n min_task, max_task, the min of bsub -n min, or the value of the TASKLIMIT parameter in the lsb.queues file.

  Subfactor: MEM
  Metric: Total real memory requested (in MB or in units set in the LSF_UNIT_FOR_LIMITS parameter in the lsf.conf file).

  Memory requests appearing to the right of a || symbol in a usage string are ignored in the APS calculation.

  For multi-phase memory reservation, the APS value is based on the first phase of reserved memory.

  Subfactor: SWAP
  Metric: Total swap space requested (in MB or in units set in the LSF_UNIT_FOR_LIMITS parameter in the lsf.conf file).

  As with MEM, swap space requests appearing to the right of a || symbol in a usage string are ignored.

Factor: WORK (job attributes)
  Subfactor: JPRIORITY
  Metric: The job priority that is specified by:
    • Default that is specified by half of the value of the MAX_USER_PRIORITY parameter in the lsb.params file
    • Users with bsub -sp or bmod -sp
    • Automatic priority escalation with the JOB_PRIORITY_OVER_TIME parameter in the lsb.params file

  If the TRACK_ELIGIBLE_PENDINFO parameter in the lsb.params file is set to Y or y, LSF increases the job priority for pending jobs as long as they are eligible for scheduling. LSF does not increase the job priority for ineligible pending jobs.

  Subfactor: QPRIORITY
  Metric: The priority of the submission queue.

Factor: APP
  Metric: Set the priority factor at the application profile level by specifying the PRIORITY parameter in the lsb.applications file. The APP_PRIORITY factor is added to the calculated APS value to change the factor value. The APP_PRIORITY factor applies to the entire job.

Factor: USER
  Metric: Set the priority factor for users by specifying the PRIORITY parameter in the User section of the lsb.users file. The USER_PRIORITY factor is added to the calculated APS value to change the factor value. The USER_PRIORITY factor applies to the entire job.

Factor: UG
  Metric: Set the priority factor for user groups by specifying the PRIORITY parameter in the UserGroup section of the lsb.users file. The UG_PRIORITY factor is added to the calculated APS value to change the factor value. The UG_PRIORITY factor applies to the entire job. LSF uses the priority of the user group as specified in the bsub -G option.

Factor: ADMIN
  Metric: Administrators use bmod -aps to set this subfactor value for each job. A positive value increases the APS. A negative value decreases the APS. The ADMIN factor is added to the calculated APS value to change the factor value. The ADMIN factor applies to the entire job. You cannot configure separate weight, limit, or grace period factors. The ADMIN factor takes effect as soon as it is set.

For example, the following sets a grace period of 10 hours for the MEM factor, 10 minutes for the JPRIORITY factor, 10 seconds for the QPRIORITY factor, and 10 hours (default) for the RSRC factor:
GRACE_PERIOD[[MEM,10h] [JPRIORITY, 10m] [QPRIORITY,10s] [RSRC, 10]]

You cannot specify 0 (zero) for the WEIGHT, LIMIT, and GRACE_PERIOD of any factor or subfactor.
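As a rough illustration of the grace period units, the following sketch converts a single GRACE_PERIOD value to seconds. It assumes lowercase s, m, and h suffixes as written in this section, treats a value with no suffix as hours, and rejects zero. The function name is hypothetical and the code is not how LSF itself parses the parameter.

```python
def grace_period_seconds(value):
    """Convert a GRACE_PERIOD value such as '10h', '10m', '10s', or '10'
    to seconds. A value without a unit suffix is read as hours."""
    text = value.strip()
    units = {"s": 1, "m": 60, "h": 3600}
    if text and text[-1] in units:
        number, factor = text[:-1], units[text[-1]]
    else:
        number, factor = text, 3600  # default unit is hours
    seconds = float(number) * factor
    if seconds == 0:
        raise ValueError("0 is not allowed for GRACE_PERIOD")
    return seconds


# The values from the example above: MEM, JPRIORITY, QPRIORITY, RSRC.
for raw in ("10h", "10m", "10s", "10"):
    print(raw, grace_period_seconds(raw))
```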

APS queues cannot configure cross-queue fairshare (FAIRSHARE_QUEUES). The QUEUE_GROUP parameter replaces FAIRSHARE_QUEUES, which is obsolete in LSF 7.0.

Suspended (bstop) jobs and migrated jobs (bmig) are always scheduled before pending jobs. For migrated jobs, LSF keeps the existing job priority information.

If LSB_REQUEUE_TO_BOTTOM and LSB_MIG2PEND are configured in lsf.conf, migrated jobs keep their APS information and compete with other pending jobs based on the APS value. To reset the APS value, use brequeue, not bmig.

Default

Not defined

BACKFILL

Enables backfill scheduling for the queue.

Syntax

BACKFILL=Y | N

Description

Set this parameter to Y to enable backfill scheduling for the queue.

A possible conflict exists if BACKFILL and PREEMPTION are specified together. If PREEMPT_JOBTYPE = BACKFILL is set in the lsb.params file, a backfill queue can be preemptable. Otherwise, a backfill queue cannot be preemptable. If BACKFILL is enabled, do not also specify PREEMPTION = PREEMPTABLE.

BACKFILL is required for interruptible backfill queues (INTERRUPTIBLE_BACKFILL=seconds).

When MAX_SLOTS_IN_POOL, SLOT_RESERVE, and BACKFILL are defined for the same queue, jobs in the queue cannot backfill with slots that are reserved by other jobs in the same queue.

Default

Not defined. No backfilling.

CHKPNT

Enables automatic checkpointing for the queue. All jobs that are submitted to the queue are checkpointable.

Syntax

CHKPNT=chkpnt_dir [chkpnt_period]

Description

The checkpoint directory is the directory where the checkpoint files are created. Specify an absolute path or a path relative to CWD. Do not use environment variables.

Specify the optional checkpoint period in minutes.

You can checkpoint only running members of a chunk job.

If checkpoint-related configuration is specified in both the queue and an application profile, the application profile setting overrides queue level configuration.

Checkpoint-related configuration that is specified in the queue, application profile, and at job level has the following effect:
  • Application-level and job-level parameters are merged. If the same parameter is defined at both job-level and in the application profile, the job-level value overrides the application profile value.
  • The merged result of job-level and application profile settings overrides queue-level configuration.
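The precedence described above can be sketched as a simple dictionary merge. This is only an illustration: the function name and the keys (dir, period) are hypothetical, and the sketch assumes that the override applies parameter by parameter, which this section does not state explicitly.

```python
def effective_checkpoint_settings(queue=None, app=None, job=None):
    """Combine checkpoint settings from the three levels.

    Assumption: overrides apply per parameter, so a queue-level value
    survives only when neither the job nor the application profile
    sets the same parameter.
    """
    merged = dict(app or {})
    merged.update(job or {})   # job-level value wins over application profile
    effective = dict(queue or {})
    effective.update(merged)   # merged result wins over queue-level value
    return effective


print(effective_checkpoint_settings(
    queue={"dir": "/ckpt/queue", "period": 30},
    app={"dir": "/ckpt/app", "period": 10},
    job={"period": 5},
))
```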

To enable checkpointing of MultiCluster jobs, define a checkpoint directory in both the send-jobs and receive-jobs queues (CHKPNT in lsb.queues), or in an application profile (CHKPNT_DIR, CHKPNT_PERIOD, CHKPNT_INITPERIOD, CHKPNT_METHOD in lsb.applications) of both submission cluster and execution cluster. LSF uses the directory that is specified in the execution cluster.

To make a MultiCluster job checkpointable, both submission and execution queues must enable checkpointing, and the application profile or queue setting on the execution cluster determines the checkpoint directory. Checkpointing is not supported if a job runs on a leased host.

The file path of the checkpoint directory can contain up to 4000 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory and file name.

Default

Not defined

CHUNK_JOB_SIZE

Enables job chunking and specifies the maximum number of jobs that are allowed to be dispatched together in a chunk.

Syntax

CHUNK_JOB_SIZE=integer

Description

Note: This parameter is deprecated and might be removed in a future version of LSF.

Specify a positive integer greater than 1.

The ideal candidates for job chunking are jobs that have the same host and resource requirements and typically take 1 - 2 minutes to run.

Job chunking can have the following advantages:
  • Reduces communication between sbatchd and mbatchd and reduces scheduling overhead in mbschd.
  • Increases job throughput in mbatchd and CPU usage on the execution hosts.

However, throughput can deteriorate if the chunk job size is too large. Performance might decrease on queues with CHUNK_JOB_SIZE greater than 30. Evaluate the chunk job size on your own systems for best performance.

With MultiCluster job forwarding model, this parameter does not affect MultiCluster jobs that are forwarded to a remote cluster.

Compatibility

This parameter is ignored in interactive queues (INTERACTIVE=ONLY parameter).

If CHUNK_JOB_DURATION is set in lsb.params, chunk jobs are accepted as long as the CPULIMIT, RUNLIMIT, or RUNTIME values do not exceed the CHUNK_JOB_DURATION.

Example

The following example configures a queue that is named chunk, which dispatches up to four jobs in a chunk:
Begin Queue
QUEUE_NAME     = chunk 
PRIORITY       = 50 
CHUNK_JOB_SIZE = 4 
End Queue

Default

Not defined

COMMITTED_RUN_TIME_FACTOR

Specifies the committed runtime weighting factor. Used only with fairshare scheduling.

Syntax

COMMITTED_RUN_TIME_FACTOR=number

Description

In the calculation of a user's dynamic priority, this factor determines the relative importance of the committed run time. If the -W option of bsub is not specified at job submission and a RUNLIMIT is not set for the queue, the committed run time is not considered.

If undefined, the cluster-wide value from the lsb.params parameter of the same name is used.

Valid values

Any positive number between 0.0 and 1.0

Default

Not defined.

CONTAINER

Syntax

CONTAINER=docker[image(image_name) options(docker_run_options)]

CONTAINER=nvidia-docker[image(image_name) options(docker_run_options)]

CONTAINER=shifter[image(image_name) options(container_options)]

CONTAINER=singularity[image(image_name) options(container_options)]

CONTAINER=enroot[image(image_name) options(enroot_start_options)]

Description

Enables LSF to use a supported container for jobs that are submitted to this queue. This parameter uses the following keywords:

docker | nvidia-docker | shifter | singularity | enroot
Required. Use one of these keywords to specify the type of container to use for jobs that are submitted to this queue. Use docker if you are running Podman containers.
image
Required. This keyword specifies the image name that is used in running jobs.

For Docker, NVIDIA Docker, Podman, and Enroot jobs, use the $LSB_CONTAINER_IMAGE environment variable as the image name to allow users to choose the image for their container jobs at job submission time. Users select an image in the specified repository server by setting a value for the $LSB_CONTAINER_IMAGE environment variable when they submit the job.

options
Optional. This keyword specifies the job run options for the container.

To enable a pre-execution script to run, specify an at sign (@) and a full file path to the script, which the execution host must be able to access. Before the container job runs, LSF runs this script with LSF administrator privileges. While the script is running, the jobs' environment variables are passed to the script. When the script finishes running, the output is used as container startup options. The script must provide this output on one line. The method of processing the container job depends on the result of the pre-execution script:

  • If the pre-execution script failed, the container job exits with the same exit code from the script. In addition, an external status message is sent to inform the user that the job exited because of script execution failure.
  • If the execution of the script is successful but the output contains more than 512 options, LSF only keeps the first 512 options, and the remaining options are ignored.
  • If the execution of the script is successful and the output is valid, the output is part of the container job running options. The position of the output from the script in the options is exactly where the user configured the script in the options field.
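A pre-execution script only needs to print the startup options on one line. The sketch below is one hypothetical way to write such a script in Python, assuming that any executable file is acceptable as the script and that the LSB_JOBID variable is present in the job environment. The label name lsf.jobid is invented for illustration, and the reserved option names are the ones that the Docker notes in this section list.

```python
import os

# Options that this section describes as reserved for LSF internal use with Docker.
RESERVED = ("--cgroup-parent", "--user", "-u")


def build_options(env):
    """Build the container startup options as a single line of text."""
    options = ["--rm"]
    job_id = env.get("LSB_JOBID")  # assumption: job ID is in the job environment
    if job_id:
        # Hypothetical label name, used here only as an example option.
        options += ["--label", "lsf.jobid=" + job_id]
    for opt in options:
        if opt.split("=", 1)[0] in RESERVED:
            raise ValueError("reserved option: " + opt)
    return " ".join(options)


if __name__ == "__main__":
    # The output must be exactly one line; print() adds the only newline.
    print(build_options(os.environ))
```

A non-zero exit (for example, from the ValueError above) would be treated as a script failure, as described in the first item of the list.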
For Docker and NVIDIA Docker containers, this keyword specifies the Docker job run options for the docker run command, which are passed to the job container.
Note:
  • Before you specify the Docker job run options, make sure that these options work with the docker run command in the command line.
  • The --cgroup-parent and --user (or -u) options are reserved for LSF internal use. Do not use these options in the options keyword configuration, otherwise the job fails.

    If you specified a pre-execution script and the output of this script contains --cgroup-parent, --user, or -u, the container job also fails.

  • The -w and --ulimit options are automatically set for LSF. Do not use these options in the options keyword configuration because these specifications override the LSF settings.
  • The -v option is automatically used by LSF to mount the working directories that LSF requires, such as the current working directory, job spool directory, destination file for the bsub -f command, tmp directory, the LSF_TOP, and the checkpoint directory on demand.
  • You can configure the --rm option in the options keyword configuration to automatically remove containers after the job is done.
  • You can enable LSF to automatically assign a name to a Docker container when it creates the Docker container. To enable this feature, set the ENABLE_CONTAINER_NAME parameter to True in the lsfdockerlib.py file.

    The container name uses the following naming convention:

    • Normal jobs and blaunch parallel job containers: <cluster_name>.job.<job_id>
    • Array jobs and array blaunch parallel job containers: <cluster_name>.job.<job_id>.<job_index>
    • blaunch parallel job task containers: <cluster_name>.job.<job_id>.task.<task_id>
    • Array blaunch parallel job task containers: <cluster_name>.job.<job_id>.<job_index>.task.<task_id>
  • Limitation: If you use the -d option, LSF incorrectly gets the status of the Docker jobs as DONE.
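The container naming convention listed in the Docker notes can be expressed as a small helper. The function and argument names are hypothetical. Only the name layout comes from this section.

```python
def container_name(cluster_name, job_id, job_index=None, task_id=None):
    """Build a container name following the documented naming convention.

    job_index is given for array jobs; task_id is given for blaunch
    parallel job task containers.
    """
    name = f"{cluster_name}.job.{job_id}"
    if job_index is not None:
        name += f".{job_index}"
    if task_id is not None:
        name += f".task.{task_id}"
    return name


print(container_name("cluster1", 101))                          # normal job
print(container_name("cluster1", 101, job_index=3))             # array job
print(container_name("cluster1", 101, task_id=2))               # blaunch task
print(container_name("cluster1", 101, job_index=3, task_id=2))  # array blaunch task
```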
For Shifter containers, this keyword specifies the Shifter job run options for the shifter command, which are passed to the job container.
Note:
  • Run shifter --help in the command line to view the options that the shifter command supports.
  • Before you specify the Shifter job run options, make sure that these options work with the shifter command in the command line.
  • The $LD_LIBRARY_PATH directory is cleaned according to the setuid bit that Shifter uses to work. Therefore, for programs that depend on $LD_LIBRARY_PATH to work (such as openmpi), ensure that the setuid bit can be properly set inside the container by adding LD_LIBRARY_PATH to the siteEnvAppend section of the udiRoot.conf file.
For Singularity containers, this keyword specifies the Singularity job run options for the singularity exec command, which are passed to the job container.
Note:
  • Run singularity exec --help in the command line to view the options that the singularity command supports.
  • Before you specify the Singularity job run options, make sure that these options work with the singularity exec command in the command line.
  • The $LD_LIBRARY_PATH directory is cleaned according to the setuid bit that Singularity uses to work. Therefore, for programs that depend on $LD_LIBRARY_PATH to work (such as openmpi), ensure that the setuid bit can be properly set inside the container by adding LD_LIBRARY_PATH to the ld.so.conf file and running the ldconfig command.
For Podman containers, this keyword specifies the Podman job run options for the podman run command, which are passed to the job container.
Note:
  • Before you specify the Podman job run options, make sure that these options work with the podman run command in the command line.
  • The --user (or -u) option is reserved for LSF internal use. Do not use this option in the options keyword configuration, otherwise the job fails.

    If you specified a pre-execution script and the output of this script contains --user or -u, the container job also fails.

  • The -w and --ulimit options are automatically set for LSF. Do not use these options in the options keyword configuration because these specifications override the LSF settings.
  • The -v option is automatically used by LSF to mount the working directories that LSF requires, such as the current working directory, job spool directory, destination file for the bsub -f command, tmp directory, the LSF_TOP, and the checkpoint directory on demand.
  • You can configure the --rm option in the options keyword configuration to automatically remove containers after the job is done.
  • Limitation: If you use the -d option, LSF incorrectly gets the status of the Docker jobs as DONE.
For Enroot containers, this keyword specifies the Enroot job run options for the enroot start command, which are passed to the job container.
Note: Before you specify the Enroot job run options, make sure that these options work with the enroot start command in the command line.

Examples

To specify an Ubuntu image for use with container jobs without specifying any optional keywords:
Begin Queue
QUEUE_NAME = dockerq
CONTAINER = docker[image(repository.example.com:5000/file/path/ubuntu:latest)]
DESCRIPTION = Docker User Service
End Queue
Begin Queue
QUEUE_NAME = ndockerq
CONTAINER = nvidia-docker[image(repository.example.com:5000/file/path/ubuntu:latest)]
DESCRIPTION = NVIDIA Docker User Service
End Queue
Begin Queue
QUEUE_NAME = shifterq
CONTAINER = shifter[image(ubuntu:latest)]
DESCRIPTION = Shifter User Service
End Queue
Begin Queue
QUEUE_NAME = singq
CONTAINER = singularity[image(/file/path/ubuntu.img)]
DESCRIPTION = Singularity User Service
End Queue
Begin Queue
QUEUE_NAME = podmanq
CONTAINER = docker[image(repository.example.com:5000/file/path/ubuntu:latest)]
DESCRIPTION = Podman User Service
End Queue
Begin Queue
QUEUE_NAME = enrootq
CONTAINER = enroot[image(repository.example.com:5000/file/path/ubuntu:latest)]
DESCRIPTION = Enroot User Service
End Queue
To specify a pre-execution script in the /share/usr/ directory, which generates the container startup options:
Begin Queue
QUEUE_NAME = dockerqoptions
CONTAINER = docker[image(repository.example.com:5000/file/path/ubuntu:latest) options(@/share/usr/docker-options.sh)]
DESCRIPTION = Docker User Service with pre-execution script for options
End Queue
Begin Queue
QUEUE_NAME = ndockerqoptions
CONTAINER = nvidia-docker[image(repository.example.com:5000/file/path/ubuntu:latest) options(@/share/usr/ndocker-options.sh)]
DESCRIPTION = NVIDIA Docker User Service with pre-execution script for options
End Queue
Begin Queue
QUEUE_NAME = shifterqoptions
CONTAINER = shifter[image(ubuntu:latest) options(@/share/usr/shifter-options.sh)]
DESCRIPTION = Shifter User Service
End Queue
Begin Queue
QUEUE_NAME = singqoptions
CONTAINER = singularity[image(/file/path/ubuntu.img) options(@/share/usr/sing-options.sh)]
DESCRIPTION = Singularity User Service
End Queue
Begin Queue
QUEUE_NAME = podmanqoptions
CONTAINER = docker[image(repository.example.com:5000/file/path/ubuntu:latest) options(@/share/usr/podman-options.sh)]
DESCRIPTION = Podman User Service with pre-execution script for options
End Queue
Begin Queue
QUEUE_NAME = enrootqoptions
CONTAINER = enroot[image(repository.example.com:5000/file/path/ubuntu:latest) options(@/share/usr/enroot-options.sh)]
DESCRIPTION = Enroot User Service with pre-execution script for options
End Queue
  • For sequential jobs, specify the following CONTAINER parameter value for LSF to automatically remove containers after the job is done:

    CONTAINER=docker[image(image-name) options(--rm)]

  • For parallel jobs, the network and IPC must work across containers to make blaunch work. The execution user ID and user name mapping file must be mounted to the container for blaunch authentication.

    Therefore, specify the following CONTAINER parameter value for LSF to configure the container IPC and network parameters so that blaunch can work across multiple containers, to configure the container password file for blaunch authentication, and automatically remove containers after the job is done:

    CONTAINER=docker[image(image-name) options(--rm --network=host --ipc=host -v /path/to/my/passwd:/etc/passwd)]

    The passwd file must be in the standard format for UNIX and Linux password files, such as the following format:

    user1:x:10001:10001:::
    user2:x:10002:10002:::
  • To allow users to specify image names for Docker, NVIDIA Docker, Podman, and Enroot container jobs at job submission time, use the $LSB_CONTAINER_IMAGE environment variable as the image name when specifying the image keyword.
    For example, when defining the CONTAINER parameter for the udockerGPU queue, add the $LSB_CONTAINER_IMAGE environment variable to the image specification:
    Begin Queue
    QUEUE_NAME = udockerGPU
    CONTAINER = docker[image(repository.example.com:5000/$LSB_CONTAINER_IMAGE) \
                options(--rm --net=host --ipc=host --runtime=nvidia -v /gpfs/u/fred:/data)]
    DESCRIPTION = Docker User Service
    End Queue
    Specify a container image name (such as ubuntu) at job submission time by setting the $LSB_CONTAINER_IMAGE environment variable using one of the following methods:
    • Specify the $LSB_CONTAINER_IMAGE environment variable according to your shell environment:
      • In csh or tcsh:

        setenv LSB_CONTAINER_IMAGE ubuntu

      • In sh, ksh, or bash:

        export LSB_CONTAINER_IMAGE=ubuntu

    • Use the bsub -env option:

      bsub -env LSB_CONTAINER_IMAGE=ubuntu -q udockerGPU a.out -in in.dat -out out.dat

    • Use an esub script to set the LSB_CONTAINER_IMAGE environment variable, then call the esub with the bsub command.
      For example, create an esub.docker script in the $LSF_SERVERDIR directory with the following contents:
      #!/bin/sh
      exec 1>&2
      echo "LSB_CONTAINER_IMAGE=\"$1\"" >> $LSB_SUB_MODIFY_ENVFILE
      Submit a job to call the esub.docker script by running the following command:
      bsub -a "docker(ubuntu)" -q dockerq a.out -in in.dat -out out.dat
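To show the intended effect of $LSB_CONTAINER_IMAGE in the image keyword, the following sketch expands the variable the way a shell-style substitution would. It only illustrates the expected result and is not LSF's own implementation. The function name is hypothetical.

```python
from string import Template


def resolve_image(image_spec, job_env):
    """Expand $LSB_CONTAINER_IMAGE (or any other $NAME) in an image
    specification using values from the job environment.

    Raises KeyError if a referenced variable is not set.
    """
    return Template(image_spec).substitute(job_env)


spec = "repository.example.com:5000/$LSB_CONTAINER_IMAGE"
print(resolve_image(spec, {"LSB_CONTAINER_IMAGE": "ubuntu"}))
```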

Default

Undefined

CORELIMIT

Specifies the per-process core file size limit for all job processes from this queue.

Syntax

CORELIMIT=integer

Description

Specify this parameter to place a per-process hard core file size limit, in KB, for all of the processes that belong to a job from this queue (see getrlimit(2)).

Default

Unlimited

CPU_FREQUENCY

Specifies the CPU frequency for a queue.

Syntax

CPU_FREQUENCY=[float_number][unit]

Description

All jobs that are submitted to the queue require the specified CPU frequency. The value is a positive floating-point number with units (GHz, MHz, or KHz). If no units are set, the default is GHz.
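The unit handling can be sketched as a conversion to a common unit. The function below is hypothetical and assumes that unit names are matched without regard to case, which this section does not specify.

```python
import re

UNIT_TO_KHZ = {"ghz": 1_000_000, "mhz": 1_000, "khz": 1}


def cpu_frequency_khz(value):
    """Convert a CPU_FREQUENCY value such as '2.5', '2.5GHz', or
    '2500MHz' to KHz. A value without units is read as GHz."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([A-Za-z]*)\s*", value)
    if match is None:
        raise ValueError("not a valid frequency: " + repr(value))
    number = float(match.group(1))
    unit = match.group(2).lower() or "ghz"  # default unit is GHz
    if unit not in UNIT_TO_KHZ:
        raise ValueError("unknown unit: " + match.group(2))
    if number <= 0:
        raise ValueError("frequency must be positive")
    return number * UNIT_TO_KHZ[unit]


for raw in ("2.5", "2500MHz", "800000KHz"):
    print(raw, cpu_frequency_khz(raw))
```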

You can also use bsub -freq to set this value.

The submission value overrides the application profile value, and the application profile value overrides the queue value.

Default

Not defined (Nominal CPU frequency is used)

CPULIMIT

Specifies the maximum normalized CPU time and, optionally, the default normalized CPU time that is allowed for all processes of jobs that run in this queue. The name of a host or host model specifies the CPU time normalization host to use.

Syntax

CPULIMIT=[default_limit] maximum_limit

where default_limit and maximum_limit are defined by the following formula:

[hour:]minute[/host_name | /host_model]

Description

Limits the total CPU time the job can use. This parameter is useful for preventing runaway jobs or jobs that use up too many resources.

When the total CPU time for the whole job reaches the limit, a SIGXCPU signal is sent to all processes that belong to the job. If the job has no signal handler for SIGXCPU, the job is killed immediately. If the SIGXCPU signal is handled, blocked, or ignored by the application, then after the grace period expires, LSF sends SIGINT, SIGTERM, and SIGKILL to the job to kill it.

If a job dynamically creates processes, the CPU time that is used by these processes is accumulated over the life of the job.

Processes that exist for fewer than 30 seconds might be ignored.

By default, if a default CPU limit is specified, jobs submitted to the queue without a job-level CPU limit are killed when the default CPU limit is reached.

If you specify only one limit, it is the maximum, or hard, CPU limit. If you specify two limits, the first one is the default, or soft, CPU limit, and the second one is the maximum CPU limit.
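The CPULIMIT syntax and the one-limit versus two-limit rule can be illustrated with a small parser. The function names are hypothetical, and the sketch accepts only whole numbers for hours and minutes, which is an assumption.

```python
def parse_time_spec(spec):
    """Parse '[hour:]minute[/host_name | /host_model]' into
    (total_minutes, host_spec), where host_spec is None if omitted."""
    time_part, _, host_spec = spec.partition("/")
    if ":" in time_part:
        hours, minutes = time_part.split(":", 1)
        total = int(hours) * 60 + int(minutes)
    else:
        total = int(time_part)
    return total, (host_spec or None)


def parse_cpulimit(value):
    """Return (default_limit, maximum_limit). With a single limit,
    default_limit is None and the value is the maximum (hard) limit."""
    parts = value.split()
    if len(parts) == 1:
        return None, parse_time_spec(parts[0])
    if len(parts) == 2:
        return parse_time_spec(parts[0]), parse_time_spec(parts[1])
    raise ValueError("expected one or two limits")


print(parse_cpulimit("120"))
print(parse_cpulimit("30 2:00/hostA"))
```

In the second call, 30 minutes is the default (soft) limit and 2 hours, normalized to the hypothetical host hostA, is the maximum limit.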

If no host or host model is given with the CPU time, LSF uses the default CPU time normalization host that is defined at the queue level (DEFAULT_HOST_SPEC in lsb.queues) if it is configured. Otherwise, the default CPU time normalization host that is defined at the cluster level (DEFAULT_HOST_SPEC in lsb.params) is used if it is configured. Otherwise, the host with the largest CPU factor (the fastest host in the cluster) is used.
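The fallback order for the CPU time normalization host can be sketched as follows. The function and argument names are hypothetical, and cpu_factors stands in for whatever host-to-CPU-factor information the cluster holds.

```python
def normalization_host(limit_spec, queue_spec, cluster_spec, cpu_factors):
    """Pick the CPU time normalization host.

    limit_spec:   host or host model given with the CPU time, or None
    queue_spec:   DEFAULT_HOST_SPEC from lsb.queues, or None
    cluster_spec: DEFAULT_HOST_SPEC from lsb.params, or None
    cpu_factors:  mapping of host name to CPU factor
    """
    for spec in (limit_spec, queue_spec, cluster_spec):
        if spec:
            return spec
    # Nothing configured: fall back to the host with the largest CPU factor.
    return max(cpu_factors, key=cpu_factors.get)


factors = {"hostA": 1.0, "hostB": 2.5, "hostC": 1.8}
print(normalization_host(None, None, "hostC", factors))
print(normalization_host(None, None, None, factors))
```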

Because sbatchd periodically checks whether a CPU time limit was exceeded, a Windows job that runs under a CPU time limit can exceed that limit by up to SBD_SLEEP_TIME.

On UNIX systems, the CPU limit can be enforced by the operating system at the process level.

You can define whether the CPU limit is a per-process limit that is enforced by the OS or a per-job limit enforced by LSF with LSB_JOB_CPULIMIT in lsf.conf.

Jobs that are submitted to a chunk job queue are not chunked if CPULIMIT is greater than 30 minutes.

Default

Unlimited

CPU_TIME_FACTOR

Specifies the CPU time weighting factor. Used only with fairshare scheduling.

Syntax

CPU_TIME_FACTOR=number

Description

In the calculation of a user's dynamic share priority, this factor determines the relative importance of the cumulative CPU time that is used by a user’s jobs.

If undefined, the cluster-wide value from the lsb.params parameter of the same name is used.

Default

0.7

CSM_REQ

Specifies the required values for the CSM bsub job submission options. These settings override job level CSM options and append system level allocation flags to the job level allocation flags.

Syntax

CSM_REQ= [jsm=y | n | d] [:step_cgroup=y | n] [: [core_isolation 0 | 1 | 2 | 3 | 4 | 5 | 6] [:cn_mem=mem_value] ] [:alloc_flags "flag1 [flag2 ...] [:smt=smt_value]"]

Description

Use a colon (:) to separate multiple CSM job options. The options can appear in any order and none are required. For the alloc_flags keyword, specify an alphanumeric string of flags and separate multiple flags with a space. The string cannot contain a colon (:).

Example

CSM_REQ=jsm=n:step_cgroup=y:core_isolation 3:cn_mem=1024
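The colon-separated layout of the example value can be illustrated by splitting it into key and value pairs. This is a simplified sketch with a hypothetical function name: it assumes that each option is either key=value or a key followed by a space and a value, and it is not a full implementation of the CSM_REQ syntax.

```python
def parse_csm_req(value):
    """Split a CSM_REQ value into a dictionary of option names and values."""
    options = {}
    for part in value.split(":"):
        part = part.strip()
        if not part:
            continue
        if "=" in part:
            key, val = part.split("=", 1)
        else:
            # Options such as 'core_isolation 3' use a space, not '='.
            key, _, val = part.partition(" ")
        options[key.strip()] = val.strip().strip('"')
    return options


print(parse_csm_req("jsm=n:step_cgroup=y:core_isolation 3:cn_mem=1024"))
```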

Default

Not defined.

DATALIMIT

Specifies the per-process data segment size limit for all job processes from this queue.

Syntax

DATALIMIT=[default_limit] maximum_limit

Description

Set this parameter to place a per-process data segment size limit, in KB, for all of the processes that belong to a job from this queue (see getrlimit(2)).

By default, if a default data limit is specified, jobs submitted to the queue without a job-level data limit are killed when the default data limit is reached.

If you specify only one limit, it is the maximum, or hard, data limit. If you specify two limits, the first one is the default, or soft, data limit, and the second one is the maximum data limit.

Default

Unlimited

DATA_TRANSFER

Configures the queue as a data transfer queue for LSF data manager.

Syntax

DATA_TRANSFER=Y | N

Description

The DATA_TRANSFER=Y parameter enables a queue for data transfer through LSF data manager.

Only one queue in a cluster can be a data transfer queue. Any transfer jobs that are submitted by the data manager go to this queue. If the lsf.datamanager file exists, then at least one queue must define the DATA_TRANSFER parameter. If this parameter is set, a corresponding lsf.datamanager file must exist.

Regular jobs that are submitted to this queue through bsub are rejected.

Use the bstop, bresume, and bkill commands to stop, resume, and kill your own transfer jobs in a data transfer queue. LSF administrators and queue administrators can additionally use the btop and bbot commands to move transfer jobs in the queue. All other commands on jobs in a data transfer queue are rejected. You cannot use the bswitch command to switch jobs from other queues to a data transfer queue.

If you change this parameter, LSF data manager transfer jobs that were in the previous queue remain in that queue and are scheduled and run as normal. The LSF data manager is notified of their success or failure.

The following queue parameters cannot be used together with a queue that defines the DATA_TRANSFER parameter:
  • INTERACTIVE=ONLY
  • RCVJOBS_FROM
  • MAX_RSCHED_TIME
  • SUCCESS_EXIT_VALUES
  • RERUNNABLE
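A configuration check for these incompatible parameters might look like the sketch below. The function name and the dictionary representation of a queue are hypothetical. The sketch also assumes that the mere presence of one of the listed parameters counts as a conflict, whatever its value.

```python
# Parameters that this section lists as incompatible with DATA_TRANSFER.
INCOMPATIBLE = ("RCVJOBS_FROM", "MAX_RSCHED_TIME", "SUCCESS_EXIT_VALUES", "RERUNNABLE")


def data_transfer_conflicts(queue):
    """Return the conflicting parameters for a queue definition given as
    a dictionary of parameter names and values."""
    if str(queue.get("DATA_TRANSFER", "N")).upper() != "Y":
        return []
    conflicts = [name for name in INCOMPATIBLE if name in queue]
    if str(queue.get("INTERACTIVE", "")).upper() == "ONLY":
        conflicts.insert(0, "INTERACTIVE=ONLY")
    return conflicts


queue = {"QUEUE_NAME": "transfer", "DATA_TRANSFER": "Y",
         "INTERACTIVE": "ONLY", "RERUNNABLE": "Y"}
print(data_transfer_conflicts(queue))
```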

A data transfer queue cannot appear in the list of default queues that are defined by the DEFAULT_QUEUE parameter in the lsb.params file. Jobs that are submitted to the data transfer queue are not attached to the application specified by the DEFAULT_APPLICATION parameter in the lsb.params file.

Default

N

DEFAULT_EXTSCHED

Specifies default external scheduling options for the queue.

Syntax

DEFAULT_EXTSCHED=external_scheduler_options

Description

-extsched options on the bsub command are merged with DEFAULT_EXTSCHED options, and -extsched options override any conflicting queue-level options set by DEFAULT_EXTSCHED.

Default

Not defined

DEFAULT_HOST_SPEC

Specifies the default CPU time normalization host for the queue.

Syntax

DEFAULT_HOST_SPEC=host_name | host_model

Description

The CPU factor of the specified host or host model is used to normalize the CPU time limit of all jobs in the queue, unless the CPU time normalization host is specified at the job level.

Default

Not defined. The queue uses the DEFAULT_HOST_SPEC defined in lsb.params. If DEFAULT_HOST_SPEC is not defined in either file, LSF uses the fastest host in the cluster.

DESCRIPTION

Specifies a description of the job queue that is displayed by bqueues -l.

Syntax

DESCRIPTION=text

Description

Use a description that clearly describes the service features of this queue to help users select the proper queue for each job.

The text can include any characters, including white space. The text can be extended to multiple lines by ending the preceding line with a backslash (\). The maximum length for the text is 512 characters.

DISPATCH_BY_QUEUE (Obsolete)

This parameter is obsolete in LSF Version 10.1 Fix Pack 10 and is replaced by the JOB_DISPATCH_PACK_SIZE parameter in the lsb.params file.

Syntax

DISPATCH_BY_QUEUE=Y|y|N|n

Description

Enables the scheduling decision for the specified queue to be published without waiting for the whole scheduling session to finish.

Set this parameter to increase queue responsiveness. The scheduling decision for the jobs in the specified queue is final and these jobs cannot be preempted within the same scheduling cycle.

Tip: Set this parameter only for your highest priority queue (such as for an interactive queue) to ensure that this queue has the highest responsiveness.

Default

N

DISPATCH_ORDER

Defines an ordered cross-queue fairshare set, which indicates that jobs are dispatched according to the order of queue priorities first, then user fairshare priority.

Syntax

DISPATCH_ORDER=QUEUE

Description

By default, a user has the same priority across the parent and child queues. If the same user submits several jobs to these queues, user priority is calculated by taking into account all the jobs the user submits across the parent-child set.

If DISPATCH_ORDER=QUEUE is set in the parent queue, jobs are dispatched according to queue priorities first, then user priority. Jobs from users with lower fairshare priorities who have pending jobs in higher priority queues are dispatched before jobs in lower priority queues. This behavior prevents users with higher fairshare priority from getting jobs dispatched from low-priority queues ahead of jobs in higher priority queues.

Jobs in queues with the same priority are dispatched according to user priority.
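
The two orderings described above can be sketched as sort keys. This is a simplified illustration, not the LSF scheduler: the PendingJob type, the sample priorities, and the assumption that user priority is a fixed number (LSF recalculates dynamic priority as jobs are dispatched) are all hypothetical.

```python
from typing import List, NamedTuple


class PendingJob(NamedTuple):
    name: str
    queue_priority: int    # PRIORITY of the queue that holds the job
    user_priority: float   # hypothetical, fixed fairshare priority of the owner


def dispatch_order(jobs: List[PendingJob], by_queue: bool) -> List[str]:
    """Return job names in the order they would be considered for dispatch."""
    if by_queue:
        # DISPATCH_ORDER=QUEUE: queue priority first, then user priority.
        key = lambda j: (-j.queue_priority, -j.user_priority)
    else:
        # Default: user priority across the whole parent-child set.
        key = lambda j: -j.user_priority
    return [j.name for j in sorted(jobs, key=key)]


jobs = [
    PendingJob("a", 30, 0.9),
    PendingJob("b", 50, 0.2),
    PendingJob("c", 50, 0.5),
]
print(dispatch_order(jobs, by_queue=True))   # ['c', 'b', 'a']
print(dispatch_order(jobs, by_queue=False))  # ['a', 'c', 'b']
```

With by_queue=True, job a waits behind both jobs in the priority-50 queue even though its owner has the highest user priority.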

Queues that are not part of the cross-queue fairshare can have any priority; they are not limited to fall outside of the priority range of cross-queue fairshare queues.

Default

Not defined

DISPATCH_WINDOW

Specifies the time windows in which jobs from this queue are dispatched. After jobs are dispatched, they are no longer affected by the dispatch window.

Syntax

DISPATCH_WINDOW=time_window ...

Description

Jobs from this queue are not dispatched outside of the dispatch window.

Default

Not defined. Dispatch window is always open.

DOCKER_IMAGE_AFFINITY

Syntax

DOCKER_IMAGE_AFFINITY=Y | y | N | n

Description

When scheduling Docker-based containerized jobs, setting this parameter to y or Y enables LSF to give preference to execution hosts that already have the requested Docker image. This reduces network bandwidth and the job start time because the execution host does not have to pull the Docker image from the repository and the job can immediately start on the execution host.

When this feature is enabled, LSF considers Docker image location information when scheduling Docker jobs. Docker image affinity interacts with host preference and order[] string requests in the following manner:

  • If host preference is specified, the host preference is honored first. Among hosts with the same preference level, hosts with the requested Docker image are given higher priority.
  • If the order[] string is specified, the hosts with the requested Docker image have a higher priority first. Among hosts that all have the requested Docker image, the order[] string is then honored.
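
The two interaction rules above can be read as two different sort keys. The following sketch is illustrative only: the Host type and its fields (a numeric preference level and an order_rank that stands for the position produced by the order[] string) are hypothetical and do not correspond to LSF data structures.

```python
from typing import List, NamedTuple


class Host(NamedTuple):
    name: str
    preference: int   # hypothetical: higher means more preferred
    has_image: bool   # host already holds the requested Docker image
    order_rank: int   # hypothetical: position after order[] sorting, lower first


def rank_with_host_preference(hosts: List[Host]) -> List[str]:
    # Host preference is honored first; image location breaks ties.
    ranked = sorted(hosts, key=lambda h: (-h.preference, not h.has_image))
    return [h.name for h in ranked]


def rank_with_order_string(hosts: List[Host]) -> List[str]:
    # Hosts with the image come first; order[] is honored among them.
    ranked = sorted(hosts, key=lambda h: (not h.has_image, h.order_rank))
    return [h.name for h in ranked]


hosts = [
    Host("hostA", 1, False, 0),
    Host("hostB", 1, True, 2),
    Host("hostC", 2, False, 1),
    Host("hostD", 1, True, 1),
]
print(rank_with_host_preference(hosts))  # ['hostC', 'hostB', 'hostD', 'hostA']
print(rank_with_order_string(hosts))     # ['hostD', 'hostB', 'hostA', 'hostC']
```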

The CONTAINER parameter must be defined for this parameter to work with this queue.

Default

Not defined.

ELIGIBLE_PEND_TIME_LIMIT

Specifies the eligible pending time limit for a job.

Syntax

ELIGIBLE_PEND_TIME_LIMIT=[hour:]minute

Description

LSF sends the queue-level eligible pending time limit configuration to IBM Spectrum LSF RTM (LSF RTM), which handles the alarm and triggered actions such as user notification (for example, notifying the user that submitted the job and the LSF administrator) and job control actions (for example, killing the job). LSF RTM compares the job's eligible pending time to the eligible pending time limit, and if the job is in an eligible pending state for longer than this specified time limit, LSF RTM triggers the alarm and actions. This parameter works without LSF RTM, but LSF does not take any other alarm actions.

In MultiCluster job forwarding mode, the job's eligible pending time limit is ignored in the execution cluster, while the submission cluster merges the job's queue-, application-, and job-level eligible pending time limit according to local settings.

The eligible pending time limit is in the form of [hour:]minute. The minutes can be specified as a number greater than 59. For example, three and a half hours can either be specified as 3:30, or 210.
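
As a quick check of the [hour:]minute form, the following sketch converts a limit string to total minutes. The function name is hypothetical and the validation is an assumption; LSF performs its own parsing.

```python
def time_limit_to_minutes(value: str) -> int:
    """Convert an [hour:]minute string such as "3:30" or "210" to minutes."""
    hours, colon, minutes = value.strip().rpartition(":")
    if not minutes.isdigit() or (colon and not hours.isdigit()):
        raise ValueError(f"expected [hour:]minute, got {value!r}")
    # The minute field may exceed 59, so it is simply added to the hours.
    return (int(hours) if colon else 0) * 60 + int(minutes)


print(time_limit_to_minutes("3:30"))  # 210
print(time_limit_to_minutes("210"))   # 210
```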

The job-level eligible pending time limit (bsub -eptl) overrides the application-level limit (ELIGIBLE_PEND_TIME_LIMIT in lsb.applications), and the application-level limit overrides the queue-level limit specified here.

Default

Not defined.

ENABLE_GPU_HIST_RUN_TIME

Enables the use of historical GPU run time in the calculation of fairshare scheduling priority. Used only with fairshare scheduling.

Syntax

ENABLE_GPU_HIST_RUN_TIME=y | Y | n | N

Description

If set to Y, enables the use of historical GPU run time in the calculation of fairshare scheduling priority.

If undefined, the cluster-wide value from the lsb.params parameter of the same name is used.

Default

Not defined.

ENABLE_HIST_RUN_TIME

Enables the use of historical run time in the calculation of fairshare scheduling priority. Used only with fairshare scheduling.

Syntax

ENABLE_HIST_RUN_TIME=y | Y | n | N

Description

If set to Y, enables the use of historical run time in the calculation of fairshare scheduling priority.

If undefined, the cluster-wide value from the lsb.params parameter of the same name is used.

Default

Not defined.

ESTIMATED_RUNTIME

Syntax

ESTIMATED_RUNTIME=[hour:]minute[/host_name | /host_model]

Description

This parameter specifies an estimated run time for jobs associated with a queue. LSF uses the ESTIMATED_RUNTIME value for scheduling purposes only, and does not kill jobs that exceed this value unless the jobs also exceed a defined RUNLIMIT. The format of the runtime estimate is the same as that of the RUNLIMIT parameter.

The job-level runtime estimate specified by bsub -We, or the ESTIMATED_RUNTIME setting in an application, overrides the ESTIMATED_RUNTIME setting in the queue. The ESTIMATED_RUNTIME setting in the queue overrides the cluster-wide ESTIMATED_RUNTIME setting.

The following LSF features use the ESTIMATED_RUNTIME value to schedule jobs:
  • Job chunking
  • Advance reservation
  • SLA
  • Slot reservation
  • Backfill
  • Allocation planner

Default

Not defined

EXCLUSIVE

Specifies an exclusive queue and compute unit type.

Syntax

EXCLUSIVE=Y | N | CU[cu_type]

Description

If set to Y, specifies an exclusive queue.

If set to CU, CU[], or CU[cu_type], specifies an exclusive queue as well as a queue exclusive to compute units of type cu_type (as defined in lsb.params). If no type is specified, the default compute unit type is used.

Jobs that are submitted to an exclusive queue with bsub -x are only dispatched to a host that has no other running jobs. Jobs that are submitted to a compute unit exclusive queue with bsub -R "cu[excl]" only run on a compute unit that has no other running jobs.

For hosts shared under the MultiCluster resource leasing model, jobs are not dispatched to a host that has running jobs, even if the jobs are from another cluster.

Note: EXCLUSIVE=Y or EXCLUSIVE=CU[cu_type] must be configured to enable affinity jobs to use CPUs exclusively, when the alljobs scope is specified in the exclusive option of an affinity[] resource requirement string.

Default

N

EXEC_DRIVER

Syntax

Docker and Podman jobs: EXEC_DRIVER=context[user(user_name)] starter[/file_path/to/serverdir/docker-starter.py] controller[/file_path/to/serverdir/docker-control.py] monitor[/file_path/to/serverdir/docker-monitor.py]

Enroot jobs: EXEC_DRIVER=starter[/file_path/to/serverdir/enroot-starter.py]

Replace file_path/to/serverdir with the actual file path of the LSF_SERVERDIR directory.

Description

Optional for Enroot jobs. Specifies the execution driver framework for Docker or Podman container jobs in this queue. This parameter uses the following keyword:

user
Optional for Docker jobs and ignored for Enroot jobs. This keyword specifies the user account for starting scripts. The configured value is a user name instead of a user ID. For Docker jobs, this user must be a member of the Docker user group. For Podman jobs, the user name must be set to "default".

By default, this is the LSF primary administrator.

Note: This cannot be the root user.

LSF includes three execution driver scripts that are used to start a job (docker-starter.py), monitor the resources of a job (docker-monitor.py), and send a signal to a job (docker-control.py). These scripts are located in the LSF_SERVERDIR directory. Change the owner of the script files to the context user and change the file permissions to 700 or 500 before using them in the EXEC_DRIVER parameter.

The starter script is required. For Docker container jobs, the monitor and control scripts are required if the cgroupfs driver is systemd, but are optional if the cgroupfs driver is cgroupfs. For Podman container jobs, the monitor script is optional while the control script is required. For Enroot container jobs, the starter script is required while all other scripts are ignored.

Interaction with the CONTAINER parameter for Docker, Podman, or Enroot jobs

For Docker, Podman, or Enroot jobs, the EXEC_DRIVER parameter interacts with the following keywords in the CONTAINER parameter:

  • image, which specifies the image name (the $LSB_CONTAINER_IMAGE environment variable), is supported when specifying the script names.
  • options, with runtime options and the option script, is supported.

Example

Begin Queue
NAME = dockerq
CONTAINER = docker[image(repository.example.com:5000/file/path/ubuntu:latest)
            options(--rm --network=host --ipc=host -v /path/to/my/passwd:/etc/passwd)]
EXEC_DRIVER = context[user(user-name)] starter[/path/to/driver/docker-starter.py]
              controller[/path/to/driver/docker-control.py]
              monitor[/path/to/driver/docker-monitor.py]
DESCRIPTION = Docker User Service
End Queue

Begin Queue
NAME = podmanq
CONTAINER = docker[image(repository.example.com:5000/file/path/ubuntu:latest)
            options(--rm --network=host --ipc=host -v /path/to/my/passwd:/etc/passwd)]
EXEC_DRIVER = context[user(default)] starter[/path/to/driver/docker-starter.py]
              controller[/path/to/driver/docker-control.py]
              monitor[/path/to/driver/docker-monitor.py]
DESCRIPTION = Podman User Service
End Queue

Begin Queue
NAME = enrootq
CONTAINER = enroot[image(repository.example.com:5000/file/path/ubuntu:latest)
            options(--mount /mydir:/mydir2)]
EXEC_DRIVER = starter[/path/to/driver/enroot-starter.py]
DESCRIPTION = Enroot User Service
End Queue

Default

Undefined for Docker and Podman jobs.

starter[$LSF_SERVERDIR/enroot-starter.py] for Enroot jobs

EXTENDABLE_RUNLIMIT

Enables the LSF allocation planner to extend the run limits of a job by changing its soft run limit if the resources that are occupied by this job are not needed by other jobs in queues with the same or higher priority. A soft run limit can be extended, while a hard run limit cannot be extended. The allocation planner looks at job plans of other jobs to determine if there are other jobs that require this job's resources.

Syntax

EXTENDABLE_RUNLIMIT=BASE[minutes] INCREMENT[minutes] GRACE[minutes] REQUEUE[Y|N]

Description

If configured, this parameter applies an extendable run limit policy to jobs with plans in the queue. When a job with a plan is dispatched, LSF sets an initial soft run limit for the job. Whenever a job reaches the soft run limit, LSF considers whether another job has a planned allocation on the resources. If not, LSF extends the job's soft run limit. Otherwise, LSF sets a hard run limit for the job. Whenever a job reaches the hard run limit, LSF terminates or requeues the job.

This parameter uses the following keywords:

BASE[minutes]
The initial soft run limit that is imposed on jobs in the queue. Whenever the job reaches the soft run limit, the allocation planner considers whether the resources that are held by the job are needed by another job in the queue by looking at plans for other jobs. If the resources are not required, LSF extends the soft run limit for the job. Otherwise, LSF sets a hard run limit.

Specify an integer value for the initial soft run limit.

INCREMENT[minutes]
Optional. If LSF decides to extend the soft run limit for the job, this keyword specifies the amount of time that LSF extends the soft run limit.

Specify an integer value for the soft run limit extension time. The default value is the value of the BASE[] keyword.

GRACE[minutes]
Optional. If LSF decides not to extend the soft run limit for the job, a hard run limit is set for this amount of minutes from the time the decision is made.

The default value is 0 (the job is terminated or requeued immediately).

REQUEUE[Y | N]
Optional. Specifies the action that LSF takes when a job reaches its hard run limit. If set to N, LSF terminates the job. If set to Y, LSF requeues the job.

The default value is N (LSF terminates the job once the job reaches its hard run limit).
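
The interaction of the four keywords can be illustrated with a small simulation. This is a sketch under stated assumptions, not the allocation planner: the function name, the needed_at callback (which stands in for the planner's check of other jobs' plans), and the horizon guard are hypothetical, and the job is assumed never to finish on its own.

```python
from typing import Callable, Optional, Tuple


def final_run_limit(
    base: int,
    needed_at: Callable[[int], bool],
    increment: Optional[int] = None,
    grace: int = 0,
    requeue: bool = False,
    horizon: int = 10_000,
) -> Tuple[int, str]:
    """Return (hard run limit in minutes, action taken at that limit)."""
    step = base if increment is None else increment  # INCREMENT defaults to BASE
    if step <= 0:
        raise ValueError("increment must be a positive number of minutes")
    soft = base
    # Each time the soft limit is reached, extend it unless another job
    # has a planned allocation on the resources.
    while not needed_at(soft):
        soft += step
        if soft > horizon:
            raise RuntimeError("resources were never needed within the horizon")
    # Otherwise a hard limit is set GRACE minutes after the decision.
    action = "requeue" if requeue else "terminate"
    return soft + grace, action


# Another job plans to use the resources from minute 100 onward.
print(final_run_limit(60, lambda t: t >= 100, increment=30, grace=10))  # (130, 'terminate')
print(final_run_limit(60, lambda t: t >= 100, requeue=True))            # (120, 'requeue')
```

In the first call, the soft limit is extended from 60 to 90 and then to 120 minutes. At minute 120 the resources are needed, so the hard limit falls 10 minutes later.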

Default

Not defined.

Jobs that reach the specified run limit time (as specified by the RUNLIMIT parameter or the -W option) are checkpointed (if checkpointable), then terminated, regardless of whether resources are available.

FAIRSHARE

Enables queue-level user-based fairshare and specifies share assignments.

Syntax

FAIRSHARE=USER_SHARES[[user, number_shares] ...]
  • Specify at least one user share assignment.
  • Enclose the list in square brackets, as shown.
  • Enclose each user share assignment in square brackets, as shown.
  • user: specify users who are also configured to use the queue. You can assign the shares to the following types of users:
    • A single user (specify user_name). To specify a Windows user account, include the domain name in uppercase letters (DOMAIN_NAME\user_name).
    • Users in a group, individually (specify group_name@) or collectively (specify group_name). To specify a Windows user group, include the domain name in uppercase letters (DOMAIN_NAME\group_name).
    • Users not included in any other share assignment, individually (specify the keyword default) or collectively (specify the keyword others).
      • By default, when resources are assigned collectively to a group, the group members compete for the resources on a first-come, first-served (FCFS) basis. You can use hierarchical fairshare to further divide the shares among the group members.
      • When resources are assigned to members of a group individually, the share assignment is recursive. Members of the group and of all subgroups always compete for the resources according to FCFS scheduling, regardless of hierarchical fairshare policies.
  • number_shares
    • Specify a positive integer that represents the number of shares of the cluster resources that are assigned to the user.
    • The number of shares that are assigned to each user is only meaningful when you compare it to the shares assigned to other users or to the total number of shares. The total number of shares is just the sum of all the shares that are assigned in each share assignment.
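
Because a share count is meaningful only relative to the total, it can help to convert an assignment list into fractions. The sketch below parses a USER_SHARES string with a regular expression and divides each count by the sum. The function name and the sample users are hypothetical, and the pattern is an assumption that covers only the simple [user, number_shares] form shown in the syntax.

```python
import re
from typing import Dict


def share_fractions(fairshare: str) -> Dict[str, float]:
    """Map each user in a USER_SHARES[...] value to its fraction of all shares."""
    pairs = re.findall(r"\[\s*([^\[\],\s]+)\s*,\s*(\d+)\s*\]", fairshare)
    shares = {user: int(count) for user, count in pairs}
    if not shares or any(count <= 0 for count in shares.values()):
        raise ValueError("specify at least one assignment with a positive integer")
    total = sum(shares.values())
    return {user: count / total for user, count in shares.items()}


value = "USER_SHARES[[user1, 100] [user2, 50] [others, 50]]"
print(share_fractions(value))  # {'user1': 0.5, 'user2': 0.25, 'others': 0.25}
```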

Description

Enables queue-level user-based fairshare and specifies optional share assignments. If share assignments are specified, only users with share assignments can submit jobs to the queue.

Compatibility

Do not configure hosts in a cluster to use fairshare at both queue and host levels. However, you can configure user-based fairshare and queue-based fairshare together.

Default

Not defined. No fairshare.

FAIRSHARE_ADJUSTMENT_FACTOR

Specifies the fairshare adjustment plug-in weighting factor. Used only with fairshare scheduling.

Syntax

FAIRSHARE_ADJUSTMENT_FACTOR=number

Description

In the calculation of a user dynamic share priority, this factor determines the relative importance of the user-defined adjustment that is made in the fairshare plug-in (libfairshareadjust.*).

A positive float number both enables the fairshare plug-in and acts as a weighting factor.

If undefined, the cluster-wide value from the lsb.params parameter of the same name is used.

Default

Not defined.

FAIRSHARE_QUEUES

Defines cross-queue fairshare.

Syntax

FAIRSHARE_QUEUES=queue_name [queue_name ...]

Description

When this parameter is defined:
  • The queue in which this parameter is defined becomes the parent queue.
  • Queues that are listed with this parameter are child queues and inherit the fairshare policy of the parent queue.
  • A user has the same priority across the parent and child queues. If the same user submits several jobs to these queues, user priority is calculated by taking into account all the jobs that the user submitted across the parent-child set.

Notes

  • By default, the PRIORITY range that is defined for queues in cross-queue fairshare cannot be used with any other queues. For example, you have 4 queues: queue1, queue2, queue3, queue4. You assign the following cross-queue fairshare priorities:
    • queue1 priority: 30
    • queue2 priority: 40
    • queue3 priority: 50
  • By default, the priority of queue4 (which is not part of the cross-queue fairshare) cannot fall between the priority range of the cross-queue fairshare queues (30-50). It can be any number up to 29 or higher than 50. It does not matter if queue4 is a fairshare queue or FCFS queue. If DISPATCH_ORDER=QUEUE is set in the parent queue, the priority of queue4 (which is not part of the cross-queue fairshare) can be any number, including a priority that falls between the priority range of the cross-queue fairshare queues (30-50).
  • FAIRSHARE must be defined in the parent queue. If it is also defined in the queues that are listed in FAIRSHARE_QUEUES, it is ignored.
  • Cross-queue fairshare can be defined more than once within lsb.queues. You can define several sets of parent-child queues. However, a queue cannot belong to more than one parent-child set. For example, you can define:
    • In queue normal: FAIRSHARE_QUEUES=short
    • In queue priority: FAIRSHARE_QUEUES=night owners
      Restriction: You cannot, however, define night, owners, or priority as children in the queue normal; or normal and short as children in the priority queue; or short, night, owners as parent queues of their own.
  • Cross-queue fairshare cannot be used with host partition fairshare. It is part of queue-level fairshare.
  • Cross-queue fairshare cannot be used with absolute priority scheduling.
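
The priority range rule in the notes above can be expressed as a small check. This is a sketch only: the function name is hypothetical, and it treats the range as inclusive, which matches the example in which queue4 can be any number up to 29 or higher than 50.

```python
from typing import Iterable


def priority_allowed(
    priority: int,
    fairshare_priorities: Iterable[int],
    dispatch_order_queue: bool = False,
) -> bool:
    """Check whether a queue outside the cross-queue fairshare set may use a priority."""
    if dispatch_order_queue:
        # With DISPATCH_ORDER=QUEUE in the parent queue, any priority is allowed.
        return True
    values = list(fairshare_priorities)
    return not (min(values) <= priority <= max(values))


cross_queue = [30, 40, 50]  # queue1, queue2, queue3
print(priority_allowed(29, cross_queue))                             # True
print(priority_allowed(45, cross_queue))                             # False
print(priority_allowed(45, cross_queue, dispatch_order_queue=True))  # True
```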

Default

Not defined

FILELIMIT

Specifies the per-process file size limit for all job processes from this queue.

Syntax

FILELIMIT=integer

Description

Set this parameter to place a per-process hard file size limit, in KB, for all of the processes that belong to a job from this queue (see getrlimit(2)).

Default

Unlimited

FWD_JOB_FACTOR

Forwarded job slots weighting factor. Used only with fairshare scheduling.

Syntax

FWD_JOB_FACTOR=number

Description

In the calculation of a user's dynamic share priority, this factor determines the relative importance of the number of forwarded job slots reserved and in use by a user when using the LSF multicluster capability.

If undefined, the cluster-wide value from the lsb.params parameter of the same name is used.

Default

Not defined.

See also

RUN_JOB_FACTOR

FWD_USERS

Specifies a space-separated list of user names or user groups that can forward jobs to remote clusters in this queue when using the LSF multicluster capability.

Syntax

FWD_USERS=all [~user_name ...] [~user_group ...] | [user_name ...] [user_group [~user_group ...] ...]

Description

If user groups are specified, each user in the group can forward jobs to remote clusters in this queue. Use local user groups when specifying user groups.

User names must be valid login names. To specify a Windows user account, include the domain name in uppercase letters (DOMAIN_NAME\user_name).

User group names can be LSF user groups or UNIX and Windows user groups. To specify a Windows user group, include the domain name in uppercase letters (DOMAIN_NAME\user_group).

Use the keyword all to specify all users or user groups in a cluster.

Use the not operator (~) to exclude users from the all specification or from user groups. This is useful if you have a large number of users but only want to exclude a few users or groups from the queue definition.

The not operator (~) can only be used with the all keyword or to exclude users from user groups.

Default

all (all users can forward jobs to remote clusters in the queue)

Examples

  • FWD_USERS=user1 user2
  • FWD_USERS=all ~user1 ~user2
  • FWD_USERS=all ~ugroup1
  • FWD_USERS=groupA ~user3 ~user4
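
The examples above can be checked with a simplified membership test. This sketch is not how LSF evaluates the parameter: the function name and the groups mapping are hypothetical, and exclusions are applied to the whole specification rather than to the group that immediately precedes them.

```python
from typing import Dict, Set


def can_forward(user: str, fwd_users: str, groups: Dict[str, Set[str]]) -> bool:
    """Decide whether a user matches a FWD_USERS value (simplified)."""

    def expand(name: str) -> Set[str]:
        # A group name expands to its members; anything else is a user name.
        return set(groups.get(name, {name}))

    allowed: Set[str] = set()
    excluded: Set[str] = set()
    everyone = False
    for token in fwd_users.split():
        if token == "all":
            everyone = True
        elif token.startswith("~"):
            excluded |= expand(token[1:])
        else:
            allowed |= expand(token)
    return (everyone or user in allowed) and user not in excluded


groups = {"groupA": {"user1", "user3", "user4"}, "ugroup1": {"user5"}}
print(can_forward("user1", "groupA ~user3 ~user4", groups))  # True
print(can_forward("user3", "groupA ~user3 ~user4", groups))  # False
print(can_forward("user5", "all ~ugroup1", groups))          # False
```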

GPU_REQ

Specify GPU requirements together in one statement.

Syntax

GPU_REQ = "[num=num_gpus[/task | host]] [:mode=shared | exclusive_process] [:mps=yes[,shared][,nocvd] | no | per_socket[,shared][,nocvd] | per_gpu[,shared][,nocvd]] [:j_exclusive=yes | no] [:gvendor=amd | nvidia] [:gmodel=model_name[-mem_size]] [:gtile=tile_num|'!'] [:gmem=mem_value] [:glink=yes] [:aff=yes | no] [:block=yes | no] [:gpack=yes | no]"

Description

The GPU_REQ parameter takes the following arguments:
num=num_gpus[/task | host]
The number of physical GPUs required by the job. By default, the number is per host. You can also specify that the number is per task by specifying /task after the number.

If you specified that the number is per task, the configuration of the ngpus_physical resource in the lsb.resources file is set to PER_TASK, or the RESOURCE_RESERVE_PER_TASK=Y parameter is set in the lsb.params file, this number is the requested count per task.

mode=shared | exclusive_process
The GPU mode when the job is running, either shared or exclusive_process. The default mode is shared.

The shared mode corresponds to the Nvidia or AMD DEFAULT compute mode. The exclusive_process mode corresponds to the Nvidia EXCLUSIVE_PROCESS compute mode.

Note: Do not specify exclusive_process when you are using AMD GPUs (that is, when gvendor=amd is specified).
mps=yes[,nocvd][,shared] | per_socket[,shared][,nocvd] | per_gpu[,shared][,nocvd] | no
Enables or disables the Nvidia Multi-Process Service (MPS) for the GPUs that are allocated to the job. Using MPS effectively causes the EXCLUSIVE_PROCESS mode to behave like the DEFAULT mode for all MPS clients. MPS always allows multiple clients to use the GPU through the MPS server.
Note: To avoid inconsistent behavior, do not enable mps when you are using AMD GPUs (that is, when gvendor=amd is specified). If the result of merging the GPU requirements at the cluster, queue, application, and job levels is gvendor=amd and mps is enabled (for example, if gvendor=amd is specified at the job level without specifying mps=no, but mps=yes is specified at the application, queue, or cluster level), LSF ignores the mps requirement.

MPS is useful for both shared and exclusive process GPUs, and allows more efficient sharing of GPU resources and better GPU utilization. See the Nvidia documentation for more information and limitations.

When using MPS, use the EXCLUSIVE_PROCESS mode to ensure that only a single MPS server is using the GPU, which provides additional assurance that the MPS server is the single point of arbitration between all CUDA processes for that GPU.

You can also enable MPS daemon sharing by adding the shared keyword with a comma and no space (for example, mps=yes,shared enables MPS daemon sharing on the host). If sharing is enabled, all jobs that are submitted by the same user with the same resource requirements share the same MPS daemon on the host, socket, or GPU.

LSF starts MPS daemons on a per-host, per-socket, or per-GPU basis depending on the value of the mps keyword:

  • If mps=yes is set, LSF starts one MPS daemon per host per job.

    When sharing is enabled (that is, if mps=yes,shared is set), LSF starts one MPS daemon per host for all jobs that are submitted by the same user with the same resource requirements. These jobs all use the same MPS daemon on the host.

    When the CUDA_VISIBLE_DEVICES environment variable is disabled (that is, if mps=yes,nocvd is set), LSF does not set the CUDA_VISIBLE_DEVICES<number> environment variables for tasks, so LSF MPI does not set CUDA_VISIBLE_DEVICES for the tasks. Otherwise, LSF sets only the CUDA_VISIBLE_DEVICES<number> environment variables for tasks, not CUDA_VISIBLE_DEVICES; LSF MPI converts the CUDA_VISIBLE_DEVICES<number> environment variables into CUDA_VISIBLE_DEVICES and sets that for the tasks.

  • If mps=per_socket is set, LSF starts one MPS daemon per socket per job. When enabled with share (that is, if mps=per_socket,shared is set), LSF starts one MPS daemon per socket for all jobs that are submitted by the same user with the same resource requirements. These jobs all use the same MPS daemon for the socket.
  • If mps=per_gpu is set, LSF starts one MPS daemon per GPU per job. When enabled with share (that is, if mps=per_gpu,shared is set), LSF starts one MPS daemon per GPU for all jobs that are submitted by the same user with the same resource requirements. These jobs all use the same MPS daemon for the GPU.
Important: Using EXCLUSIVE_THREAD mode with MPS is not supported and might cause unexpected behavior.
j_exclusive=yes | no
Specifies whether the allocated GPUs can be used by other jobs. When the mode is set to exclusive_process, the j_exclusive=yes option is set automatically.
aff=yes | no
Specifies whether to enforce strict GPU-CPU affinity binding. If set to no, LSF relaxes GPU affinity while maintaining CPU affinity. By default, aff=yes is set to maintain strict GPU-CPU affinity binding.
Note: The aff=yes setting conflicts with block=yes (distribute allocated GPUs as blocks when the number of tasks is greater than the requested number of GPUs). This is because strict CPU-GPU binding allocates GPUs to tasks based on the CPU NUMA ID, which conflicts with the distribution of allocated GPUs as blocks. If aff=yes and block=yes are both specified in the GPU requirements string, the block=yes setting takes precedence and strict CPU-GPU affinity binding is disabled (that is, aff=no is automatically set).
block=yes | no
Specifies whether to enable block distribution, that is, to distribute the allocated GPUs of a job as blocks when the number of tasks is greater than the requested number of GPUs. If set to yes, LSF distributes all the allocated GPUs of a job as blocks when the number of tasks is bigger than the requested number of GPUs. By default, block=no is set so that allocated GPUs are not distributed as blocks.

For example, if a GPU job requests to run on a host with 4 GPUs and 40 tasks, block distribution assigns GPU0 for ranks 0-9, GPU1 for ranks 10-19, GPU2 for ranks 20-29, and GPU3 for ranks 30-39.
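
The rank-to-GPU mapping in this example can be reproduced with integer division. The function below is a hypothetical sketch. Rounding the block size up when the number of tasks is not a multiple of the number of GPUs is an assumption, not documented LSF behavior.

```python
def gpu_for_rank(rank: int, num_tasks: int, num_gpus: int) -> int:
    """Return the GPU index that block distribution assigns to a task rank."""
    if not 0 <= rank < num_tasks:
        raise ValueError("rank must be in the range 0..num_tasks-1")
    # Ceiling division: each GPU serves one contiguous block of ranks.
    block = -(-num_tasks // num_gpus)
    return rank // block


for rank in (0, 9, 10, 29, 39):
    print(rank, gpu_for_rank(rank, num_tasks=40, num_gpus=4))
```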

Note: The block=yes setting conflicts with aff=yes (strict CPU-GPU affinity binding). This is because strict CPU-GPU binding allocates GPUs to tasks based on the CPU NUMA ID, which conflicts with the distribution of allocated GPUs as blocks. If block=yes and aff=yes are both specified in the GPU requirements string, the block=yes setting takes precedence and strict CPU-GPU affinity binding is disabled (that is, aff=no is automatically set).
gpack=yes | no
For shared mode jobs only. Specifies whether to enable pack scheduling. If set to yes, LSF packs multiple shared mode GPU jobs to allocated GPUs. LSF schedules shared mode GPUs as follows:
  1. LSF sorts the candidate hosts (from largest to smallest) based on the number of shared GPUs that already have running jobs, then by the number of GPUs that are not exclusive.

    If the order[] keyword is defined in the resource requirements string, after sorting order[], LSF re-sorts the candidate hosts by the gpack policy (by shared GPUs that already have running jobs first, then by the number of GPUs that are not exclusive). The gpack policy sort priority is higher than the order[] sort.

  2. LSF sorts the candidate GPUs on each host (from largest to smallest) based on the number of running jobs.

After scheduling, the shared mode GPU job packs to the allocated shared GPU that is sorted first, not to a new shared GPU.

If Docker attribute affinity is enabled, candidate hosts are sorted by Docker attribute affinity before they are sorted by GPUs.

By default, gpack=no is set so that pack scheduling is disabled.

gvendor=amd | nvidia
Specifies the GPU vendor type. LSF allocates GPUs with the specified vendor type.

Specify amd to request AMD GPUs, or specify nvidia to request Nvidia GPUs.

By default, LSF requests Nvidia GPUs.

gmodel=model_name[-mem_size]
Specifies GPUs with the specific model name and, optionally, its total GPU memory. By default, LSF allocates the GPUs with the same model, if available.

The gmodel keyword supports the following formats:

gmodel=model_name
Requests GPUs with the specified brand and model name (for example, TeslaK80).
gmodel=short_model_name
Requests GPUs with a specific brand name (for example, Tesla, Quadro, NVS) or model type name (for example, K80, P100).
gmodel=model_name-mem_size
Requests GPUs with the specified brand name and total GPU memory size. The GPU memory size consists of the number and its unit, which includes M, G, T, MB, GB, and TB (for example, 12G).

To find the available GPU model names on each host, run the lsload -gpuload, lshosts -gpu, or bhosts -gpu commands. The model name string does not contain space characters. In addition, the slash (/) and hyphen (-) characters are replaced with the underscore character (_). For example, the GPU model name “Tesla C2050 / C2070” is converted to “TeslaC2050_C2070” in LSF.
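
The conversion rule can be sketched in one function. The function name is hypothetical, and removing spaces before replacing the slash and hyphen is an assumption that is consistent with the documented example.

```python
def lsf_gpu_model_name(raw: str) -> str:
    """Apply the documented conversion: drop spaces, map '/' and '-' to '_'."""
    return raw.replace(" ", "").replace("/", "_").replace("-", "_")


print(lsf_gpu_model_name("Tesla C2050 / C2070"))  # TeslaC2050_C2070
```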

gmem=mem_value

Specify the GPU memory on each GPU required by the job. The format of mem_value is the same as for other resource values (for example, mem or swap) in the rusage section of the job resource requirements (-R).

gtile=! | tile_num
Specifies the number of GPUs per socket. Specify a number to explicitly define the number of GPUs per socket on the host, or specify an exclamation mark (!) to enable LSF to automatically calculate the number, which evenly divides the GPUs along all sockets on the host. LSF guarantees the gtile requirements even for affinity jobs. This means that LSF might not allocate the GPU's affinity to the allocated CPUs when the gtile requirements cannot be satisfied.

If the gtile keyword is not specified for an affinity job, LSF attempts to allocate enough GPUs on the sockets that allocated GPUs. If there are not enough GPUs on the optimal sockets, jobs cannot go to this host.

If the gtile keyword is not specified for a non-affinity job, LSF attempts to allocate enough GPUs on the same socket. If this is not available, LSF might allocate GPUs on separate sockets.

nvlink=yes
Obsolete in LSF, Version 10.1 Fix Pack 11. Use the glink keyword instead. Enables the job enforcement for NVLink connections among GPUs. LSF allocates GPUs with NVLink connections in force.
glink=yes
Enables job enforcement for special connections among GPUs. LSF must allocate GPUs with the special connections that are specific to the GPU vendor.

If the job requests AMD GPUs, LSF must allocate GPUs with the xGMI connection. If the job requests Nvidia GPUs, LSF must allocate GPUs with the NVLink connection.

Do not use glink together with the obsolete nvlink keyword.

By default, LSF can allocate GPUs without special connections when there are not enough GPUs with these connections.

mig=GI_size[/CI_size]
Specifies Nvidia Multi-Instance GPU (MIG) device requirements.

Specify the requested number of GPU instances for the MIG job. Valid GPU instance sizes are 1, 2, 3, 4, 7.

Optionally, specify the requested number of compute instances after the specified GPU instance size and a slash character (/). The requested compute instance size must be less than or equal to the requested GPU instance size. In addition, Nvidia MIG does not support the following GPU/compute instance size combinations: 4/3, 7/5, 7/6. If this is not specified, the default compute instance size is 1.
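
As a rough sketch, the rules above can be written as a small Python check. The function name parse_mig is hypothetical, and the lower bound of 1 for the compute instance size is an assumption; only the stated rules are applied, so this is not a substitute for the validation that LSF or the Nvidia driver performs.

```python
VALID_GI_SIZES = {1, 2, 3, 4, 7}
UNSUPPORTED_COMBINATIONS = {(4, 3), (7, 5), (7, 6)}


def parse_mig(value: str) -> tuple[int, int]:
    """Parse 'GI_size[/CI_size]' and apply the rules stated above."""
    gi_text, _, ci_text = value.partition("/")
    gi_size = int(gi_text)
    ci_size = int(ci_text) if ci_text else 1  # default compute instance size
    if gi_size not in VALID_GI_SIZES:
        raise ValueError(f"invalid GPU instance size: {gi_size}")
    # Assumption: a compute instance size below 1 is never meaningful.
    if ci_size < 1 or ci_size > gi_size:
        raise ValueError("compute instance size must not exceed GPU instance size")
    if (gi_size, ci_size) in UNSUPPORTED_COMBINATIONS:
        raise ValueError(f"unsupported combination: {gi_size}/{ci_size}")
    return gi_size, ci_size


print(parse_mig("7"))    # (7, 1)
print(parse_mig("3/2"))  # (3, 2)
```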

The syntax of the GPU requirement in the -gpu option is the same as the syntax in the LSB_GPU_REQ parameter in the lsf.conf file and the GPU_REQ parameter in the lsb.queues and lsb.applications files.
Note: The bjobs output does not show aff=yes even if you specify aff=yes in the bsub -gpu option.

If the GPU_REQ_MERGE parameter is defined as Y or y in the lsb.params file and a GPU requirement is specified at multiple levels (at least two of the default cluster, queue, application profile, or job level requirements), each option of the GPU requirement is merged separately. Job level overrides application level, which overrides queue level, which overrides the default cluster GPU requirement. For example, if the mode option of the GPU requirement is defined on the -gpu option, and the mps option is defined in the queue, the job-level mode value and the queue-level mps value are used.

If the GPU_REQ_MERGE parameter is not defined as Y or y in the lsb.params file and a GPU requirement is specified at multiple levels (at least two of the default cluster, queue, application profile, or job level requirements), the entire GPU requirement string is replaced. The entire job level GPU requirement string overrides application level, which overrides queue level, which overrides the default GPU requirement.
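
The difference between the two behaviors can be illustrated with a simplified Python model. The helper names parse_gpu_req and effective_gpu_req are hypothetical, the parser handles only plain colon-separated key=value options, and the option values are examples; the sketch models the precedence rules described above, not the LSF implementation.

```python
def parse_gpu_req(text: str) -> dict[str, str]:
    """Split 'key=value:key=value' into a dict (simplified parser)."""
    return dict(item.split("=", 1) for item in text.split(":") if item)


def effective_gpu_req(levels: list[str], merge: bool) -> dict[str, str]:
    """Resolve the GPU requirement across levels.

    levels is ordered from lowest to highest precedence:
    cluster default, queue, application profile, job.
    An empty string means that the level defines no GPU requirement.
    """
    defined = [parse_gpu_req(level) for level in levels if level]
    if not defined:
        return {}
    if not merge:
        return defined[-1]  # the highest defined level replaces everything
    merged: dict[str, str] = {}
    for req in defined:  # higher levels override option by option
        merged.update(req)
    return merged


levels = ["", "num=1:mps=yes", "", "mode=exclusive_process"]
print(effective_gpu_req(levels, merge=True))
# {'num': '1', 'mps': 'yes', 'mode': 'exclusive_process'}
print(effective_gpu_req(levels, merge=False))
# {'mode': 'exclusive_process'}
```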

The esub parameter LSB_SUB4_GPU_REQ modifies the value of the -gpu option.

LSF selects the GPU that meets the topology requirement first. If the GPU mode of the selected GPU is not the requested mode, LSF changes the GPU to the requested mode. For example, if LSF allocates an exclusive_process GPU to a job that needs a shared GPU, LSF changes the GPU mode to shared before the job starts and then changes the mode back to exclusive_process when the job finishes.

The GPU requirements are converted to rusage resource requirements for the job. For example, num=2 is converted to rusage[ngpus_physical=2]. Use the bjobs, bhist, and bacct commands to see the merged resource requirement.

There might be complex GPU requirements that the bsub -gpu option and GPU_REQ parameter syntax cannot cover, including compound GPU requirements (for different GPU requirements for jobs on different hosts, or for different parts of a parallel job) and alternate GPU requirements (if more than one set of GPU requirements might be acceptable for a job to run). For complex GPU requirements, use the bsub -R command option, or the RES_REQ parameter in the lsb.applications or lsb.queues file to define the resource requirement string.

Important: You can define the mode, j_exclusive, and mps options only with the -gpu option, the LSB_GPU_REQ parameter in the lsf.conf file, or the GPU_REQ parameter in the lsb.queues or lsb.applications files. You cannot use these options with the rusage resource requirement string in the bsub -R command option or the RES_REQ parameter in the lsb.queues or lsb.applications files.

Default

Not defined

See also

  • LSB_GPU_REQ
  • bsub -gpu

GPU_RUN_TIME_FACTOR

Specifies the GPU run time weighting factor. Used only with fairshare scheduling.

Syntax

GPU_RUN_TIME_FACTOR=number

Description

In the calculation of a user’s dynamic share priority, this factor determines the relative importance of the total GPU run time of a user's running GPU jobs.

If undefined, the cluster-wide value from the lsb.params parameter of the same name is used.

Default

Not defined.

HIST_HOURS

Determines a rate of decay for cumulative CPU time, run time, and historical run time. Used only with fairshare scheduling.

Syntax

HIST_HOURS=hours

Description

To calculate dynamic user priority, LSF uses a decay factor to scale the actual CPU time and run time. One hour of recently used time is equivalent to 0.1 hours after the specified number of hours elapses.

To calculate dynamic user priority with decayed run time and historical run time, LSF uses the same decay factor to scale the accumulated run time of finished jobs and run time of running jobs, so that one hour of recently used time is equivalent to 0.1 hours after the specified number of hours elapses.

When HIST_HOURS=0, CPU time and run time that is accumulated by running jobs is not decayed.
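
As a worked illustration, the following Python sketch scales used time so that one hour counts as 0.1 hours once HIST_HOURS hours have elapsed. The exponential shape between those two points is an assumption made for the sketch, and the function name decayed_hours is hypothetical; the text above fixes only the end point and the HIST_HOURS=0 case.

```python
def decayed_hours(used_hours: float, elapsed_hours: float, hist_hours: float) -> float:
    """Scale used time by a decay factor (assumed exponential form)."""
    if hist_hours == 0:
        return used_hours  # HIST_HOURS=0: time is not decayed
    # After hist_hours have elapsed the factor is exactly 0.1.
    return used_hours * 0.1 ** (elapsed_hours / hist_hours)


print(decayed_hours(1.0, elapsed_hours=5, hist_hours=5))  # 0.1
print(decayed_hours(2.0, elapsed_hours=10, hist_hours=0))  # 2.0
```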

If undefined, the cluster-wide value from the lsb.params parameter of the same name is used.

Default

Not defined.

HJOB_LIMIT

Specifies the per-host job slot limit.

Syntax

HJOB_LIMIT=integer

Description

This parameter defines the maximum number of job slots that this queue can use on any host. This limit is configured per host, regardless of the number of processors it might have.

Note: When LSB_ENABLE_HPC_ALLOCATION=Y is defined, all slots on the execution hosts must be allocated to exclusive jobs. If the HJOB_LIMIT value for the queue is less than the maximum number of slots on the execution host, jobs submitted to this queue remain pending indefinitely. This is because LSF cannot allocate all slots in this host for your exclusive job.

Example

The following queue runs a maximum of one job on each of hostA, hostB, and hostC:
Begin Queue
...
HJOB_LIMIT = 1
HOSTS=hostA hostB hostC
...
End Queue

Default

Unlimited

HOST_POST_EXEC

Enables host-based post-execution processing at the queue level.

Syntax

HOST_POST_EXEC=command

Description

The HOST_POST_EXEC command runs on all execution hosts after the job finishes. If job-based post-execution POST_EXEC was defined at the queue-level, application-level, or job-level, the HOST_POST_EXEC command runs after POST_EXEC of any level.

Host-based post-execution commands can be configured at the queue and application level, and run in the following order:
  1. The application-level command
  2. The queue-level command.

The supported command rule is the same as the existing POST_EXEC for the queue section. See the POST_EXEC topic for details.

Note: The host-based post-execution command cannot run on Windows systems. This parameter cannot be used to configure job-based post-execution processing.

Default

Not defined.

HOST_PRE_EXEC

Enables host-based pre-execution processing at the queue level.

Syntax

HOST_PRE_EXEC=command

Description

The HOST_PRE_EXEC command runs on all execution hosts before the job starts. If job-based pre-execution PRE_EXEC was defined at the queue-level, application-level, or job-level, the HOST_PRE_EXEC command runs before PRE_EXEC of any level.

Host-based pre-execution commands can be configured at the queue and application level, and run in the following order:
  1. The queue-level command
  2. The application-level command.

The supported command rule is the same as the existing PRE_EXEC for the queue section. See the PRE_EXEC topic for details.

Note: The host-based pre-execution command cannot be executed on Windows platforms. This parameter cannot be used to configure job-based pre-execution processing.

Default

Not defined.

HOSTLIMIT_PER_JOB

Specifies the per-job host limit.

Syntax

HOSTLIMIT_PER_JOB=integer

Description

This parameter defines the maximum number of hosts that a job in this queue can use. LSF verifies the host limit during the allocation phase of scheduling. If the number of hosts requested for a parallel job exceeds this limit and LSF cannot satisfy the minimum number of requested slots, the parallel job pends. However, for resumed parallel jobs, this parameter does not stop the job from resuming even if the job's host allocation exceeds the per-job host limit specified in this parameter.

Default

Unlimited

HOSTS

Specifies a space-separated list of hosts on which jobs from this queue can be run.

Syntax

HOSTS=host_list | none
  • host_list is a space-separated list of the following items:
    • host_name[@cluster_name][[!] | +pref_level]
    • host_partition[+pref_level]
    • host_group[[!] | +pref_level]
    • compute_unit[[!] | +pref_level]
    • [~]host_name
    • [~]host_group
    • [~]compute_unit
  • The list can include the following items only once:
    • all@cluster_name
    • others[+pref_level]
    • all
    • allremote
    Note: The allremote and all@cluster_name keywords are deprecated and might be removed in a future version of LSF.
  • The none keyword is only used with the MultiCluster job forwarding model, to specify a remote-only queue.

Description

If compute units, host groups, or host partitions are included in the list, the job can run on any host in the unit, group, or partition. All the members of the host list should either belong to a single host partition or not belong to any host partition. Otherwise, job scheduling may be affected.

Some items can be followed by a plus sign (+) and a positive number to indicate the preference for dispatching a job to that host. A higher number indicates a higher preference. If a host preference is not given, it is assumed to be 0. If there are multiple candidate hosts, LSF dispatches the job to the host with the highest preference; hosts at the same level of preference are ordered by load.

If compute units, host groups, or host partitions are assigned a preference, each host in the unit, group, or partition has the same preference.

Use the keyword others to include all hosts not explicitly listed.

Use the keyword all to include all hosts not explicitly excluded.

Use the keyword all@cluster_name hostgroup_name or allremote hostgroup_name to include leased-in hosts.

Use the not operator (~) to exclude hosts from the all specification in the queue. This is useful if you have a large cluster but only want to exclude a few hosts from the queue definition.

The not operator can only be used with the all keyword. It is not valid with the keywords others and none.

The not operator (~) can be used to exclude host groups.

For parallel jobs, specify first execution host candidates when you want to ensure that a host has the required resources or runtime environment to handle processes that run on the first execution host.

To specify one or more hosts, host groups, or compute units as first execution host candidates, add the exclamation point (!) symbol after the name.

Follow these guidelines when you specify first execution host candidates:
  • If you specify a compute unit or host group, you must first define the unit or group in the file lsb.hosts.
  • Do not specify a dynamic host group as a first execution host.
  • Do not specify all, allremote, or others, or a host partition as a first execution host.
  • Do not specify a preference (+) for a host identified by (!) as a first execution host candidate.
  • For each parallel job, specify enough regular hosts to satisfy the CPU requirement for the job. Once LSF selects a first execution host for the current job, the other first execution host candidates
    • Become unavailable to the current job
    • Remain available to other jobs as either regular or first execution hosts
  • You cannot specify first execution host candidates when you use the brun command.
Restriction: If you have enabled EGO, host groups and compute units are not honored.

With the MultiCluster resource leasing model, use the format host_name@cluster_name to specify a borrowed host. LSF does not validate the names of remote hosts. The keyword others indicates all local hosts not explicitly listed. The keyword all indicates all local hosts not explicitly excluded. Use the keyword allremote to specify all hosts borrowed from all remote clusters. Use all@cluster_name to specify the group of all hosts borrowed from one remote cluster. You cannot specify a host group or partition that includes remote resources, unless it uses the keyword allremote to include all remote hosts. You cannot specify a compute unit that includes remote resources.

With the MultiCluster resource leasing model, the not operator (~) can be used to exclude local hosts or host groups. You cannot use the not operator (~) with remote hosts.

Restriction: Hosts that participate in queue-based fairshare cannot be in a host partition.

Behavior with host intersection

Host preferences specified by bsub -m combine intelligently with the queue specification and advance reservation hosts. The jobs run on the hosts that are both specified at job submission and belong to the queue or have advance reservation.

Example 1

HOSTS=hostA+1 hostB hostC+1 hostD+3

This example defines three levels of preferences: run jobs on hostD as much as possible, otherwise run on either hostA or hostC if possible, otherwise run on hostB. Jobs should not run on hostB unless all other hosts are too busy to accept more jobs.
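
To make the grouping explicit, the following Python sketch sorts a HOSTS value into preference levels. The function name preference_levels is hypothetical, and the parser handles only the host_name[+pref_level] form; the !, ~, and keyword forms are deliberately left out.

```python
def preference_levels(hosts_value: str) -> list[tuple[int, list[str]]]:
    """Group hosts by preference level, highest level first."""
    levels: dict[int, list[str]] = {}
    for item in hosts_value.split():
        name, plus, level = item.partition("+")
        # A host without a preference is assumed to be at level 0.
        pref = int(level) if plus and level else 0
        levels.setdefault(pref, []).append(name)
    return sorted(levels.items(), reverse=True)


print(preference_levels("hostA+1 hostB hostC+1 hostD+3"))
# [(3, ['hostD']), (1, ['hostA', 'hostC']), (0, ['hostB'])]
```

Within one level, LSF orders the hosts by load, which this sketch does not model.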

Example 2

HOSTS=hostD+1 others

Run jobs on hostD as much as possible, otherwise run jobs on the least-loaded host available.

Example 3

HOSTS=all ~hostA

Run jobs on all hosts in the cluster, except for hostA.

Example 4

HOSTS=Group1 ~hostA hostB hostC

Run jobs on hostB, hostC, and all hosts in Group1 except for hostA.

Example 5

HOSTS=hostA! hostB+ hostC hostgroup1!

Runs parallel jobs using either hostA or a host defined in hostgroup1 as the first execution host. If the first execution host cannot run the entire job due to resource requirements, runs the rest of the job on hostB. If hostB is too busy to accept the job, or if hostB does not have enough resources to run the entire job, runs the rest of the job on hostC.

Example 6

HOSTS=computeunit1! hostB hostC

Runs parallel jobs using a host in computeunit1 as the first execution host. If the first execution host cannot run the entire job due to resource requirements, runs the rest of the job on other hosts in computeunit1 followed by hostB and finally hostC.

Example 7

HOSTS=hostgroup1! computeunitA computeunitB computeunitC

Runs parallel jobs using a host in hostgroup1 as the first execution host. If additional hosts are required, runs the rest of the job on other hosts in the same compute unit as the first execution host, followed by hosts in the remaining compute units in the order they are defined in the lsb.hosts ComputeUnit section.

Default

all (the queue can use all hosts in the cluster, and every host has equal preference)

IGNORE_DEADLINE

Disables deadline constraint scheduling.

Syntax

IGNORE_DEADLINE=Y

Description

If set to Y, LSF ignores deadline constraint scheduling and starts all jobs regardless of deadline constraints.

IMPT_JOBBKLG

Specifies the MultiCluster pending job limit for a receive-jobs queue. Used only with the MultiCluster job forwarding model.

Syntax

IMPT_JOBBKLG=integer | infinit

Description

This parameter represents the maximum number of MultiCluster jobs that can be pending in the queue; once the limit has been reached, the queue stops accepting jobs from remote clusters.

Use the keyword infinit to make the queue accept an unlimited number of pending MultiCluster jobs.

Default

50

IMPT_TASKBKLG

Specifies the MultiCluster pending job task limit for a receive-jobs queue. Used only with the MultiCluster job forwarding model.

Syntax

IMPT_TASKBKLG=integer | infinit

Description

In the submission cluster, if the total of requested job tasks and the number of imported pending tasks in the receiving queue is greater than IMPT_TASKBKLG, the queue stops accepting jobs from remote clusters, and the job is not forwarded to the receiving queue.

Specify an integer between 0 and 2147483646 for the number of tasks.

Use the keyword infinit to make the queue accept an unlimited number of pending MultiCluster job tasks.

Set IMPT_TASKBKLG to 0 to forbid any job being forwarded to the receiving queue.
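
The acceptance rule can be sketched as a single comparison. The function name accepts_forwarded_job and its arguments are hypothetical; the sketch restates the rule described above and is not taken from LSF itself.

```python
def accepts_forwarded_job(requested_tasks: int,
                          pending_imported_tasks: int,
                          impt_taskbklg: str) -> bool:
    """Return True if the receiving queue would accept the forwarded job."""
    if impt_taskbklg == "infinit":
        return True  # unlimited number of pending MultiCluster job tasks
    # The job is refused when the total is greater than the limit.
    return requested_tasks + pending_imported_tasks <= int(impt_taskbklg)


print(accepts_forwarded_job(4, 10, "16"))  # True
print(accepts_forwarded_job(4, 14, "16"))  # False
print(accepts_forwarded_job(1, 0, "0"))    # False
```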

Note: IMPT_SLOTBKLG has been changed to IMPT_TASKBKLG and the concept has changed from slot to task as of LSF 9.1.3.

Default

infinit (The queue accepts an unlimited number of pending MultiCluster job tasks.)

INTERACTIVE

Specifies whether the queue accepts interactive or non-interactive jobs.

Syntax

INTERACTIVE=YES | NO | ONLY

Description

If set to YES, causes the queue to accept both interactive and non-interactive batch jobs.

If set to NO, causes the queue to reject interactive batch jobs.

If set to ONLY, causes the queue to accept interactive batch jobs and reject non-interactive batch jobs.

Interactive batch jobs are submitted via bsub -I.

Default

YES. The queue accepts both interactive and non-interactive jobs.

INTERRUPTIBLE_BACKFILL

Configures interruptible backfill scheduling policy, which allows reserved job slots to be used by low priority small jobs that are terminated when the higher priority large jobs are about to start.

Syntax

INTERRUPTIBLE_BACKFILL=seconds

Description

There can only be one interruptible backfill queue. It should be the lowest priority queue in the cluster.

Specify the minimum number of seconds for the job to be considered for backfilling. This minimal time slice depends on the specific job properties; it must be longer than at least one useful iteration of the job. Multiple queues may be created if a site has jobs of distinctly different classes.

An interruptible backfill job:
  • Starts as a regular job and is killed when it exceeds the queue runtime limit, or
  • Is started for backfill whenever there is a backfill time slice longer than the specified minimal time, and killed before the slot-reservation job is about to start

The queue RUNLIMIT corresponds to a maximum time slice for backfill, and should be configured so that the wait period for the new jobs submitted to the queue is acceptable to users. 10 minutes of runtime is a common value.

You should configure REQUEUE_EXIT_VALUES for interruptible backfill queues.

BACKFILL and RUNLIMIT must be configured in the queue. The queue is disabled if BACKFILL and RUNLIMIT are not configured.

Assumptions and limitations:

  • The interruptible backfill job holds the slot-reserving job start until its calculated start time, in the same way as a regular backfill job. The interruptible backfill job is not preempted in any way other than being killed when its time comes.
  • While the queue is checked for the consistency of interruptible backfill, backfill and runtime specifications, the requeue exit value clause is not verified, nor executed automatically. Configure requeue exit values according to your site policies.
  • The interruptible backfill job must be able to do at least one unit of useful calculations and save its data within the minimal time slice, and be able to continue its calculations after it has been restarted.
  • The interruptible backfill paradigm does not explicitly prohibit running parallel jobs that are distributed across multiple nodes; however, the chance of success of such a job is close to zero.

Default

Not defined. No interruptible backfilling.

JOB_ACCEPT_INTERVAL

Specifies the amount of time to wait after dispatching a job to a host before dispatching a second job to the same host.

Syntax

JOB_ACCEPT_INTERVAL=integer

Description

This parameter value is multiplied by the value of lsb.params MBD_SLEEP_TIME (60 seconds by default). The result of the calculation is the number of seconds to wait after dispatching a job to a host before dispatching a second job to the same host.
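
As a small worked example, the wait time is the product of the two values. The function name accept_wait_seconds is hypothetical, and the default of 60 seconds is the MBD_SLEEP_TIME default stated above.

```python
def accept_wait_seconds(job_accept_interval: int, mbd_sleep_time: int = 60) -> int:
    """Seconds to wait before dispatching a second job to the same host."""
    return job_accept_interval * mbd_sleep_time


print(accept_wait_seconds(1))      # 60
print(accept_wait_seconds(2, 30))  # 60
print(accept_wait_seconds(0))      # 0
```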

If set to 0 (zero), a host may accept more than one job in each dispatch turn. By default, there is no limit to the total number of jobs that can run on a host, so if this parameter is set to 0, a very large number of jobs might be dispatched to a host all at once. This can overload your system to the point that it is unable to create any more processes. It is not recommended to set this parameter to 0.

JOB_ACCEPT_INTERVAL set at the queue level (lsb.queues) overrides JOB_ACCEPT_INTERVAL set at the cluster level (lsb.params).

Note:

The parameter JOB_ACCEPT_INTERVAL only applies when there are running jobs on a host. A host running a short job which finishes before JOB_ACCEPT_INTERVAL has elapsed is free to accept a new job without waiting.

Default

Not defined. The queue uses JOB_ACCEPT_INTERVAL defined in lsb.params, which has a default value of 1.

JOB_ACTION_WARNING_TIME

Specifies the amount of time before a job control action occurs that a job warning action is to be taken.

Syntax

JOB_ACTION_WARNING_TIME=[hour:]minute

Description

Job action warning time is not normalized.

A job action warning time must be specified with a job warning action (the JOB_WARNING_ACTION parameter) in order for job warning to take effect.

The warning time specified by the bsub -wt option overrides JOB_ACTION_WARNING_TIME in the queue. JOB_ACTION_WARNING_TIME is used as the default when no command line option is specified.

Example

JOB_ACTION_WARNING_TIME=2
JOB_WARNING_ACTION=URG

2 minutes before the job reaches runtime limit or termination deadline, or the queue's run window is closed, an URG signal is sent to the job.
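
The [hour:]minute format can be converted to minutes as in the following sketch. The function name warning_time_minutes is hypothetical, and the conversion only illustrates the syntax; it does not reproduce how LSF parses the value.

```python
def warning_time_minutes(value: str) -> int:
    """Convert a '[hour:]minute' string to a number of minutes."""
    hours, colon, minutes = value.rpartition(":")
    return (int(hours) if colon else 0) * 60 + int(minutes)


print(warning_time_minutes("2"))     # 2
print(warning_time_minutes("1:30"))  # 90
```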

Default

Not defined

JOB_CONTROLS

Changes the behavior of the SUSPEND, RESUME, and TERMINATE actions in LSF.

Syntax

JOB_CONTROLS=SUSPEND[signal | command | CHKPNT] RESUME[signal | command] TERMINATE[signal | command | CHKPNT]
  • signal is a UNIX signal name (for example, SIGTSTP or SIGTERM). The specified signal is sent to the job. The same set of signals is not supported on all UNIX systems. To display a list of the symbolic names of the signals (without the SIG prefix) supported on your system, use the kill -l command.
  • command specifies a /bin/sh command line to be invoked.
    Restriction:

    Do not quote the command line inside an action definition. Do not specify a signal followed by an action that triggers the same signal. For example, do not specify JOB_CONTROLS=TERMINATE[bkill] or JOB_CONTROLS=TERMINATE[brequeue]. This causes a deadlock between the signal and the action.

  • CHKPNT is a special action, which causes the system to checkpoint the job. Only valid for SUSPEND and TERMINATE actions:
    • If the SUSPEND action is CHKPNT, the job is checkpointed and then stopped by sending the SIGSTOP signal to the job automatically.
    • If the TERMINATE action is CHKPNT, then the job is checkpointed and killed automatically.

Description

  • The contents of the configuration line for the action are run with /bin/sh -c so you can use shell features in the command.
  • The standard input, output, and error of the command are redirected to the NULL device, so you cannot tell directly whether the command runs correctly. The default null device on UNIX is /dev/null.
  • The command is run as the user of the job.
  • All environment variables set for the job are also set for the command action. The following additional environment variables are set:
    • LSB_JOBPGIDS: a list of current process group IDs of the job
    • LSB_JOBPIDS: a list of current process IDs of the job
  • For the SUSPEND action command, the following environment variables are also set:
    • LSB_SUSP_REASONS - an integer representing a bitmap of suspending reasons as defined in lsbatch.h. The suspending reason can allow the command to take different actions based on the reason for suspending the job.
    • LSB_SUSP_SUBREASONS - an integer representing the load index that caused the job to be suspended. When the suspending reason SUSP_LOAD_REASON (suspended by load) is set in LSB_SUSP_REASONS, LSB_SUSP_SUBREASONS is set to one of the load index values defined in lsf.h. Use LSB_SUSP_REASONS and LSB_SUSP_SUBREASONS together in your custom job control to determine the exact load threshold that caused a job to be suspended.
  • If an additional action is necessary for the SUSPEND command, that action should also send the appropriate signal to the application. Otherwise, a job can continue to run even after being suspended by LSF. For example, JOB_CONTROLS=SUSPEND[kill $LSB_JOBPIDS; command]
  • If the job control command fails, LSF retains the original job status.
  • If you set preemption with the signal SIGTSTP and you use IBM Spectrum LSF License Scheduler, define LIC_SCHED_PREEMPT_STOP=Y in lsf.conf for License Scheduler preemption to work.
Note: When you use blaunch to run parallel jobs on multiple hosts, job control actions defined in JOB_CONTROLS in lsb.queues only take effect on the first execution host. Job control actions defined in the queue do not affect tasks running on other hosts. If JOB_CONTROLS is defined, the default job control signals of LSF (SUSPEND, RESUME, TERMINATE) do not reach each task on each execution host.

Default

On UNIX, by default, SUSPEND sends SIGTSTP for parallel or interactive jobs and SIGSTOP for other jobs. RESUME sends SIGCONT. TERMINATE sends SIGINT, SIGTERM and SIGKILL in that order.

On Windows, actions equivalent to the UNIX signals have been implemented to do the default job control actions. Job control messages replace the SIGINT and SIGTERM signals, but only customized applications are able to process them. Termination is implemented by the TerminateProcess( ) system call.

JOB_IDLE

Specifies a threshold for idle job exception handling.

Syntax

JOB_IDLE=number

Description

The value should be a number between 0.0 and 1.0 representing CPU time/runtime. If the job idle factor is less than the specified threshold, LSF invokes LSF_SERVERDIR/eadmin to trigger the action for a job idle exception.

The minimum job run time before mbatchd reports that the job is idle is defined as DETECT_IDLE_JOB_AFTER in lsb.params.

Valid values

Any positive number between 0.0 and 1.0

Example

JOB_IDLE=0.10

A job idle exception is triggered for jobs with an idle value (CPU time/runtime) less than 0.10.
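
The comparison can be sketched as follows. The function name is_idle and the handling of a zero run time are assumptions made for the sketch; LSF additionally waits for the DETECT_IDLE_JOB_AFTER period, which is not modeled here.

```python
def is_idle(cpu_time_s: float, run_time_s: float, job_idle: float) -> bool:
    """Return True if the idle factor (CPU time/runtime) is below JOB_IDLE."""
    if run_time_s <= 0:
        return False  # assumption: no run time yet, nothing to evaluate
    return (cpu_time_s / run_time_s) < job_idle


print(is_idle(30, 600, 0.10))   # True  (idle factor 0.05)
print(is_idle(120, 600, 0.10))  # False (idle factor 0.2)
```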

Default

Not defined. No job idle exceptions are detected.

JOB_OVERRUN

Specifies a threshold for job overrun exception handling.

Syntax

JOB_OVERRUN=run_time

Description

If a job runs longer than the specified run time, LSF invokes LSF_SERVERDIR/eadmin to trigger the action for a job overrun exception.

Example

JOB_OVERRUN=5

A job overrun exception is triggered for jobs running longer than 5 minutes.

Default

Not defined. No job overrun exceptions are detected.

JOB_SIZE_LIST

Specifies a list of job sizes (number of tasks) that are allowed on this queue.

Syntax

JOB_SIZE_LIST=default_size [size ...]

Description

When submitting a job or modifying a pending job that requests a job size by using the -n or -R options for bsub and bmod, the requested job size must be a single fixed value that matches one of the values that JOB_SIZE_LIST specifies, which are the job sizes that are allowed on this queue. LSF rejects the job if the requested job size is not in this list. In addition, when using bswitch to switch a pending job with a requested job size to another queue, the requested job size in the pending job must also match one of the values in JOB_SIZE_LIST for the new queue.

The first value in this list is the default job size, which is the assigned job size request if the job was submitted without requesting one. The remaining values are the other job sizes allowed in the queue, and may be defined in any order.
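
A minimal sketch of this check, using a hypothetical function name resolve_job_size and an example list, might look as follows. It covers only the queue-level list and does not model the interaction with application profiles.

```python
from typing import Optional


def resolve_job_size(job_size_list: str, requested: Optional[int] = None) -> int:
    """Return the job size to use, or raise if the request is not allowed."""
    sizes = [int(size) for size in job_size_list.split()]
    if requested is None:
        return sizes[0]  # the first value is the default job size
    if requested not in sizes:
        raise ValueError(f"job size {requested} is not allowed on this queue")
    return requested


print(resolve_job_size("4 2 8 16"))     # 4
print(resolve_job_size("4 2 8 16", 8))  # 8
```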

When defined in both a queue and an application profile (lsb.applications), the job size request must satisfy both requirements. In addition, JOB_SIZE_LIST overrides any TASKLIMIT parameters defined at the same level. Job size requirements do not apply to queues and application profiles with no job size lists, nor do they apply to other levels of job submissions (that is, host level or cluster level job submissions).

Note: An exclusive job may allocate more slots on the host than are required by the tasks. For example, if JOB_SIZE_LIST=8 and an exclusive job requesting -n8 runs on a 16 slot host, all 16 slots are assigned to the job. The job runs as expected, since the 8 tasks specified for the job match the job size list.

Valid values

A space-separated list of positive integers between 1 and 2147483646.

Default

Undefined

JOB_STARTER

Creates a specific environment for submitted jobs prior to execution.

Syntax

JOB_STARTER=starter [starter] ["%USRCMD"] [starter]

Description

starter is any executable that can be used to start the job (i.e., can accept the job as an input argument). Optionally, additional strings can be specified.

By default, the user commands run after the job starter. A special string, %USRCMD, can be used to represent the position of the user’s job in the job starter command line. The %USRCMD string and any additional commands must be enclosed in quotation marks (" ").

If your job starter script runs on a Windows execution host and includes symbols (like & or |), you can use the JOB_STARTER_EXTEND=preservestarter parameter in lsf.conf and set JOB_STARTER=preservestarter in lsb.queues. A customized userstarter can also be used.

Example

JOB_STARTER=csh -c "%USRCMD;sleep 10"
In this case, if a user submits a job
% bsub myjob arguments
the command that actually runs is:
% csh -c "myjob arguments;sleep 10"
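
The substitution in this example can be modeled with plain string handling. The function name build_command is hypothetical; the sketch shows only how the final command line is assembled and does not cover the quoting or shell handling that LSF performs.

```python
def build_command(job_starter: str, user_command: str) -> str:
    """Assemble the command line from a job starter and the user's job."""
    if "%USRCMD" in job_starter:
        # %USRCMD marks the position of the user's job.
        return job_starter.replace("%USRCMD", user_command)
    # By default, the user command runs after the job starter.
    return f"{job_starter} {user_command}"


print(build_command('csh -c "%USRCMD;sleep 10"', "myjob arguments"))
# csh -c "myjob arguments;sleep 10"
```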

Default

Not defined. No job starter is used.

JOB_UNDERRUN

Specifies a threshold for job underrun exception handling.

Syntax

JOB_UNDERRUN=run_time

Description

If a job exits before the specified number of minutes, LSF invokes LSF_SERVERDIR/eadmin to trigger the action for a job underrun exception.

Example

JOB_UNDERRUN=2

A job underrun exception is triggered for jobs running less than 2 minutes.

Default

Not defined. No job underrun exceptions are detected.

JOB_WARNING_ACTION

Specifies the job action to be taken before a job control action occurs.

Syntax

JOB_WARNING_ACTION=signal

Description

A job warning action must be specified with a job action warning time (the JOB_ACTION_WARNING_TIME parameter) in order for job warning to take effect.

If JOB_WARNING_ACTION is specified, LSF sends the warning action to the job before the actual control action is taken. This allows the job time to save its result before being terminated by the job control action.

The warning action specified by the bsub -wa option overrides JOB_WARNING_ACTION in the queue. JOB_WARNING_ACTION is used as the default when no command line option is specified.

Example

JOB_ACTION_WARNING_TIME=2
JOB_WARNING_ACTION=URG

2 minutes before the job reaches runtime limit or termination deadline, or the queue's run window is closed, an URG signal is sent to the job.

Default

Not defined

LOAD_INDEX

Specifies scheduling and suspending thresholds for the specified dynamic load index.

Syntax

load_index=loadSched[/loadStop]

Specify io, it, ls, mem, pg, r15s, r1m, r15m, swp, tmp, ut, or a non-shared custom external load index. Specify multiple lines to configure thresholds for multiple load indices.

Specify io, it, ls, mem, pg, r15s, r1m, r15m, swp, tmp, ut, or a non-shared custom external load index as a column. Specify multiple columns to configure thresholds for multiple load indices.

Description

The loadSched condition must be satisfied before a job is dispatched to the host. If a RESUME_COND is not specified, the loadSched condition must also be satisfied before a suspended job can be resumed.

If the loadStop condition is satisfied, a job on the host is suspended.

The loadSched and loadStop thresholds permit the specification of conditions using simple AND/OR logic. Any load index that does not have a configured threshold has no effect on job scheduling.

LSF does not suspend a job if the job is the only batch job running on the host and the machine is interactively idle (it>0).

The r15s, r1m, and r15m CPU run queue length conditions are compared to the effective queue length as reported by lsload -E, which is normalized for multiprocessor hosts. Thresholds for these parameters should be set at appropriate levels for single processor hosts.

Example

MEM=100/10
SWAP=200/30

These two lines translate into a loadSched condition of
mem>=100 && swap>=200
and a loadStop condition of
mem < 10 || swap < 30
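
As a rough illustration of the AND/OR logic in the example above, the following Python sketch evaluates both conditions against a host's current load. The function names and the dictionary layout are hypothetical and are not part of LSF; the index names simply mirror the example, and both indices are ones where a larger value means more available resource.

```python
# Thresholds from the example: index name -> (loadSched, loadStop).
# Hypothetical data layout, used only for illustration.
THRESHOLDS = {"mem": (100, 10), "swap": (200, 30)}


def can_dispatch(load):
    """loadSched: every configured index must meet its threshold (AND)."""
    return all(load[name] >= sched for name, (sched, _stop) in THRESHOLDS.items())


def should_suspend(load):
    """loadStop: any configured index below its threshold is enough (OR)."""
    return any(load[name] < stop for name, (_sched, stop) in THRESHOLDS.items())
```

For instance, a host reporting mem=150 and swap=150 would fail the scheduling condition (swap is below 200) without meeting the suspending condition.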

Default

Not defined

LOCAL_MAX_PREEXEC_RETRY

Specifies the maximum number of times to attempt the pre-execution command of a job on the local cluster.

Syntax

LOCAL_MAX_PREEXEC_RETRY=integer

Description

When this limit is reached, the default behavior of the job is defined by the LOCAL_MAX_PREEXEC_RETRY_ACTION parameter in lsb.params, lsb.queues, or lsb.applications.

Valid values

0 < LOCAL_MAX_PREEXEC_RETRY < INFINIT_INT

INFINIT_INT is defined in lsf.h.

Default

Not defined. The number of pre-execution retries is unlimited.

See also

LOCAL_MAX_PREEXEC_RETRY_ACTION in lsb.params, lsb.queues, and lsb.applications.

LOCAL_MAX_PREEXEC_RETRY_ACTION

The default behavior of a job when it reaches the maximum number of times to attempt its pre-execution command on the local cluster.

Syntax

LOCAL_MAX_PREEXEC_RETRY_ACTION=SUSPEND | EXIT

Description

This parameter specifies the default behavior of a job when it reaches the maximum number of times to attempt its pre-execution command on the local cluster (as specified by LOCAL_MAX_PREEXEC_RETRY in lsb.params, lsb.queues, or lsb.applications).

  • If set to SUSPEND, the local or leased job is suspended and its status is set to PSUSP.
  • If set to EXIT, the local or leased job exits and its status is set to EXIT. The job exits with the same exit code as the last pre-execution fail exit code.

This parameter is configured cluster-wide (lsb.params), at the queue level (lsb.queues), and at the application level (lsb.applications). The action specified in lsb.applications overrides lsb.queues, and lsb.queues overrides the lsb.params configuration.
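
The override order described above can be sketched as a simple lookup, most specific level first. This is only an illustration: the function and its argument names are hypothetical, and None stands for a level where the parameter is not defined.

```python
# Documented fallback when the action is not defined in lsb.params.
DEFAULT_ACTION = "SUSPEND"


def resolve_retry_action(application=None, queue=None, params=None):
    """Return the first defined value: lsb.applications, then lsb.queues, then lsb.params."""
    for value in (application, queue, params):
        if value is not None:
            return value
    return DEFAULT_ACTION
```

With queue="EXIT" and params="SUSPEND", the sketch returns EXIT, because the queue-level setting overrides the cluster-wide one.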

Default

Not defined. If not defined in lsb.params, the default action is SUSPEND.

See also

LOCAL_MAX_PREEXEC_RETRY in lsb.params, lsb.queues, and lsb.applications.

MANDATORY_EXTSCHED

Specifies mandatory external scheduling options for the queue.

Syntax

MANDATORY_EXTSCHED=external_scheduler_options

Description

-extsched options on the bsub command are merged with MANDATORY_EXTSCHED options, and MANDATORY_EXTSCHED options override any conflicting job-level options set by -extsched.

Default

Not defined

MAX_JOB_PREEMPT

Specifies the maximum number of times a job can be preempted. Applies to queue-based preemption only.

Syntax

MAX_JOB_PREEMPT=integer

Valid values

0 < MAX_JOB_PREEMPT < INFINIT_INT

INFINIT_INT is defined in lsf.h.

Default

Not defined. The number of preemption times is unlimited.

MAX_JOB_REQUEUE

Specifies the maximum number of times to requeue a job automatically.

Syntax

MAX_JOB_REQUEUE=integer

Valid values

0 < MAX_JOB_REQUEUE < INFINIT_INT

INFINIT_INT is defined in lsf.h.

Default

Not defined. The number of requeue times is unlimited.

MAX_PREEXEC_RETRY

Use REMOTE_MAX_PREEXEC_RETRY instead. This parameter is maintained for backwards compatibility. The maximum number of times to attempt the pre-execution command of a job from a remote cluster. Used only with the MultiCluster job forwarding model.

Syntax

MAX_PREEXEC_RETRY=integer

Description

If the job's pre-execution command fails all attempts, the job is returned to the submission cluster.

Valid values

0 < MAX_PREEXEC_RETRY < INFINIT_INT

INFINIT_INT is defined in lsf.h.

Default

5

MAX_PROTOCOL_INSTANCES

For LSF IBM Parallel Environment (PE) integration. Specifies the number of parallel communication paths (windows) available to the protocol on each network.

Syntax

MAX_PROTOCOL_INSTANCES=integer

Description

Note: This parameter is deprecated and might be removed in a future version of LSF.

If the number of windows specified for the job (with the instances option of bsub -network or the NETWORK_REQ parameter in lsb.queues or lsb.applications) is greater than the specified maximum value, LSF rejects the job.

Specify MAX_PROTOCOL_INSTANCES in a queue (lsb.queues) or cluster-wide in lsb.params. The value specified in a queue overrides the value specified in lsb.params.

LSF_PE_NETWORK_NUM must be defined to a non-zero value in lsf.conf for MAX_PROTOCOL_INSTANCES to take effect and for LSF to run PE jobs. If LSF_PE_NETWORK_NUM is not defined or is set to 0, the value of MAX_PROTOCOL_INSTANCES is ignored with a warning message.

For best performance, set MAX_PROTOCOL_INSTANCES so that the communication subsystem uses every available adapter before it reuses any of the adapters.

Default

No default value

MAX_RSCHED_TIME

Determines how long a MultiCluster job stays pending in the execution cluster before returning to the submission cluster. Used only with the MultiCluster job forwarding model.

Syntax

MAX_RSCHED_TIME=integer | infinit

Description

The remote timeout limit in seconds is:
MAX_RSCHED_TIME * MBD_SLEEP_TIME=timeout

Specify infinit to disable remote timeout (jobs always get dispatched in the correct FCFS order because MultiCluster jobs never get rescheduled, but MultiCluster jobs can be pending in the receive-jobs queue forever instead of being rescheduled to a better queue).

Note:

Applies to the queue in the submission cluster only. This parameter is ignored by the receiving queue.

Remote timeout limit never affects advance reservation jobs

Jobs that use an advance reservation always behave as if remote timeout is disabled.
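
The timeout formula above amounts to a single multiplication. The following Python sketch is illustrative only; the function name is hypothetical, and the MBD_SLEEP_TIME value must come from your own lsb.params configuration.

```python
def remote_timeout_seconds(max_rsched_time, mbd_sleep_time):
    """Return the remote timeout in seconds, or None when it is disabled."""
    if max_rsched_time == "infinit":
        return None  # remote timeout disabled
    return int(max_rsched_time) * mbd_sleep_time
```

Assuming an MBD_SLEEP_TIME of 60 seconds, a MAX_RSCHED_TIME of 20 gives 1200 seconds, that is, 20 minutes.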

Default

20 (20 minutes by default)

MAX_SLOTS_IN_POOL

Queue-based fairshare only. Specifies the maximum number of job slots available in the slot pool the queue belongs to for queue based fairshare.

Syntax

MAX_SLOTS_IN_POOL=integer

Description

Note: This parameter is deprecated and might be removed in a future version of LSF.

Defined in the first queue of the slot pool. Definitions in subsequent queues have no effect.

When defined together with other slot limits (QJOB_LIMIT, HJOB_LIMIT or UJOB_LIMIT in lsb.queues or queue limits in lsb.resources) the lowest limit defined applies.

When MAX_SLOTS_IN_POOL, SLOT_RESERVE, and BACKFILL are defined for the same queue, jobs in the queue cannot backfill using slots reserved by other jobs in the same queue.

Valid values

MAX_SLOTS_IN_POOL can be any number from 0 to INFINIT_INT, where INFINIT_INT is defined in lsf.h.

Default

Not defined

MAX_TOTAL_TIME_PREEMPT

Specifies the accumulated preemption time after which a job cannot be preempted again.

Syntax

MAX_TOTAL_TIME_PREEMPT=minutes

where minutes is wall-clock time, not normalized time.

Description

Setting the parameter of the same name in lsb.applications overrides this parameter; setting this parameter overrides the parameter of the same name in lsb.params.

Valid values

Any positive integer greater than or equal to one (1)

Default

Unlimited

MC_FORWARD_DELAY

When using the LSF multicluster capability, specifies the job forwarding behavior and the amount of time after job submission and scheduling for LSF to revert to the default job forwarding behavior.

Syntax

MC_FORWARD_DELAY=[-]seconds

Description

If this value is positive, LSF does not forward the job to a remote cluster and only attempts the local hosts for the specified amount of time in seconds.

If this value is negative, LSF forwards the job to a remote cluster and does not attempt the local hosts for the specified amount of time in seconds.

The specified delay time starts after LSF submits and schedules the job. After this amount of time, this parameter no longer takes effect and LSF reverts to the default job forwarding behavior, which is to attempt the local hosts first, then forward the job to a remote cluster if the local attempt fails.

LSF repeats these steps if LSF recalled the job from remote clusters due to the MAX_RSCHED_TIME parameter setting. LSF also repeats these steps if LSF requeued, suspended, or resumed the job, or if the scheduler daemon restarts.
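
The effect of the sign of MC_FORWARD_DELAY can be summarized in a small Python sketch. The function name and the returned labels are hypothetical, and the handling of the exact moment the delay expires is an assumption, not documented behavior.

```python
def forwarding_behavior(mc_forward_delay, seconds_since_scheduled):
    """Classify how a job is handled at a given time after scheduling."""
    # 0 disables the parameter; once the delay has elapsed LSF reverts to default.
    # Treating "elapsed == delay" as expired is an assumption of this sketch.
    if mc_forward_delay == 0 or seconds_since_scheduled >= abs(mc_forward_delay):
        return "default"  # try local hosts first, then forward
    # Positive: local hosts only. Negative: remote clusters only.
    return "local_only" if mc_forward_delay > 0 else "remote_only"
```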

Valid values

Any positive or negative integer. If set to 0, this parameter is disabled.

Default

0. This parameter is disabled.

MEMLIMIT

Specifies the per-process resident size limit for all job processes from this queue.

Syntax

MEMLIMIT=[default_limit] maximum_limit

Description

Set this parameter to place a per-process hard process resident set size limit, in KB, for all of the processes belonging to a job from this queue (see getrlimit(2)).

Sets the maximum amount of physical memory (resident set size, RSS) that may be allocated to a process.

By default, if a default memory limit is specified, jobs submitted to the queue without a job-level memory limit are killed when the default memory limit is reached.

If you specify only one limit, it is the maximum, or hard, memory limit. If you specify two limits, the first one is the default, or soft, memory limit, and the second one is the maximum memory limit.
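
As an illustration of the one-limit and two-limit forms, the following Python sketch splits a MEMLIMIT value into its default and maximum parts. The helper name is hypothetical, and the sketch only handles plain integer values in KB.

```python
def parse_limit(value):
    """Split a '[default_limit] maximum_limit' value into (default, maximum)."""
    fields = [int(field) for field in value.split()]
    if len(fields) == 1:
        return None, fields[0]  # only the maximum (hard) limit is given
    if len(fields) == 2:
        return fields[0], fields[1]  # default (soft) limit, then maximum limit
    raise ValueError("expected one or two limits")
```

For example, parse_limit("5000") yields no default limit and a maximum of 5000, while parse_limit("2000 5000") yields a default of 2000 and a maximum of 5000.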

LSF has two methods of enforcing memory usage:
  • OS Memory Limit Enforcement
  • LSF Memory Limit Enforcement

OS memory limit enforcement

OS memory limit enforcement is the default MEMLIMIT behavior and does not require further configuration. OS enforcement usually allows the process to eventually run to completion. LSF passes MEMLIMIT to the OS, which uses it as a guide for the system scheduler and memory allocator. The system may allocate more memory to a process if there is a surplus. When memory is low, the system takes memory from and lowers the scheduling priority (re-nice) of a process that has exceeded its declared MEMLIMIT. Only available on systems that support RLIMIT_RSS for setrlimit().

Not supported on:
  • Sun Solaris 2.x
  • Windows

LSF memory limit enforcement

To enable LSF memory limit enforcement, set LSB_MEMLIMIT_ENFORCE in lsf.conf to y. LSF memory limit enforcement explicitly sends a signal to kill a running process once it has allocated memory past MEMLIMIT.

You can also enable LSF memory limit enforcement by setting LSB_JOB_MEMLIMIT in lsf.conf to y. The difference between LSB_JOB_MEMLIMIT set to y and LSB_MEMLIMIT_ENFORCE set to y is that with LSB_JOB_MEMLIMIT, only the per-job memory limit enforced by LSF is enabled. The per-process memory limit enforced by the OS is disabled. With LSB_MEMLIMIT_ENFORCE set to y, both the per-job memory limit enforced by LSF and the per-process memory limit enforced by the OS are enabled.

Available for all systems on which LSF collects total memory usage.

Example

The following configuration defines a queue with a memory limit of 5000 KB:
Begin Queue 
QUEUE_NAME  = default 
DESCRIPTION = Queue with memory limit of 5000 kbytes 
MEMLIMIT    = 5000 
End Queue

Default

Unlimited

MIG

Enables automatic job migration and specifies the migration threshold for checkpointable or rerunnable jobs.

Syntax

MIG=minutes

Description

LSF automatically migrates jobs that have been in the SSUSP state for more than the specified number of minutes. Specify a value of 0 to migrate jobs immediately upon suspension. The migration threshold applies to all jobs running on the host.

Job-level command line migration threshold overrides threshold configuration in application profile and queue. Application profile configuration overrides queue level configuration.

When a host migration threshold is specified, and is lower than the value for the job, the queue, or the application, the host value is used.
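
The precedence rules above can be sketched as follows. The function is hypothetical; None means that no threshold is configured at that level, and using the host value when nothing else is configured is an assumption of this sketch.

```python
def effective_mig_threshold(job=None, application=None, queue=None, host=None):
    """Pick the migration threshold in minutes that applies to a job."""
    chosen = None
    # The job level overrides the application profile, which overrides the queue.
    for value in (job, application, queue):
        if value is not None:
            chosen = value
            break
    # A lower host threshold wins over the chosen value.
    # Assumption: a host threshold also applies when no other level is set.
    if host is not None and (chosen is None or host < chosen):
        return host
    return chosen
```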

Members of a chunk job can be migrated. Chunk jobs in WAIT state are removed from the job chunk and put into PEND state.

Does not affect MultiCluster jobs that are forwarded to a remote cluster.

Default

Not defined. LSF does not migrate checkpointable or rerunnable jobs automatically.

NETWORK_REQ

For LSF IBM Parallel Environment (PE) integration. Specifies the network resource requirements for a PE job.

Syntax

NETWORK_REQ="network_res_req"

network_res_req has the following syntax:

[type=sn_all | sn_single] [:protocol=protocol_name[(protocol_number)][,protocol_name[(protocol_number)]]] [:mode=US | IP] [:usage=dedicated | shared] [:instance=positive_integer]

Description

Note: This parameter is deprecated and might be removed in a future version of LSF.

If any network resource requirement is specified in the job, queue, or application profile, the job is treated as a PE job. PE jobs can only run on hosts where IBM PE pnsd daemon is running.

The network resource requirement string network_res_req has the same syntax as the bsub -network option.

The -network bsub option overrides the value of NETWORK_REQ defined in lsb.queues or lsb.applications. The value of NETWORK_REQ defined in lsb.applications overrides queue-level NETWORK_REQ defined in lsb.queues.

The following IBM LoadLeveler job command file options are not supported in LSF:
  • collective_groups
  • imm_send_buffers
  • rcxtblocks
The following network resource requirement options are supported:
type=sn_all | sn_single
Specifies the adapter device type to use for message passing: either sn_all or sn_single.
sn_single

When used for switch adapters, specifies that all windows are on a single network.

sn_all

Specifies that one or more windows are on each network, and that striped communication should be used over all available switch networks. The networks specified must be accessible by all hosts selected to run the PE job. See the Parallel Environment Runtime Edition for AIX: Operation and Use guide (SC23-6781-05) for more information about submitting jobs that use striping.

If mode is IP and type is specified as sn_all or sn_single, the job will only run on InfiniBand (IB) adapters (IPoIB). If mode is IP and type is not specified, the job will only run on Ethernet adapters (IPoEth). For IPoEth jobs, LSF ensures the job is running on hosts where pnsd is installed and running. For IPoIB jobs, LSF ensures the job is running on hosts where pnsd is installed and running, and that IB networks are up. Because IP jobs do not consume network windows, LSF does not check if all network windows are used up or the network is already occupied by a dedicated PE job.

Equivalent to the PE MP_EUIDEVICE environment variable and -euidevice PE flag. See the Parallel Environment Runtime Edition for AIX: Operation and Use guide (SC23-6781-05) for more information. Only sn_all or sn_single are supported by LSF. The other types supported by PE are not supported for LSF jobs.

protocol=protocol_name[(protocol_number)]
Network communication protocol for the PE job, indicating which message passing API is being used by the application. The following protocols are supported by LSF:
mpi

The application makes only MPI calls. This value applies to any MPI job regardless of the library that it was compiled with (PE MPI, MPICH2).

pami

The application makes only PAMI calls.

lapi

The application makes only LAPI calls.

shmem

The application makes only OpenSHMEM calls.

user_defined_parallel_api

The application makes only calls from a parallel API that you define. For example: protocol=myAPI or protocol=charm.

The default value is mpi.

LSF also supports an optional protocol_number (for example, mpi(2)), which specifies the number of contexts (endpoints) per parallel API instance. The number must be a power of 2, but no greater than 128 (1, 2, 4, 8, 16, 32, 64, 128). LSF will pass the communication protocols to PE without any change. LSF will reserve network windows for each protocol.

When you specify multiple parallel API protocols, you cannot make calls to both LAPI and PAMI (lapi, pami) or LAPI and OpenSHMEM (lapi, shmem) in the same application. Protocols can be specified in any order.

See the MP_MSG_API and MP_ENDPOINTS environment variables and the -msg_api and -endpoints PE flags in the Parallel Environment Runtime Edition for AIX: Operation and Use guide (SC23-6781-05) for more information about the communication protocols that are supported by IBM Parallel Environment.
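
The protocol rules above (a protocol_number that is a power of 2 up to 128, and no mixing of LAPI with PAMI or OpenSHMEM) can be checked with a short Python sketch. The function, the constant names, and the input layout are hypothetical and are not an LSF interface.

```python
ALLOWED_COUNTS = {1, 2, 4, 8, 16, 32, 64, 128}
CONFLICTS = [{"lapi", "pami"}, {"lapi", "shmem"}]


def validate_protocols(protocols):
    """Check (name, protocol_number or None) pairs and return a list of problems."""
    problems = []
    for name, number in protocols:
        if number is not None and number not in ALLOWED_COUNTS:
            problems.append(f"{name}({number}): not a power of 2 up to 128")
    names = {name for name, _number in protocols}
    for pair in CONFLICTS:
        if pair <= names:  # both conflicting protocols were requested
            problems.append("cannot combine " + " and ".join(sorted(pair)))
    return problems
```

An empty list means that the requested protocols pass both checks; for example, protocol=mpi(2) passes, while protocol=mpi(3) does not.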

mode=US | IP

The network communication system mode used by the specified communication protocol: US (User Space) or IP (Internet Protocol). A US job can only run with adapters that support user space communications, such as the IB adapter. IP jobs can run with either Ethernet adapters or IB adapters. When IP mode is specified, the instance number cannot be specified, and network usage must be unspecified or shared.

Each instance on the US mode requested by a task running on switch adapters requires an adapter window. For example, if a task requests both the MPI and LAPI protocols such that both protocol instances require US mode, two adapter windows will be used.

The default value is US.

usage=dedicated | shared

Specifies whether the adapter can be shared with tasks of other job steps: dedicated or shared. Multiple tasks of the same job can share one network even if usage is dedicated.

The default usage is shared.

instance=positive_integer

The number of parallel communication paths (windows) per task made available to the protocol on each network. The number actually used depends on the implementation of the protocol subsystem.

The default value is 1.

If the specified value is greater than MAX_PROTOCOL_INSTANCES in lsb.params or lsb.queues, LSF rejects the job.

LSF_PE_NETWORK_NUM must be defined to a non-zero value in lsf.conf for NETWORK_REQ to take effect. If LSF_PE_NETWORK_NUM is not defined or is set to 0, NETWORK_REQ is ignored with a warning message.

Example

The following network resource requirement string specifies the requirements for an sn_all job (one or more windows are on each network, and striped communication should be used over all available switch networks). The PE job uses MPI API calls (protocol), runs in user-space network communication system mode, and requires 1 parallel communication path (window) per task.

NETWORK_REQ = "protocol=mpi:mode=us:instance=1:type=sn_all"

Default

No default value, but if you specify no value (NETWORK_REQ=""), the job uses the following: protocol=mpi:mode=US:usage=shared:instance=1 in the queue.

NEW_JOB_SCHED_DELAY

Specifies the number of seconds that a new job waits, before being scheduled.

Syntax

NEW_JOB_SCHED_DELAY=seconds

Description

A value of zero (0) means the job is scheduled without any delay. The scheduler still periodically fetches jobs from mbatchd. Once it gets jobs, the scheduler schedules them without any delay. This may speed up job scheduling slightly, but it also generates some communication overhead. Therefore, set it to 0 only for high-priority, urgent, or interactive queues with small workloads.

If NEW_JOB_SCHED_DELAY is set to a non-zero value, scheduler will periodically fetch new jobs from mbatchd, after which it sets job scheduling time to job submission time + NEW_JOB_SCHED_DELAY.

Default

0 seconds

NICE

Adjusts the UNIX scheduling priority at which jobs from this queue execute.

Syntax

NICE=integer

Description

The default value of 0 (zero) maintains the default scheduling priority for UNIX interactive jobs. This value adjusts the run-time priorities for batch jobs on a queue-by-queue basis, to control their effect on other batch or interactive jobs. See the nice(1) manual page for more details.

On Windows, this value is mapped to Windows process priority classes as follows:
  • nice>=0 corresponds to a priority class of IDLE
  • nice<0 corresponds to a priority class of NORMAL

LSF on Windows does not support HIGH or REAL-TIME priority classes.
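
The Windows mapping above reduces to a single comparison, sketched here in Python. The function name is hypothetical.

```python
def windows_priority_class(nice):
    """Map a queue NICE value to the Windows priority class described above."""
    return "IDLE" if nice >= 0 else "NORMAL"
```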

This value is overwritten by the NICE setting in lsb.applications, if defined.

Default

0 (zero)

NO_PREEMPT_INTERVAL

Specifies the number of minutes a preemptable job can run before it is preempted. If the uninterrupted run time of a preemptable job is longer than the specified time, it can be preempted.

Syntax

NO_PREEMPT_INTERVAL=minutes

The value of minutes is wall-clock time, not normalized time.

Description

The NO_PREEMPT_INTERVAL=0 parameter allows immediate preemption of jobs as soon as they start or resume running.

For example, if job A needs to preempt other candidate preemptable jobs B, C, and D, the NO_PREEMPT_INTERVAL parameter determines which job is preempted:
  • Run time of job B and job C is less than the NO_PREEMPT_INTERVAL parameter: job B and C are not preempted.
  • Run time of job D is greater than or equal to the NO_PREEMPT_INTERVAL parameter: job D is preempted.
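
The selection in the example above can be sketched in Python as a filter on uninterrupted run time. The function name and the sample run times are hypothetical.

```python
def preemptable_candidates(run_minutes_by_job, no_preempt_interval):
    """Return the jobs whose uninterrupted run time has reached the interval."""
    return [
        job
        for job, minutes in run_minutes_by_job.items()
        if minutes >= no_preempt_interval
    ]
```

With NO_PREEMPT_INTERVAL=10 and assumed run times of 3, 5, and 12 minutes for jobs B, C, and D, only job D is a candidate for preemption.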

The parameter of the same name in the lsb.applications file overrides this parameter. This parameter overrides the parameter of the same name in the lsb.params file.

Default

0

PEND_TIME_LIMIT

Specifies the pending time limit for a job.

Syntax

PEND_TIME_LIMIT=[hour:]minute

Description

LSF sends the queue-level pending time limit configuration to IBM Spectrum LSF RTM (LSF RTM), which handles the alarm and triggered actions such as user notification (for example, notifying the user that submitted the job and the LSF administrator) and job control actions (for example, killing the job). LSF RTM compares the job's pending time to the pending time limit, and if the job is pending for longer than this specified time limit, LSF RTM triggers the alarm and actions. This parameter works without LSF RTM, but LSF does not take any other alarm actions.

In MultiCluster job forwarding mode, the job's pending time limit is ignored in the execution cluster, while the submission cluster merges the job's queue-, application-, and job-level pending time limit according to local settings.

The pending time limit is in the form of [hour:]minute. The minutes can be specified as a number greater than 59. For example, three and a half hours can either be specified as 3:30, or 210.
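
To make the [hour:]minute format concrete, the following Python sketch converts a limit to minutes. The function name is hypothetical, and the sketch does no validation beyond integer conversion.

```python
def pend_time_limit_minutes(value):
    """Convert an '[hour:]minute' string to a total number of minutes."""
    hours, separator, minutes = value.rpartition(":")
    # Without a colon, rpartition leaves hours empty and the whole value is minutes.
    return (int(hours) * 60 if separator else 0) + int(minutes)
```

Both forms of the example above give the same result: 3:30 and 210 each convert to 210 minutes.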

The job-level pending time limit (bsub -ptl) overrides the application-level limit (PEND_TIME_LIMIT in lsb.applications), and the application-level limit overrides the queue-level limit specified here.

Default

Not defined.

PJOB_LIMIT

Specifies the per-processor job slot limit for the queue.

Syntax

PJOB_LIMIT=float

Description

Maximum number of job slots that this queue can use on any processor. This limit is configured per processor, so that multiprocessor hosts automatically run more jobs.

Default

Unlimited

PLAN

For use when the ALLOCATION_PLANNER parameter is enabled. Used to identify the jobs that are candidates for planning.

Syntax

PLAN = Y | N | "<key>[value] ..."

Description

LSF requires that the ALLOCATION_PLANNER parameter is enabled in order to use PLAN=Y.

Also defined at the cluster and application levels. The precedence is: application, queue, global. For example, application level setting overrides the queue level setting.

The following key-value pairs are supported:

Table 1. Key-Value pairs for PLAN
key | value | Default | Description
DELAY | positive integer | - | Number of minutes to delay before considering making a plan for a job following the job's submission time.
MAX_JOBS | positive integer | - | Maximum number of jobs that can have a plan concurrently.
Note:

The PLAN parameter replaces the existing SLOT_RESERVE parameter and RESOURCE_RESERVE parameter when the ALLOCATION_PLANNER parameter is enabled.

Default

N

POST_EXEC

Enables post-execution processing at the queue level.

Syntax

POST_EXEC=command

Description

The POST_EXEC command runs on the execution host after the job finishes. Post-execution commands can be configured at the application and queue levels. Application-level post-execution commands run before queue-level post-execution commands.

The POST_EXEC command uses the same environment variable values as the job, and, by default, runs under the user account of the user who submits the job. To run post-execution commands under a different user account (such as root for privileged operations), configure the parameter LSB_PRE_POST_EXEC_USER in lsf.sudoers.

When a job exits with one of the queue’s REQUEUE_EXIT_VALUES, LSF requeues the job and sets the environment variable LSB_JOBPEND. The post-execution command runs after the requeued job finishes.

When the post-execution command is run, the environment variable LSB_JOBEXIT_STAT is set to the exit status of the job. If the execution environment for the job cannot be set up, LSB_JOBEXIT_STAT is set to 0 (zero).

The command path can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory, file name, and expanded values for %J (job_ID) and %I (index_ID).

For UNIX:
  • The pre- and post-execution commands run in the /tmp directory under /bin/sh -c, which allows the use of shell features in the commands. The following example shows valid configuration lines:
    PRE_EXEC= /usr/share/lsf/misc/testq_pre >> /tmp/pre.out
    
    POST_EXEC= /usr/share/lsf/misc/testq_post | grep -v "Hey!"
    
  • LSF sets the PATH environment variable to
    PATH='/bin /usr/bin /sbin /usr/sbin'
    
  • The stdin, stdout, and stderr are set to /dev/null
  • To allow UNIX users to define their own post-execution commands, an LSF administrator specifies the environment variable $USER_POSTEXEC as the POST_EXEC command. A user then defines the post-execution command:
    setenv USER_POSTEXEC /path_name
    
    Note: The path name for the post-execution command must be an absolute path. Do not define POST_EXEC=$USER_POSTEXEC when LSB_PRE_POST_EXEC_USER=root. This parameter cannot be used to configure host-based post-execution processing.
For Windows:
  • The pre- and post-execution commands run under cmd.exe /c
  • The standard input, standard output, and standard error are set to NULL
  • The PATH is determined by the setup of the LSF service
Note:

For post-execution commands that execute on a Windows Server 2003, x64 Edition platform, users must have read and execute privileges for cmd.exe.

Default

Not defined. No post-execution commands are associated with the queue.

PRE_EXEC

Enables pre-execution processing at the queue level.

Syntax

PRE_EXEC=command

Description

The PRE_EXEC command runs on the execution host before the job starts. If the PRE_EXEC command exits with a non-zero exit code, LSF requeues the job to the front of the queue.

Pre-execution commands can be configured at the queue, application, and job levels and run in the following order:
  1. The queue-level command
  2. The application-level or job-level command. If you specify a command at both the application and job levels, the job-level command overrides the application-level command; the application-level command is ignored.

The PRE_EXEC command uses the same environment variable values as the job, and runs under the user account of the user who submits the job. To run pre-execution commands under a different user account (such as root for privileged operations), configure the parameter LSB_PRE_POST_EXEC_USER in lsf.sudoers.

The command path can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory, file name, and expanded values for %J (job_ID) and %I (index_ID).

For UNIX:
  • The pre- and post-execution commands run in the /tmp directory under /bin/sh -c, which allows the use of shell features in the commands. The following example shows valid configuration lines:
    PRE_EXEC= /usr/share/lsf/misc/testq_pre >> /tmp/pre.out
    
    POST_EXEC= /usr/share/lsf/misc/testq_post | grep -v "Hey!"
    
  • LSF sets the PATH environment variable to
    PATH='/bin /usr/bin /sbin /usr/sbin'
    
  • The stdin, stdout, and stderr are set to /dev/null
For Windows:
  • The pre- and post-execution commands run under cmd.exe /c
  • The standard input, standard output, and standard error are set to NULL
  • The PATH is determined by the setup of the LSF Service
Note:

For pre-execution commands that execute on a Windows Server 2003, x64 Edition platform, users must have read and execute privileges for cmd.exe. This parameter cannot be used to configure host-based pre-execution processing.

Default

Not defined. No pre-execution commands are associated with the queue.

PREEMPTION

Enables preemptive scheduling and defines this queue as preemptive or preemptable.

Syntax

PREEMPTION=PREEMPTIVE[[low_queue_name[+pref_level]...]]
PREEMPTION=PREEMPTABLE[[hi_queue_name...]]
PREEMPTION=PREEMPTIVE[[low_queue_name[+pref_level]...]] PREEMPTABLE[[hi_queue_name...]]

Description

PREEMPTIVE

Enables preemptive scheduling and defines this queue as preemptive. Jobs in this queue preempt jobs from the specified lower-priority queues or from all lower-priority queues if the parameter is specified with no queue names. PREEMPTIVE can be combined with PREEMPTABLE to specify that jobs in this queue can preempt jobs in lower-priority queues, and can be preempted by jobs in higher-priority queues.

PREEMPTABLE

Enables preemptive scheduling and defines this queue as preemptable. Jobs in this queue can be preempted by jobs from specified higher-priority queues, or from all higher-priority queues, even if the higher-priority queues are not preemptive. PREEMPTABLE can be combined with PREEMPTIVE to specify that jobs in this queue can be preempted by jobs in higher-priority queues, and can preempt jobs in lower-priority queues.

low_queue_name

Specifies the names of lower-priority queues that can be preempted.

To specify multiple queues, separate the queue names with a space, and enclose the list in a single set of square brackets.

+pref_level

Specifies to preempt this queue before preempting other queues. When multiple queues are indicated with a preference level, an order of preference is indicated: queues with higher relative preference levels are preempted before queues with lower relative preference levels set.

hi_queue_name

Specifies the names of higher-priority queues that can preempt jobs in this queue.

To specify multiple queues, separate the queue names with a space and enclose the list in a single set of square brackets.

Example: configure selective, ordered preemption across queues

The following example defines four queues, as follows:
  • high
    • Has the highest relative priority of 99
    • Jobs from this queue can preempt jobs from all other queues
  • medium
    • Has the second-highest relative priority at 10
    • Jobs from this queue can preempt jobs from normal and low queues, beginning with jobs from low, as indicated by the preference (+1)
  • normal
    • Has the second-lowest relative priority, at 5
    • Jobs from this queue can preempt jobs from low, and can be preempted by jobs from both high and medium queues
  • low
    • Has the lowest relative priority, which is also the default priority, at 1
    • Jobs from this queue can be preempted by jobs from all preemptive queues, even though it does not have the PREEMPTABLE keyword set
Begin Queue
QUEUE_NAME=high
PREEMPTION=PREEMPTIVE
PRIORITY=99
End Queue
Begin Queue
QUEUE_NAME=medium
PREEMPTION=PREEMPTIVE[normal low+1]
PRIORITY=10
End Queue
Begin Queue
QUEUE_NAME=normal
PREEMPTION=PREEMPTIVE[low] PREEMPTABLE[high medium]
PRIORITY=5
End Queue
Begin Queue
QUEUE_NAME=low
PRIORITY=1
End Queue
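
To show how preference levels order the queues in a PREEMPTIVE list, the following Python sketch parses a PREEMPTION value and sorts the listed queues by level. The function is hypothetical, and treating a queue with no +pref_level as level 0 is an assumption made for the sketch.

```python
import re


def preemption_order(spec):
    """Return queue names from 'PREEMPTIVE[q1 q2+N ...]', highest preference first."""
    match = re.search(r"PREEMPTIVE\[([^\]]*)\]", spec)
    if match is None:
        return []  # no explicit list: all lower-priority queues are candidates
    entries = []
    for token in match.group(1).split():
        name, _plus, level = token.partition("+")
        # Assumption: a queue listed without +pref_level counts as level 0.
        entries.append((name, int(level) if level else 0))
    # list.sort is stable, so queues with equal levels keep their listed order.
    entries.sort(key=lambda entry: entry[1], reverse=True)
    return [name for name, _level in entries]
```

For the medium queue in the example, PREEMPTIVE[normal low+1] gives the order low, then normal.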

PREEMPT_DELAY

Preemptive jobs will wait the specified number of seconds from the submission time before preempting any low priority preemptable jobs.

Syntax

PREEMPT_DELAY=seconds

Description

During the grace period, preemption will not be triggered, but the job can be scheduled and dispatched by other scheduling policies.

This feature can provide flexibility to tune the system to reduce the number of preemptions. It is useful to get better performance and job throughput. When the low priority jobs are short, if high priority jobs can wait a while for the low priority jobs to finish, preemption can be avoided and cluster performance is improved. If the job is still pending after the grace period has expired, the preemption will be triggered.

The waiting time is for preemptive jobs in the pending status only. It will not impact the preemptive jobs that are suspended.

The time is counted from the submission time of the jobs. The submission time means the time mbatchd accepts a job, which includes newly submitted jobs, restarted jobs (by brestart) or forwarded jobs from a remote cluster.
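
The grace period check can be sketched in Python as a comparison against the submission time. The function name is hypothetical, times are in seconds, and treating the exact expiry instant as expired is an assumption.

```python
def preemption_allowed(now, submit_time, preempt_delay=None):
    """Return True once a pending preemptive job may trigger preemption."""
    if preempt_delay is None:
        return True  # parameter not defined anywhere: preemption is immediate
    # The delay is counted from the time mbatchd accepted the job.
    return now - submit_time >= preempt_delay
```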

When the preemptive job is waiting, the pending reason is:

The preemptive job is allowing a grace period before preemption.

If you use an older version of bjobs, the pending reason is:

Unknown pending reason code <6701>;

The parameter can be defined in lsb.params, lsb.queues (overrides lsb.params), and lsb.applications (overrides both lsb.params and lsb.queues).

Run badmin reconfig to make your changes take effect.
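
The precedence and grace-period rules above can be sketched in Python as follows. This is an illustration only, not LSF code: the function names, the use of None for "not defined", and the choice to allow preemption exactly when the elapsed time equals the delay are assumptions made for the example.

```python
from typing import Optional


def effective_preempt_delay(params: Optional[int],
                            queue: Optional[int],
                            application: Optional[int]) -> Optional[int]:
    """Return the PREEMPT_DELAY (seconds) that applies, or None if undefined.

    lsb.applications overrides lsb.queues, which overrides lsb.params.
    """
    for value in (application, queue, params):
        if value is not None:
            return value
    return None


def preemption_allowed(submit_time: float, now: float,
                       delay: Optional[int]) -> bool:
    """True once the grace period, counted from submission, has expired."""
    if delay is None:
        # Not defined anywhere: preemption is immediate.
        return True
    return now - submit_time >= delay
```

For instance, with PREEMPT_DELAY=600 in lsb.params and PREEMPT_DELAY=300 in the queue, effective_preempt_delay(600, 300, None) yields the queue-level value.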

Default

Not defined (if the parameter is not defined anywhere, preemption is immediate).

PRIORITY

Specifies the relative queue priority for dispatching jobs. A higher value indicates a higher job-dispatching priority, relative to other queues.

Syntax

PRIORITY=integer

Description

LSF schedules jobs from one queue at a time, starting with the highest-priority queue. If multiple queues have the same priority, LSF schedules all the jobs from these queues in first-come, first-served order.

LSF queue priority is independent of the UNIX scheduler priority system for time-sharing processes. In LSF, the NICE parameter is used to set the UNIX time-sharing priority for batch jobs.

integer

Specify a number greater than or equal to 1, where 1 is the lowest priority.
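
As a rough illustration of the ordering described above, the following Python sketch sorts pending jobs by queue priority (highest first) and then by submission time, so that queues with equal priority are served together in first-come, first-served order. The Job type, the priority mapping, and the queue names are hypothetical; the real scheduler applies many other policies.

```python
from typing import Dict, List, NamedTuple


class Job(NamedTuple):
    job_id: int
    queue: str
    submit_time: float


def scheduling_order(jobs: List[Job], priority: Dict[str, int]) -> List[Job]:
    """Order jobs by queue priority (highest first), then by submission time."""
    # Queues missing from the mapping get the default priority of 1.
    return sorted(jobs,
                  key=lambda j: (-priority.get(j.queue, 1), j.submit_time))


# Hypothetical queues: "normal" and "night" share priority 5.
priority = {"high": 99, "normal": 5, "night": 5}
jobs = [Job(1, "low", 10), Job(2, "high", 30),
        Job(3, "normal", 20), Job(4, "night", 5)]
ordered_ids = [j.job_id for j in scheduling_order(jobs, priority)]
```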

Default

1

PROCESSLIMIT

Limits the number of concurrent processes that can be part of a job.

Syntax

PROCESSLIMIT=[default_limit] maximum_limit

Description

By default, if a default process limit is specified, jobs submitted to the queue without a job-level process limit are killed when the default process limit is reached.

If you specify only one limit, it is the maximum, or hard, process limit. If you specify two limits, the first one is the default, or soft, process limit, and the second one is the maximum process limit.

Default

Unlimited

QJOB_LIMIT

Specifies the job slot limit for the queue.

Syntax

QJOB_LIMIT=integer

Description

This parameter specifies the total number of job slots that this queue can use.

Default

Unlimited

QUEUE_GROUP

Configures absolute priority scheduling (APS) across multiple queues.

Syntax

QUEUE_GROUP=queue1, queue2 ...

Description

When APS is enabled in the queue with APS_PRIORITY, the FAIRSHARE_QUEUES parameter is ignored. The QUEUE_GROUP parameter replaces FAIRSHARE_QUEUES, which is obsolete in LSF 7.0.

Default

Not defined

QUEUE_NAME

Required. Specifies the name of the queue.

Syntax

QUEUE_NAME=string

Description

Specify any ASCII string up to 59 characters long. You can use letters, digits, underscores (_) or dashes (-). You cannot use blank spaces. You cannot specify the reserved name default.
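
The naming rules above can be checked before a queue is added to the file. The following Python sketch is one possible validation; the function name is hypothetical and the check covers only the rules stated in this section.

```python
import re

# Letters, digits, underscores, or dashes; 1 to 59 characters, no spaces.
_QUEUE_NAME = re.compile(r"[A-Za-z0-9_-]{1,59}")


def is_valid_queue_name(name: str) -> bool:
    """Return True if name satisfies the QUEUE_NAME rules described above."""
    # "default" is reserved for the queue that LSF creates automatically.
    return name != "default" and _QUEUE_NAME.fullmatch(name) is not None
```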

Default

You must specify this parameter to define a queue. The default queue automatically created by LSF is named default.

RC_ACCOUNT

Assigns an account name (tag) to hosts borrowed through LSF resource connector, so that they cannot be used by other user groups, users, or jobs.

Syntax

RC_ACCOUNT=account_name

Description

When a job is submitted to a queue with the RC_ACCOUNT parameter specified, hosts borrowed to run the job are tagged with the value of the RC_ACCOUNT parameter. The borrowed host cannot be used by other queues that have a different value for the RC_ACCOUNT parameter (or that don't have the RC_ACCOUNT parameter defined).

After the borrowed host joins the cluster, use the lshosts -s command to view the value of the RC_ACCOUNT parameter for the host.

Example

RC_ACCOUNT=project1

Default

The string "default", which means that no account is defined for the queue.

RC_DEMAND_POLICY

Defines threshold conditions that determine whether demand is triggered to borrow resources through resource connector for all the jobs in a queue. As long as the pending jobs in the queue meet at least one threshold condition, LSF expresses the demand to resource connector to trigger borrowing.

Syntax

RC_DEMAND_POLICY = THRESHOLD[ [ num_pend_jobs[,duration]]]

Description

The demand policy defined by the RC_DEMAND_POLICY parameter can contain multiple conditions, in an OR relationship. A condition is defined as [ num_pend_jobs[,duration]]. A condition is met when the queue has more than the specified number of eligible pending jobs that are expected to run at least the specified duration in minutes. The num_pend_jobs option is required, and the duration is optional. The default duration is 0 minutes.

LSF considers only eligible pending jobs for the policy. An ineligible pending job (for example, a job whose dependency is not yet satisfied) remains pending even though hosts are available. The policy counts a job for eligibility no matter how many tasks or slots the job requires. Each job element is counted as a job. Pending demand for a resizable job is not counted, though LSF can allocate borrowed resources to the resizable job.

LSF evaluates the policies at each demand calculation cycle, and accumulates duration if the num_pend_jobs option is satisfied. The mbschd daemon resets the duration of the condition when it restarts or if the condition has not been evaluated in the past 2 minutes. For example, if no pending jobs are in the cluster for 2 minutes, mbschd stops evaluating the policies.

Example

In the following example, LSF calculates demand if the queue has 5 or more pending jobs in the past 10 minutes, or 1 or more pending jobs in the past 60 minutes, or 100 or more pending jobs.

RC_DEMAND_POLICY = THRESHOLD[ [ 5, 10] [1, 60] [100] ]
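
To make the structure of the value concrete, the following Python sketch splits a THRESHOLD[...] value into its conditions and applies the default duration of 0 minutes. It is a simplified reader written for this example, not the parser that LSF uses, and the function name is hypothetical.

```python
import re
from typing import List, Tuple

# One condition: [ num_pend_jobs[,duration] ]
_CONDITION = re.compile(r"\[\s*(\d+)\s*(?:,\s*(\d+)\s*)?\]")


def parse_demand_policy(value: str) -> List[Tuple[int, int]]:
    """Return (num_pend_jobs, duration_minutes) pairs; duration defaults to 0."""
    value = value.strip()
    if not value.startswith("THRESHOLD["):
        raise ValueError("expected a value of the form THRESHOLD[...]")
    return [(int(num), int(duration) if duration else 0)
            for num, duration in _CONDITION.findall(value)]


conditions = parse_demand_policy("THRESHOLD[ [ 5, 10] [1, 60] [100] ]")
```

Because the conditions are in an OR relationship, meeting any one of the returned pairs is enough to trigger demand.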

Default

Not defined for the queue

RC_HOSTS

Enables LSF resource connector to borrow specific host types from a resource provider.

Syntax

RC_HOSTS=string

RC_HOSTS = none | all | host_type [host_type ...]

Description

The host_type flag is a Boolean resource that is a member of the list of host resources that are defined in the LSB_RC_EXTERNAL_HOST_FLAG parameter in the lsf.conf file.

If the RC_HOSTS parameter is not defined in the queue, its default value is none. Borrowing is disabled for any queue that explicitly defines RC_HOSTS=none, even if the LSB_RC_EXTERNAL_HOST_FLAG parameter is defined in the lsf.conf file.

If the RC_HOSTS parameter is not defined in any queue, borrowing cannot happen for any job.

Note: The HOSTS parameter in the lsb.queues file and the bsub -m option do not apply to hosts that are managed through the resource connector. To specify the resource connector host types that can be used by a queue, you must specify the RC_HOSTS parameter in that queue.

Example

RC_HOSTS=awshost

Default

none - host borrowing from resource providers is disabled, and no borrowed hosts can be used by the queue.

RCVJOBS_FROM

Defines a receive-jobs queue for LSF multicluster capability.

Syntax

RCVJOBS_FROM=cluster_name ... | allclusters

Description

Specify cluster names, separated by a space. The administrator of each remote cluster determines which queues in that cluster forward jobs to the local cluster.

If you enabled an LSF data manager data transfer queue as a remote send-jobs queue in the execution cluster (that is, if you added a queue from the submission cluster to the SNDJOBS_TO parameter in the lsb.queues file in the execution cluster), you must include the execution cluster in the RCVJOBS_FROM parameter in the corresponding submission cluster.

Use the keyword allclusters to specify any remote cluster.

Example

The following queue accepts remote jobs from clusters 2, 4, and 6.

RCVJOBS_FROM=cluster2 cluster4 cluster6

Example for LSF data manager

If the execution cluster clusterE has the LSF data manager transfer queue data_transfer set as a remote send-jobs queue to the receive_q queue in the submission cluster clusterS, according to the following configuration for the clusterE cluster:

Begin Queue
QUEUE_NAME=data_transfer
DATA_TRANSFER=Y
SNDJOBS_TO=receive_q@clusterS
HOSTS=hostS1 hostS2 # Transfer nodes in the execution cluster
End Queue

You must define the RCVJOBS_FROM parameter for the receive_q queue in the submission cluster clusterS to accept jobs from (and push data to) the execution cluster clusterE, as shown in the following configuration for the clusterS cluster:

Begin Queue
QUEUE_NAME=receive_q
RCVJOBS_FROM=clusterE
PRIORITY=40
HOSTS=hostS1 hostS2 # Transfer nodes in the submission cluster
RES_REQ=select[type==any]
End Queue

Alternatively, you can define RCVJOBS_FROM=allclusters to accept jobs from all clusters, which includes the execution cluster.

RELAX_JOB_DISPATCH_ORDER

Allows LSF to deviate from standard job prioritization policies to improve cluster utilization.

Syntax

RELAX_JOB_DISPATCH_ORDER=Y | y | N | n | ALLOC_REUSE_DURATION[[min] max] [SHARE[[user] [group] [project]]]

Description

When this parameter is enabled, LSF allows multiple jobs with common resource requirements to run consecutively on the same allocation. Whenever a job finishes, LSF attempts to quickly replace it with a pending job that has the same resource requirements. To ensure that limits are not violated, LSF selects pending jobs that belong to the same user and have other attributes in common.

Since LSF bypasses most of the standard scheduling logic between jobs, reusing resource allocation can help improve cluster utilization. This improvement is most evident in clusters with several shorter jobs (that is, jobs that run from a few seconds to several minutes).

To ensure that the standard job prioritization policies are approximated, there is a limit on the length of time that each allocation is reusable. LSF automatically sets this time limit to achieve a high level of resource utilization. By default, this reuse time cannot exceed 30 minutes. If you specify a maximum reuse time and an optional minimum reuse time (by using ALLOC_REUSE_DURATION), LSF adjusts the time limit within this specified range to achieve the highest level of resource utilization.

Use the SHARE[] keyword to further relax the constraints of what types of pending jobs can reuse the resource allocation of a finished job. This allows more jobs to reuse the resource allocation, but might result in resource limits and policies being temporarily violated because these limits and policies are relaxed. The SHARE[] keyword specifies constraints that the mbatchd daemon no longer has to apply when determining which pending jobs can reuse the resource allocation of a finished job. If a job is finished and LSF does not find any pending jobs with the same user or other attributes in common, LSF considers the specifications in the SHARE[] keyword. Specify one or more of the following keywords within SHARE[] for LSF to also consider the following pending jobs:
user
Pending jobs that do not have the same job owner as the finished job.
group
Pending jobs that are not associated with the same fairshare group (bsub -G command option) as the finished job.
project
Pending jobs that are not assigned to the same project (bsub -P command option) as the finished job.
If using the LSF multicluster capability, SHARE[] applies only to the job forward mode.

If undefined, the cluster-wide value from the lsb.params parameter of the same name is used.

Examples

  • RELAX_JOB_DISPATCH_ORDER=Y

    The resource allocation of a finished job can be reused from 0 to 30 minutes.

  • RELAX_JOB_DISPATCH_ORDER=ALLOC_REUSE_DURATION[45]

    The resource allocation of a finished job can be reused from 0 to 45 minutes.

  • RELAX_JOB_DISPATCH_ORDER=ALLOC_REUSE_DURATION[5 45]

    The resource allocation of a finished job can be reused from 5 to 45 minutes.

  • RELAX_JOB_DISPATCH_ORDER=SHARE[user]

    The resource allocation of a finished job can be reused from 0 to 30 minutes. If there are no pending jobs with the same common attributes, pending jobs that belong to different users can also reuse the resource allocation.

  • RELAX_JOB_DISPATCH_ORDER=SHARE[user group]

    The resource allocation of a finished job can be reused from 0 to 30 minutes. If there are no pending jobs with the same common attributes, pending jobs that belong to different users and are associated with different fairshare groups can also reuse the resource allocation.

  • RELAX_JOB_DISPATCH_ORDER=ALLOC_REUSE_DURATION[45] SHARE[user group]

    The resource allocation of a finished job can be reused from 0 to 45 minutes. If there are no pending jobs with the same common attributes, pending jobs that belong to different users and are associated with different fairshare groups can also reuse the resource allocation.
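
The examples above can be summarized with a small Python sketch that maps a RELAX_JOB_DISPATCH_ORDER value to a reuse window and a set of SHARE[] keywords. The function name and return shape are assumptions for illustration, and the sketch does not validate the SHARE[] keywords themselves.

```python
import re
from typing import Optional, Set, Tuple

DEFAULT_MAX_REUSE = 30  # minutes; the default upper bound described above


def parse_relax_order(value: str) -> Optional[Tuple[int, int, Set[str]]]:
    """Return (min_minutes, max_minutes, share_keywords), or None if disabled."""
    value = value.strip()
    if value in ("N", "n"):
        return None
    duration = re.search(r"ALLOC_REUSE_DURATION\[([^\]]*)\]", value)
    shared = re.search(r"SHARE\[([^\]]*)\]", value)
    if value not in ("Y", "y") and not duration and not shared:
        raise ValueError("unrecognized value: " + value)
    low, high = 0, DEFAULT_MAX_REUSE
    if duration:
        numbers = [int(token) for token in duration.group(1).split()]
        if len(numbers) == 1:
            high = numbers[0]          # only the maximum is given
        elif len(numbers) == 2:
            low, high = numbers        # minimum and maximum
        else:
            raise ValueError("expected ALLOC_REUSE_DURATION[[min] max]")
    share = set(shared.group(1).split()) if shared else set()
    return low, high, share
```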

Default

Not defined.

REMOTE_MAX_PREEXEC_RETRY

Defines the maximum number of times to attempt the pre-execution command of a job from the remote cluster. Used only with the MultiCluster job forwarding model.

Syntax

REMOTE_MAX_PREEXEC_RETRY=integer

Description

This parameter applies to the execution cluster.

Valid values

0 - INFINIT_INT

INFINIT_INT is defined in lsf.h.

Default

5

REQUEUE_EXIT_VALUES

Enables automatic job requeue and sets the LSB_EXIT_REQUEUE environment variable.

Syntax

REQUEUE_EXIT_VALUES=[exit_code ...] [EXCLUDE(exit_code ...)]

Description

Use spaces to separate multiple exit codes. Application-level exit values override queue-level values. Job-level exit values (bsub -Q) override application-level and queue-level values.

exit_code has the following form:
"[all] [~number ...] | [number ...]"

The reserved keyword all specifies all exit codes. Exit codes are typically between 0 and 255. Use a tilde (~) to exclude specified exit codes from the list.

Jobs are requeued to the head of the queue. The output from the failed run is not saved, and the user is not notified by LSF.

Define an exit code as EXCLUDE(exit_code) to enable exclusive job requeue, ensuring the job does not rerun on the same host. Exclusive job requeue does not work for parallel jobs.

For MultiCluster jobs forwarded to a remote execution cluster, the exit values specified in the submission cluster with the EXCLUDE keyword are treated as if they were non-exclusive.

You can also requeue a job if the job is terminated by a signal.

If a job is killed by a signal, the exit value is 128+signal_value. The sum of 128 and the signal value can be used as the exit code in the parameter REQUEUE_EXIT_VALUES.

For example, if you want a job to rerun if it is killed with signal 9 (SIGKILL), the exit value is 128+9=137. You can configure the following requeue exit value to allow a job to be requeued if it was killed by signal 9:

REQUEUE_EXIT_VALUES=137

In Windows, if a job is killed by a signal, the exit value is signal_value. The signal value can be used as the exit code in the parameter REQUEUE_EXIT_VALUES.

For example, if you want to rerun a job after it was killed with a signal 7 (SIGKILL), the exit value would be 7. You can configure the following requeue exit value to allow a job to requeue after it was killed by signal 7:

REQUEUE_EXIT_VALUES=7

You can configure the following requeue exit value to allow a job to requeue for both Linux and Windows after it was killed:

REQUEUE_EXIT_VALUES=137 7

If mbatchd is restarted, it does not remember the previous hosts from which the job exited with an exclusive requeue exit code. In this situation, it is possible for a job to be dispatched to hosts on which the job has previously exited with an exclusive exit code.

You should configure REQUEUE_EXIT_VALUES for interruptible backfill queues (INTERRUPTIBLE_BACKFILL=seconds).

Example

REQUEUE_EXIT_VALUES=30 EXCLUDE(20)

This example means that jobs with exit code 30 are requeued, jobs with exit code 20 are requeued exclusively, and jobs with any other exit code are not requeued.
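
The exit-value arithmetic and the split between ordinary and exclusive requeue codes can be sketched in Python as follows. This is a simplified illustration with hypothetical function names: it handles numeric codes and EXCLUDE(...) only, and does not handle the all keyword or the tilde (~) form.

```python
import re
from typing import Set, Tuple


def unix_exit_value(signal_number: int) -> int:
    """Exit value when a job is killed by a signal on UNIX or Linux."""
    return 128 + signal_number


def parse_requeue_exit_values(value: str) -> Tuple[Set[int], Set[int]]:
    """Return (requeue_codes, exclusive_requeue_codes) for numeric codes."""
    exclusive: Set[int] = set()
    for group in re.findall(r"EXCLUDE\(([^)]*)\)", value):
        exclusive.update(int(code) for code in group.split())
    # Whatever remains outside EXCLUDE(...) is an ordinary requeue code.
    remainder = re.sub(r"EXCLUDE\([^)]*\)", " ", value)
    normal = {int(code) for code in remainder.split()}
    return normal, exclusive
```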

Default

Not defined. Jobs are not requeued.

RERUNNABLE

Enables automatic rerun for jobs from this queue.

Syntax

RERUNNABLE=yes | no

Description

If set to yes, enables automatic job rerun (restart).

Rerun is disabled when RERUNNABLE is set to no. The yes and no arguments are not case sensitive.

For MultiCluster jobs, the setting in the submission queue is used, and the setting in the execution queue is ignored.

Members of a chunk job can be rerunnable. If the execution host becomes unavailable, rerunnable chunk job members are removed from the job chunk and dispatched to a different execution host.

Default

no

RESOURCE_RESERVE

Enables processor reservation and memory reservation for pending jobs for this queue.

Syntax

RESOURCE_RESERVE=MAX_RESERVE_TIME[integer]

Description

Specifies the number of dispatch turns (MAX_RESERVE_TIME) over which a job can reserve job slots and memory.

Overrides the SLOT_RESERVE parameter. If both RESOURCE_RESERVE and SLOT_RESERVE are defined in the same queue, an error is displayed when the cluster is reconfigured, and SLOT_RESERVE is ignored. Job slot reservation for parallel jobs is enabled by RESOURCE_RESERVE if the LSF scheduler plug-in module names for both parallel batch jobs and resource reservation (schmod_parallel and schmod_reserve) are configured in the lsb.modules file. The schmod_parallel name must come before schmod_reserve in lsb.modules.

If a job has not accumulated enough memory or job slots to start by the time MAX_RESERVE_TIME expires, it releases all its reserved job slots or memory so that other pending jobs can run. After the reservation time expires, the job cannot reserve memory or slots for one scheduling session, so other jobs have a chance to be dispatched. After one scheduling session, the job can reserve available memory and job slots again for another period specified by MAX_RESERVE_TIME.

If BACKFILL is configured in a queue, and a run limit is specified with -W on bsub or with RUNLIMIT in the queue, backfill jobs can use the accumulated memory reserved by the other jobs in the queue, as long as the backfill job can finish before the predicted start time of the jobs with the reservation.

Unlike slot reservation, which only applies to parallel jobs, memory reservation and backfill on memory apply to sequential and parallel jobs.

Example

RESOURCE_RESERVE=MAX_RESERVE_TIME[5]

This example specifies that jobs have up to 5 dispatch turns to reserve sufficient job slots or memory (equal to 5 minutes, by default).

Default

Not defined. No job slots or memory is reserved.

RES_REQ

Specifies resource requirements used to determine eligible hosts.

Syntax

RES_REQ=res_req

Description

Specify a resource requirement string. A resource requirement string lets you specify conditions more flexibly than load thresholds do. Resource requirement strings can be simple (applying to the entire job), compound (applying to the specified number of slots), or alternative (a choice between two or more simple or compound requirements). For alternative resource requirements, if no resource satisfies the first requirement, LSF tries the next requirement, and so on until a requirement is satisfied.

Compound and alternative resource requirements follow the same set of rules for determining how resource requirements are going to be merged between job, application, and queue level. For more information on merge rules, see Administering IBM Spectrum LSF.

When a compound or alternative resource requirement is set for a queue, it will be ignored unless it is the only resource requirement specified (no resource requirements are set at the job-level or application-level).

When a simple resource requirement is set for a queue and a compound resource requirement is set at the job-level or application-level, the queue-level requirements merge as they do for simple resource requirements. However, any job-based resources defined in the queue only apply to the first term of the merged compound resource requirements.

Resource requirement strings in select sections must conform to a more strict syntax. The strict resource requirement syntax only applies to the select section. It does not apply to the other resource requirement sections (order, rusage, same, span, cu or affinity). LSF rejects resource requirement strings where an rusage section contains a non-consumable resource.

For simple resource requirements, the select sections from all levels must be satisfied and the same sections from all levels are combined. cu, order, and span sections at the job-level overwrite those at the application-level which overwrite those at the queue-level. Multiple rusage definitions are merged, with the job-level rusage taking precedence over the application-level, and application-level taking precedence over the queue-level.

The simple resource requirement rusage section can specify additional requests. To do this, use the OR (||) operator to separate additional rusage strings. Multiple -R options cannot be used with multi-phase rusage resource requirements.

For simple resource requirements the job-level affinity section overrides the application-level, and the application-level affinity section overrides the queue-level.

Note:

Compound and alternative resource requirements do not support use of the || operator within rusage sections or the cu section.

The RES_REQ consumable resource requirements must satisfy any limits set by the parameter RESRSV_LIMIT in lsb.queues, or the RES_REQ will be ignored.

When both the RES_REQ and RESRSV_LIMIT are set in lsb.queues for a consumable resource, the queue-level RES_REQ no longer acts as a hard limit for the merged RES_REQ rusage values from the job and application levels. In this case only the limits set by RESRSV_LIMIT must be satisfied, and the queue-level RES_REQ acts as a default value.

For example:
Queue-level RES_REQ:
RES_REQ=rusage[mem=200:lic=1] ...
For the job submission:
bsub -R'rusage[mem=100]' ...
the resulting requirement for the job is
rusage[mem=100:lic=1]

where mem=100 specified by the job overrides mem=200 specified by the queue. However, lic=1 from queue is kept, since job does not specify it.

Queue-level RES_REQ threshold:
RES_REQ=rusage[bwidth=2:threshold=5] ...
For the job submission:
bsub -R "rusage[bwidth=1:threshold=6]" ...

the resulting requirement for the job is

rusage[bwidth=1:threshold=6]

Queue-level RES_REQ with decay and duration defined:
RES_REQ=rusage[mem=200:duration=20:decay=1] ...
For a job submission with no decay or duration:
bsub -R'rusage[mem=100]' ...
the resulting requirement for the job is:
rusage[mem=100:duration=20:decay=1]

Queue-level duration and decay are merged with the job-level specification, and mem=100 for the job overrides mem=200 specified by the queue. However, duration=20 and decay=1 from queue are kept, since job does not specify them.

Queue-level RES_REQ with resource reservation method:
RES_REQ=rusage[mem=200/host:duration=20:decay=1] ...
For a job submission with no decay or duration:
bsub -R'rusage[mem=100/task]' ...
the resulting requirement for the job is:
rusage[mem=100/task:duration=20:decay=1]

Queue-level RES_REQ with multi-phase job-level rusage:
RES_REQ=rusage[mem=200:duration=20:decay=1] ...
For a multi-phase job submission:
bsub -R'rusage[mem=(300 200 100):duration=(10 10 10)]' ...
the resulting requirement for the job is:
rusage[mem=(300 200 100):duration=(10 10 10)]

Multi-phase rusage values in the job submission override the single phase specified by the queue.
  • If RESRSV_LIMIT is defined in lsb.queues and has a maximum memory limit of 300 MB or greater, this job will be accepted.
  • If RESRSV_LIMIT is defined in lsb.queues and has a maximum memory limit of less than 300 MB, this job will be rejected.
  • If RESRSV_LIMIT is not defined in lsb.queues and the queue-level RES_REQ value of 200 MB acts as a ceiling, this job will be rejected.

Queue-level multi-phase rusage RES_REQ:
RES_REQ=rusage[mem=(350 200):duration=(20):decay=(1)] ...
For a single phase job submission with no decay or duration:
bsub -q q_name -R'rusage[mem=100:swap=150]' ...
the resulting requirement for the job is:
rusage[mem=100:swap=150]

The job-level rusage string overrides the queue-level multi-phase rusage string.
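
The merge outcomes in the examples above can be reproduced with a short Python sketch. It models each rusage section as a dictionary and represents multi-phase values as tuples; both choices, and the function names, are assumptions made for illustration. The sketch only mirrors the examples shown here and is not the merge algorithm that LSF implements.

```python
from typing import Dict, Union

Value = Union[int, str, tuple]


def is_multi_phase(rusage: Dict[str, Value]) -> bool:
    """A section is treated as multi-phase if any value is a tuple."""
    return any(isinstance(value, tuple) for value in rusage.values())


def merge_rusage(queue: Dict[str, Value],
                 job: Dict[str, Value]) -> Dict[str, Value]:
    """Merge queue-level and job-level rusage as in the examples above."""
    if is_multi_phase(queue) or is_multi_phase(job):
        # When either level is multi-phase, the job-level string wins outright.
        return dict(job)
    merged = dict(queue)   # queue-level values act as defaults
    merged.update(job)     # job-level values override them
    return merged
```

For example, merge_rusage({"mem": 200, "lic": 1}, {"mem": 100}) keeps lic from the queue and takes mem from the job.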

The order section defined at the job level overwrites any resource requirements specified at the application level or queue level. The order section defined at the application level overwrites any resource requirements specified at the queue level. The default order string is r15s:pg.

If RES_REQ is defined at the queue level and there are no load thresholds defined, the pending reasons for each individual load index are not displayed by bjobs.

The span section defined at the queue level is ignored if the span section is also defined at the job level or in an application profile.

Note: Define span[hosts=-1] in the application profile or bsub -R resource requirement string to override the span section setting in the queue.

Default

select[type==local] order[r15s:pg]. If this parameter is defined and a host model or Boolean resource is specified, the default type is any.

RESRSV_LIMIT

Sets a range of allowed values for RES_REQ resources.

Syntax

RESRSV_LIMIT=[res1={min1,} max1] [res2={min2,} max2]...

Where res is a consumable resource name, min is an optional minimum value, and max is the maximum allowed value. Both max and min must be floating-point numbers between 0 and 2147483647, and min cannot be greater than max.

Description

Queue-level RES_REQ rusage values (set in lsb.queues) must be in the range set by RESRSV_LIMIT, or the queue-level RES_REQ is ignored. Merged RES_REQ rusage values from the job and application levels must be in the range of RESRSV_LIMIT, or the job is rejected.

Changes made to the rusage values of running jobs using bmod -R cannot exceed the maximum values of RESRSV_LIMIT, but can be lower than the minimum values.

When both the RES_REQ and RESRSV_LIMIT are set in lsb.queues for a consumable resource, the queue-level RES_REQ no longer acts as a hard limit for the merged RES_REQ rusage values from the job and application levels. In this case only the limits set by RESRSV_LIMIT must be satisfied, and the queue-level RES_REQ acts as a default value.

For MultiCluster, jobs must satisfy the RESRSV_LIMIT range set for the send-jobs queue in the submission cluster. After the job is forwarded the resource requirements are also checked against the RESRSV_LIMIT range set for the receive-jobs queue in the execution cluster.

Note:

Only consumable resource limits can be set in RESRSV_LIMIT. Other resources will be ignored.

Default

Not defined.

If max is defined and optional min is not, the default for min is 0.
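
The range check can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that both ends of the range are inclusive, which this section does not state explicitly.

```python
from typing import Dict, Tuple

Limits = Dict[str, Tuple[float, float]]  # resource name -> (min, max)


def make_limit(maximum: float, minimum: float = 0.0) -> Tuple[float, float]:
    """Build a (min, max) pair; min defaults to 0 when it is not given."""
    if not 0 <= minimum <= maximum <= 2147483647:
        raise ValueError("need 0 <= min <= max <= 2147483647")
    return (minimum, maximum)


def within_limits(rusage: Dict[str, float], limits: Limits) -> bool:
    """True if every limited resource in rusage falls inside its range."""
    for resource, value in rusage.items():
        if resource in limits:
            low, high = limits[resource]
            if not low <= value <= high:
                return False
    return True


limits = {"mem": make_limit(300)}
```

A queue-level RES_REQ that fails this check is ignored, whereas a merged job-level and application-level RES_REQ that fails it causes the job to be rejected.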

RESUME_COND

Specifies conditions for LSF to automatically resume a suspended (SSUSP) job in this queue.

Syntax

RESUME_COND=res_req

Use the select section of the resource requirement string to specify load thresholds. All other sections are ignored.

Description

LSF automatically resumes a suspended (SSUSP) job in this queue if the load on the host satisfies the specified conditions. The conditions only support load indices and static Boolean resources.

If RESUME_COND is not defined, the loadSched thresholds control when jobs are resumed. If RESUME_COND is defined, the loadSched thresholds are ignored when resuming jobs.

Default

Not defined. The loadSched thresholds are used to control resuming of jobs.

RUN_JOB_FACTOR

Job slots weighting factor. Used only with fairshare scheduling.

Syntax

RUN_JOB_FACTOR=number

Description

In the calculation of a user’s dynamic share priority, this factor determines the relative importance of the number of job slots reserved and in use by a user.

If undefined, the cluster-wide value from the lsb.params parameter of the same name is used.

Default

Not defined.

RUN_TIME_DECAY

Enables decay for run time at the same rate as the decay set by HIST_HOURS for cumulative CPU time and historical run time. Used only with fairshare scheduling.

Syntax

RUN_TIME_DECAY=Y | y | N | n

Description

In the calculation of a user’s dynamic share priority, this factor determines whether run time is decayed.

If undefined, the cluster-wide value from the lsb.params parameter of the same name is used.

Restrictions

Running badmin reconfig or restarting mbatchd during a job's run time results in the decayed run time being recalculated.

When a suspended job using run time decay is resumed, the decay time is based on the elapsed time.

Default

Not defined

RUN_TIME_FACTOR

Specifies the run time weighting factor. Used only with fairshare scheduling.

Syntax

RUN_TIME_FACTOR=number

Description

In the calculation of a user’s dynamic share priority, this factor determines the relative importance of the total run time of a user’s running jobs.

If undefined, the cluster-wide value from the lsb.params parameter of the same name is used.

Default

Not defined.

RUN_WINDOW

Specifies time periods during which jobs in the queue are allowed to run.

Syntax

RUN_WINDOW=time_window ...

Description

When the window closes, LSF suspends jobs running in the queue and stops dispatching jobs from the queue. When the window reopens, LSF resumes the suspended jobs and begins dispatching additional jobs.

Default

Not defined. Queue is always active.

RUNLIMIT

Specifies the maximum run limit and optionally the default run limit. The name of a host or host model specifies the runtime normalization host to use.

Syntax

RUNLIMIT=[default_limit] maximum_limit

where default_limit and maximum_limit are:

[hour:]minute[/host_name | /host_model]

Description

By default, jobs that are in a running state (but not in pre-execution or post-execution) for longer than the specified maximum run limit are killed by LSF. You can optionally provide your own termination job action to override this default.

Jobs submitted with a job-level run limit (bsub -W) that is less than the maximum run limit are killed when their job-level run limit is reached. Jobs submitted with a run limit greater than the maximum run limit are rejected by the queue.

If a default run limit is specified, jobs submitted to the queue without a job-level run limit are killed when the default run limit is reached. The default run limit is used with backfill scheduling of parallel jobs.

Note:

If you want to provide an estimated run time for scheduling purposes without killing jobs that exceed the estimate, define the RUNTIME parameter in an application profile instead of a run limit (see lsb.applications for details).

If you specify only one limit, it is the maximum, or hard, run limit. If you specify two limits, the first one is the default, or soft, run limit, and the second one is the maximum run limit.
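
The interaction between the job-level, default, and maximum run limits can be sketched in Python as follows. The function name is hypothetical, limits are in minutes, None stands for "not specified", and a job-level limit equal to the maximum is assumed to be accepted.

```python
from typing import Optional


def applied_run_limit(job_limit: Optional[int],
                      default_limit: Optional[int],
                      maximum_limit: Optional[int]) -> Optional[int]:
    """Return the run limit enforced for a job, or None if unlimited.

    Raises ValueError when the queue would reject the job.
    """
    if job_limit is not None:
        if maximum_limit is not None and job_limit > maximum_limit:
            raise ValueError("job-level run limit exceeds the queue maximum")
        return job_limit
    if default_limit is not None:
        # No job-level limit: the default (soft) limit applies.
        return default_limit
    # Only the maximum (hard) limit is configured, or nothing at all.
    return maximum_limit
```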

The run limit is in the form of [hour:]minute. The minutes can be specified as a number greater than 59. For example, three and a half hours can either be specified as 3:30, or 210.

The run limit you specify is the normalized run time. This is done so that the job does approximately the same amount of processing, even if it is sent to a host with a faster or slower CPU. Whenever a normalized run time is given, the actual time on the execution host is the specified time multiplied by the CPU factor of the normalization host, then divided by the CPU factor of the execution host.

If ABS_RUNLIMIT=Y is defined in lsb.params, the runtime limit is not normalized by the host CPU factor. Absolute wall-clock run time is used for all jobs submitted to a queue with a run limit configured.

Optionally, you can supply a host name or a host model name defined in LSF. You must insert ‘/’ between the run limit and the host name or model name. (See lsinfo(1) to get host model information.)
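
The run limit format and the normalization formula described above can be illustrated with the following Python sketch. The function names, the host name hostA, and the CPU factors in the usage lines are made up for the example, and the sketch ignores ABS_RUNLIMIT.

```python
from typing import Optional, Tuple


def parse_run_limit(text: str) -> Tuple[int, Optional[str]]:
    """Parse [hour:]minute[/host_name | /host_model] into (minutes, host_spec)."""
    limit, _, host_spec = text.partition("/")
    hours, separator, minutes = limit.rpartition(":")
    total = int(minutes) + (60 * int(hours) if separator else 0)
    return total, host_spec or None


def actual_run_minutes(normalized_minutes: float,
                       norm_cpu_factor: float,
                       exec_cpu_factor: float) -> float:
    """Specified time x normalization host CPU factor / execution host CPU factor."""
    return normalized_minutes * norm_cpu_factor / exec_cpu_factor


three_and_a_half_hours = parse_run_limit("3:30")
with_host = parse_run_limit("1:30/hostA")
# A 60-minute limit normalized to a host with CPU factor 1.0 allows
# 30 minutes of wall-clock time on an execution host with CPU factor 2.0.
on_faster_host = actual_run_minutes(60, 1.0, 2.0)
```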

If no host or host model is given, LSF uses the default runtime normalization host defined at the queue level (DEFAULT_HOST_SPEC in lsb.queues) if it has been configured; otherwise, LSF uses the default CPU time normalization host defined at the cluster level (DEFAULT_HOST_SPEC in lsb.params) if it has been configured; otherwise, the host with the largest CPU factor (the fastest host in the cluster).

For MultiCluster jobs, if no other CPU time normalization host is defined and information about the submission host is not available, LSF uses the host with the largest CPU factor (the fastest host in the cluster).

Jobs submitted to a chunk job queue are not chunked if RUNLIMIT is greater than 30 minutes.

RUNLIMIT is required for queues configured with INTERRUPTIBLE_BACKFILL.

Default

Unlimited

SLA_GUARANTEES_IGNORE

Applies to SLA guarantees only. Allows jobs in the queue access to all guaranteed resources.

Syntax

SLA_GUARANTEES_IGNORE=Y | y | N | n

Description

SLA_GUARANTEES_IGNORE=Y allows jobs in the queue access to all guaranteed resources. As a result, some guarantees might not be honored. If a queue does not have this parameter set, jobs in this queue cannot trigger preemption of an SLA job. If an SLA job is suspended (for example, by the bstop command), jobs in queues without the parameter set can still make use of the slots released by the suspended job.

Note: Using SLA_GUARANTEES_IGNORE=Y defeats the purpose of guaranteeing resources. This should be used sparingly for low traffic queues only.

Default

Not defined (N). The queue must honor resource guarantees when dispatching jobs.

SLOT_POOL

Specifies the name of the pool of job slots the queue belongs to for queue-based fairshare.

Syntax

SLOT_POOL=pool_name

Description

Note: This parameter is deprecated and might be removed in a future version of LSF.

A queue can only belong to one pool. All queues in the pool must share the same set of hosts.

Valid values

Specify any ASCII string up to 60 characters long. You can use letters, digits, underscores (_) or dashes (-). You cannot use blank spaces.

Default

Not defined. No job slots are reserved.

SLOT_RESERVE

Enables processor reservation for the queue and specifies the reservation time.

Syntax

SLOT_RESERVE=MAX_RESERVE_TIME[integer]

Description

Specify the keyword MAX_RESERVE_TIME and, in square brackets, the number of MBD_SLEEP_TIME cycles over which a job can reserve job slots. MBD_SLEEP_TIME is defined in lsb.params; the default value is 60 seconds.

If a job has not accumulated enough job slots to start before the reservation expires, it releases all its reserved job slots so that other jobs can run. Then, the job cannot reserve slots for one scheduling session, so other jobs have a chance to be dispatched. After one scheduling session, the job can reserve job slots again for another period specified by SLOT_RESERVE.

SLOT_RESERVE is overridden by the RESOURCE_RESERVE parameter.

If both RESOURCE_RESERVE and SLOT_RESERVE are defined in the same queue, job slot reservation and memory reservation are enabled and an error is displayed when the cluster is reconfigured. SLOT_RESERVE is ignored.

Job slot reservation for parallel jobs is enabled by RESOURCE_RESERVE if the LSF scheduler plug-in module names for both resource reservation and parallel batch jobs (schmod_parallel and schmod_reserve) are configured in the lsb.modules file. The schmod_parallel name must come before schmod_reserve in lsb.modules.

If BACKFILL is configured in a queue, and a run limit is specified at the job level (bsub -W), application level (RUNLIMIT in lsb.applications), or queue level (RUNLIMIT in lsb.queues), or if an estimated run time is specified at the application level (RUNTIME in lsb.applications), backfill parallel jobs can use job slots reserved by the other jobs, as long as the backfill job can finish before the predicted start time of the jobs with the reservation.

Unlike memory reservation, which applies both to sequential and parallel jobs, slot reservation applies only to parallel jobs.

Example

SLOT_RESERVE=MAX_RESERVE_TIME[5]

This example specifies that parallel jobs have up to 5 cycles of MBD_SLEEP_TIME (5 minutes, by default) to reserve sufficient job slots to start.
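
The reservation period is the number of cycles multiplied by MBD_SLEEP_TIME. As a rough sketch of that arithmetic (the function name is hypothetical, and the 60-second default is the MBD_SLEEP_TIME default stated above):

```python
import re


def reservation_seconds(slot_reserve: str, mbd_sleep_time: int = 60) -> int:
    """Seconds a job can hold reserved slots for a SLOT_RESERVE value."""
    match = re.fullmatch(r"MAX_RESERVE_TIME\[(\d+)\]", slot_reserve.strip())
    if match is None:
        raise ValueError(f"unexpected SLOT_RESERVE value: {slot_reserve!r}")
    cycles = int(match.group(1))
    return cycles * mbd_sleep_time


default_period = reservation_seconds("MAX_RESERVE_TIME[5]")        # 5 cycles of 60 s
shorter_cycles = reservation_seconds("MAX_RESERVE_TIME[5]", 30)    # if MBD_SLEEP_TIME=30
```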

Default

Not defined. No job slots are reserved.

SLOT_SHARE

Specifies the share of job slots for queue-based fairshare.

Syntax

SLOT_SHARE=integer

Description

Note: This parameter is deprecated and might be removed in a future version of LSF.

Represents the percentage of running jobs (job slots) in use from the queue. SLOT_SHARE must be greater than zero (0) and less than or equal to 100.

The sum of SLOT_SHARE for all queues in the pool does not need to be 100%. It can be more or less, depending on your needs.

Default

Not defined

SNDJOBS_TO

Defines a send-jobs queue for IBM Spectrum LSF multicluster capability.

Syntax

SNDJOBS_TO=[queue@]cluster_name[+preference] ...

Description

Specify remote queue names, in the form queue_name@cluster_name[+preference], separated by a space.

This parameter is ignored if lsb.queues HOSTS specifies remote (borrowed) resources.

Queue preference is defined at the queue level in SNDJOBS_TO of the submission cluster for each corresponding execution cluster queue receiving forwarded jobs.

You can enable an LSF data manager data transfer queue as a remote send-jobs queue in the execution cluster.

If you specify a remote queue with the SNDJOBS_TO parameter in the data transfer queue in the execution cluster, the path of the FILE_TRANSFER_CMD parameter must exist in the submission cluster. In addition, the corresponding remote queue in the submission cluster must be configured to receive the data transfer job (that is, the RCVJOBS_FROM parameter value for the remote queue in the submission cluster either includes this execution cluster, or is set to allclusters). This ensures that the submission cluster can push data files back to the execution cluster.

Example

SNDJOBS_TO=queue2@cluster2+1 queue3@cluster2+2
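
To make the [queue@]cluster_name[+preference] form concrete, the following Python sketch splits a SNDJOBS_TO value into its parts. It is an illustration of the syntax only, not how LSF parses the file; SendTarget and parse_sndjobs_to are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class SendTarget(NamedTuple):
    queue: Optional[str]       # None when only a cluster name is given
    cluster: str
    preference: Optional[int]  # None when no +preference is given


def parse_sndjobs_to(value: str) -> List[SendTarget]:
    """Split a space-separated SNDJOBS_TO value into its entries."""
    targets = []
    for entry in value.split():
        name, plus, pref = entry.partition("+")
        queue, _, cluster = name.rpartition("@")
        targets.append(SendTarget(queue or None, cluster,
                                  int(pref) if plus else None))
    return targets


targets = parse_sndjobs_to("queue2@cluster2+1 queue3@cluster2+2")
```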

Example for LSF data manager

The following configuration on the execution cluster clusterE has the LSF data manager transfer queue data_transfer set as a remote send-jobs queue to the receive_q queue in the submission cluster clusterS:

Begin Queue
QUEUE_NAME=data_transfer
DATA_TRANSFER=Y
SNDJOBS_TO=receive_q@clusterS
HOSTS=hostS1 hostS2 # Transfer nodes in the execution cluster
End Queue

You must also define the RCVJOBS_FROM parameter for the receive_q queue in the submission cluster clusterS to accept jobs from (and push data to) the execution cluster clusterE, as shown in the following configuration in the clusterS cluster:

Begin Queue
QUEUE_NAME=receive_q
RCVJOBS_FROM=clusterE
PRIORITY=40
HOSTS=hostS1 hostS2 # Transfer nodes in the submission cluster
RES_REQ=select[type==any]
End Queue

Alternatively, you can define RCVJOBS_FROM=allclusters to accept jobs from all clusters, which includes the execution cluster.

STACKLIMIT

Specifies the per-process stack segment size limit for all job processes from this queue.

Syntax

STACKLIMIT=integer

Description

Specify this parameter to place a per-process hard stack segment size limit, in KB, for all of the processes belonging to a job from this queue (see getrlimit(2)).

Default

Unlimited

STOP_COND

Specifies conditions for LSF to automatically suspend a running job in this queue.

Syntax

STOP_COND=res_req

Use the select section of the resource requirement string to specify load thresholds. All other sections are ignored.

Description

LSF automatically suspends a running job in this queue if the load on the host satisfies the specified conditions. The conditions only support load indices and static boolean resources.
  • LSF does not suspend the only job running on the host if the machine is interactively idle (it > 0).
  • LSF does not suspend a forced job (brun -f).
  • LSF does not suspend a job because of paging rate if the machine is interactively idle.

If STOP_COND is specified in the queue and there are no load thresholds, the suspending reasons for each individual load index are not displayed by bjobs.

Example

STOP_COND= select[((!cs && it < 5) || (cs && mem < 15 && swp < 50))]

In this example, assume “cs” is a Boolean resource indicating that the host is a compute server. The stop condition for jobs running on compute servers is based on the availability of memory and swap space. The stop condition for jobs running on other kinds of hosts is based on the idle time.
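
The logic of the example expression can be restated as a small Python predicate. This only mirrors the Boolean structure of the select string; the function name is hypothetical, and the protective rules listed earlier (for example, forced jobs) are not modeled.

```python
def stop_condition_met(cs: bool, it: float, mem: float, swp: float) -> bool:
    """Mirror of select[((!cs && it < 5) || (cs && mem < 15 && swp < 50))]."""
    return (not cs and it < 5) or (cs and mem < 15 and swp < 50)


# Host without the cs resource, interactively busy (low idle time)
busy_desktop = stop_condition_met(cs=False, it=2, mem=100, swp=100)
# Host with the cs resource, low on memory but with enough swap
server_with_swap = stop_condition_met(cs=True, it=0, mem=10, swp=60)
```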

SUCCESS_EXIT_VALUES

Specifies exit values used by LSF to determine if the job was done successfully.

Syntax

SUCCESS_EXIT_VALUES=[exit_code ...]

Description

Application-level success exit values defined with SUCCESS_EXIT_VALUES in lsb.applications override the configuration defined in lsb.queues. Job-level success exit values specified with the LSB_SUCCESS_EXIT_VALUES environment variable override the configuration in lsb.queues and lsb.applications.

Use SUCCESS_EXIT_VALUES for submitting jobs to specific queues that successfully exit with non-zero values so that LSF does not interpret non-zero exit codes as job failure.

If the same exit code is defined in SUCCESS_EXIT_VALUES and REQUEUE_EXIT_VALUES, any job with this exit code is requeued instead of being marked as DONE because sbatchd processes requeue exit values before success exit values.
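
The precedence rules above can be summarized in a short Python sketch. It is a simplified model, not sbatchd logic: the function names and the "REQUEUE" label are hypothetical, and it assumes that an override replaces the lower-level list rather than merging with it.

```python
from typing import Iterable, Optional, Set


def effective_success_values(queue_values: Optional[Iterable[int]] = None,
                             app_values: Optional[Iterable[int]] = None,
                             job_values: Optional[Iterable[int]] = None) -> Set[int]:
    """Job level overrides application level, which overrides queue level."""
    for values in (job_values, app_values, queue_values):
        if values is not None:
            return set(values)
    return set()


def classify_exit(exit_code: int,
                  success_values: Iterable[int] = (),
                  requeue_values: Iterable[int] = ()) -> str:
    """Requeue exit values are checked before success exit values."""
    if exit_code in set(requeue_values):
        return "REQUEUE"
    if exit_code == 0 or exit_code in set(success_values):
        return "DONE"
    return "EXIT"


values = effective_success_values(queue_values=[230, 222], app_values=[12])
outcome = classify_exit(12, success_values=values, requeue_values=[12])
```

Here exit code 12 appears in both lists, so the sketch reports a requeue rather than a successful completion.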

In MultiCluster job forwarding mode, LSF uses the SUCCESS_EXIT_VALUES from the remote cluster.

In a MultiCluster resource leasing environment, LSF uses the SUCCESS_EXIT_VALUES from the consumer cluster.

exit_code should be a value between 0 and 255. Use spaces to separate multiple exit codes.

Any changes you make to SUCCESS_EXIT_VALUES will not affect running jobs. Only pending jobs will use the new SUCCESS_EXIT_VALUES definitions, even if you run badmin reconfig and mbatchd restart to apply your changes.

Default

Not defined.

SWAPLIMIT

Specifies the total virtual memory limit, in KB, for a job from this queue.

Syntax

SWAPLIMIT=integer

Description

This limit applies to the whole job, no matter how many processes the job may contain.

The action taken when a job exceeds its SWAPLIMIT or PROCESSLIMIT is to send SIGQUIT, SIGINT, SIGTERM, and SIGKILL in sequence. For CPULIMIT, SIGXCPU is sent before SIGINT, SIGTERM, and SIGKILL.

Default

Unlimited

TASKLIMIT

Specifies the maximum number of tasks that can be allocated to a job. For parallel jobs, this is the maximum number of tasks that can be allocated to the job.

Syntax

TASKLIMIT=[minimum_limit [default_limit]] maximum_limit

Description

Note: TASKLIMIT replaces PROCLIMIT as of LSF 9.1.3.

Queue-level TASKLIMIT takes priority over application-level TASKLIMIT and job-level TASKLIMIT. Application-level TASKLIMIT takes priority over job-level TASKLIMIT. Job-level limits must fall within the maximum and minimum limits of the application profile and the queue.

Note: If you also defined JOB_SIZE_LIST in the same queue where you defined TASKLIMIT, the TASKLIMIT parameter is ignored.

Optionally specifies the minimum and default number of job tasks.

All limits must be positive numbers greater than or equal to 1 that satisfy the following relationship:

1 <= minimum <= default <= maximum
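
As an illustration of the syntax and the ordering rule, the following Python sketch reads a TASKLIMIT value and checks the relationship above. The function name is hypothetical, values that are not specified are returned as None rather than guessed, and the block-size rule described next is not checked.

```python
from typing import Optional, Tuple


def parse_tasklimit(value: str) -> Tuple[Optional[int], Optional[int], int]:
    """Return (minimum, default, maximum) from [minimum [default]] maximum."""
    numbers = [int(token) for token in value.split()]
    if not 1 <= len(numbers) <= 3:
        raise ValueError("TASKLIMIT takes one, two, or three integers")
    maximum = numbers[-1]
    minimum = numbers[0] if len(numbers) >= 2 else None
    default = numbers[1] if len(numbers) == 3 else None
    # Enforce 1 <= minimum <= default <= maximum on the values that were given
    given = [n for n in (minimum, default, maximum) if n is not None]
    if given[0] < 1 or given != sorted(given):
        raise ValueError("limits must satisfy 1 <= minimum <= default <= maximum")
    return minimum, default, maximum


limits = parse_tasklimit("5 9 13")
```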

If RES_REQ in a queue was defined as a compound resource requirement with a block size (span[block=value]), the default value for TASKLIMIT should be a multiple of a block.

For example, this configuration would be accepted:

Queue-level RES_REQ="1*{type==any } + {type==local span[block=4]}"

TASKLIMIT = 5 9 13

This configuration, for example, would not be accepted. An error message is displayed when you run badmin reconfig:

Queue-level RES_REQ="1*{type==any } + {type==local span[block=4]}"

TASKLIMIT = 4 10 12

In the MultiCluster job forwarding model, the local cluster considers the receiving queue's TASKLIMIT on remote clusters before forwarding jobs. If the receiving queue's TASKLIMIT definition in the remote cluster cannot satisfy the job's task requirements for a remote queue, the job is not forwarded to that remote queue in the cluster.

Default

Unlimited, the default number of tasks is 1

TERMINATE_WHEN

Specifies the circumstances under which the queue invokes the TERMINATE action instead of the SUSPEND action.

Syntax

TERMINATE_WHEN=[LOAD] [PREEMPT] [WINDOW]

Description

Configures the queue to invoke the TERMINATE action instead of the SUSPEND action in the specified circumstance:
  • LOAD: kills jobs when the load exceeds the suspending thresholds.
  • PREEMPT: kills jobs that are being preempted.
  • WINDOW: kills jobs if the run window closes.

If the TERMINATE_WHEN job control action is applied to a chunk job, sbatchd kills the chunk job element that is running and puts the rest of the waiting elements into pending state to be rescheduled later.

Example

Set TERMINATE_WHEN to WINDOW to define a night queue that kills jobs if the run window closes:
Begin Queue
QUEUE_NAME     = night
RUN_WINDOW     = 20:00-08:00 EDT
TERMINATE_WHEN = WINDOW
JOB_CONTROLS   = TERMINATE[kill -KILL $LSB_JOBPGIDS; mail -s "job $LSB_JOBID killed by queue run window" $USER < /dev/null]
End Queue

Specifying the time zone is optional. If you do not specify a time zone, LSF uses the local system time zone.

THREADLIMIT

Limits the number of concurrent threads that can be part of a job. Exceeding the limit causes the job to terminate.

Syntax

THREADLIMIT=[default_limit] maximum_limit

Description

The system sends the following signals in sequence to all processes that belong to the job: SIGINT, SIGTERM, and SIGKILL.

By default, if a default thread limit is specified, jobs submitted to the queue without a job-level thread limit are killed when the default thread limit is reached.

If you specify only one limit, it is the maximum, or hard, thread limit. If you specify two limits, the first one is the default, or soft, thread limit, and the second one is the maximum thread limit.

Both the default and the maximum limits must be positive integers. The default limit must be less than the maximum limit. The default limit is ignored if it is greater than the maximum limit.

Examples

THREADLIMIT=6

No default thread limit is specified. The value 6 is the default and maximum thread limit.

THREADLIMIT=6 8

The first value (6) is the default thread limit. The second value (8) is the maximum thread limit.
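
The one-value and two-value forms can be sketched as follows. This Python fragment is an illustration under stated assumptions, not LSF behavior verified against the product: the function names are hypothetical, and a job without a job-level thread limit is assumed to fall back to the maximum when no usable default exists.

```python
from typing import Optional, Tuple


def parse_threadlimit(value: str) -> Tuple[Optional[int], int]:
    """Return (default, maximum) from [default_limit] maximum_limit."""
    numbers = [int(token) for token in value.split()]
    if len(numbers) not in (1, 2) or any(n <= 0 for n in numbers):
        raise ValueError("THREADLIMIT takes one or two positive integers")
    maximum = numbers[-1]
    default = numbers[0] if len(numbers) == 2 else None
    # A default greater than the maximum is ignored
    if default is not None and default > maximum:
        default = None
    return default, maximum


def limit_without_job_level(value: str) -> int:
    """Limit applied to a job submitted without a job-level thread limit."""
    default, maximum = parse_threadlimit(value)
    return default if default is not None else maximum


single = limit_without_job_level("6")    # only a maximum is given
pair = limit_without_job_level("6 8")    # default 6, maximum 8
```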

Default

Unlimited

UJOB_LIMIT

Specifies the per-user job slot limit for the queue.

Syntax

UJOB_LIMIT=integer

Description

This parameter specifies the maximum number of job slots that each user can use in this queue.

UJOB_LIMIT must be within or greater than the range set by TASKLIMIT or bsub -n (if either is used), or jobs are rejected.

Default

Unlimited

USE_PAM_CREDS

Applies PAM limits to this queue.

Syntax

USE_PAM_CREDS=y | n | [limits] [session]

Description

USE_PAM_CREDS is only supported on Linux systems. If the execution host does not have PAM configured and this parameter is enabled, the job fails.

If USE_PAM_CREDS is set to y or limits, LSF can apply PAM limits to a queue when its job is dispatched to a Linux host using PAM. The LSF job does not run within the PAM session.

If USE_PAM_CREDS is set to session:
  • If a job is started on the first execution host, the job RES opens a PAM session for the user and forks a RES process within that session. This RES process becomes the user's job.
  • If a task is launched by the blaunch command or an API, the task RES opens a PAM session for the user and executes a RES process within that session. This RES process becomes the user's task.

The limits keyword can be defined together with the session keyword.

If LSF limits are more restrictive than PAM limits, LSF limits are used, otherwise PAM limits are used. PAM limits are system resource limits defined in the limits.conf file.

For parallel jobs, PAM sessions are only launched on the first execution host if USE_PAM_CREDS=y or USE_PAM_CREDS=limits is defined. PAM sessions are launched on all tasks if USE_PAM_CREDS=session or USE_PAM_CREDS=limits session is defined.

Note: When configuring Linux PAM to be used with LSF, you must configure Linux PAM so that it does not ask users for their passwords because jobs are not usually interactive.
Depending on the USE_PAM_CREDS parameter setting, LSF assumes that the following Linux PAM services are created:
  • If USE_PAM_CREDS is set to y or limits, LSF assumes that the Linux PAM service "lsf" is created.
  • If USE_PAM_CREDS is set to session, LSF assumes that the Linux PAM service "lsf-<clustername>" is created.
  • If USE_PAM_CREDS is set to limits session, LSF assumes that the Linux PAM services "lsf" and "lsf-<clustername>" are created.
It is also assumed that the "lsf" service is used in conjunction with the /etc/security/limits.conf file.
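
The mapping from the USE_PAM_CREDS setting to the PAM service names that LSF expects can be written out as a small Python helper. The helper name is hypothetical and the sketch only restates the list above.

```python
from typing import List


def pam_services(use_pam_creds: str, cluster_name: str) -> List[str]:
    """PAM service names assumed to exist for a USE_PAM_CREDS setting."""
    keywords = set(use_pam_creds.lower().split())
    services = []
    if keywords & {"y", "limits"}:
        services.append("lsf")
    if "session" in keywords:
        services.append(f"lsf-{cluster_name}")
    return services


both = pam_services("limits session", "cluster1")  # cluster1 is a placeholder name
```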

The job sbatchd daemon checks the lsf service, and the job or task RES daemon checks the lsf-<clustername> service.

Overrides MEMLIMIT_TYPE=Process.

Overridden (for CPU limit only) by LSB_JOB_CPULIMIT=y.

Overridden (for memory limits only) by LSB_JOB_MEMLIMIT=y.

The USE_PAM_CREDS value in lsb.applications overrides the USE_PAM_CREDS value in lsb.queues.

Default

n. USE_PAM_CREDS is disabled.

USE_PRIORITY_IN_POOL

Queue-based fairshare only. After job scheduling occurs for each queue, this parameter enables LSF to dispatch jobs to any remaining slots in the pool in first-come first-served order across queues.

Syntax

USE_PRIORITY_IN_POOL= y | Y | n | N

Note: This parameter is deprecated and might be removed in a future version of LSF.

Default

N

USERS

Specifies a space-separated list of user names or user groups that can submit jobs to the queue.

Syntax

USERS=all [~user_name ...] [~user_group ...] | [user_name ...] [user_group [~user_group ...] ...]

Description

LSF cluster administrators are automatically included in the list of users. LSF cluster administrators can submit jobs to this queue, or switch (bswitch) any user’s jobs into this queue.

If user groups are specified, each user in the group can submit jobs to this queue. If FAIRSHARE is also defined in this queue, only users defined by both parameters can submit jobs, so LSF administrators cannot use the queue if they are not included in the share assignments.

User names must be valid login names. To specify a Windows user account, include the domain name in uppercase letters (DOMAIN_NAME\user_name).

User group names can be LSF user groups or UNIX and Windows user groups. To specify a Windows user group, include the domain name in uppercase letters (DOMAIN_NAME\user_group).

Use the keyword all to specify all users or user groups in a cluster.

Use the not operator (~) to exclude users from the all specification or from user groups. This is useful if you have a large number of users but only want to exclude a few users or groups from the queue definition.

The not operator (~) can only be used with the all keyword or to exclude users from user groups.

CAUTION:
The not operator (~) does not exclude LSF administrators from the queue definition.

Default

all (all users can submit jobs to the queue)

Examples

  • USERS=user1 user2
  • USERS=all ~user1 ~user2
  • USERS=all ~ugroup1
  • USERS=groupA ~user3 ~user4
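
The examples above can be read with the help of a simplified membership check. The following Python sketch is an approximation for illustration: can_submit, the groups mapping, and the admins set are hypothetical, and the interaction with FAIRSHARE is not modeled.

```python
from typing import AbstractSet, Mapping


def can_submit(user: str,
               users_value: str,
               groups: Mapping[str, AbstractSet[str]],
               admins: AbstractSet[str] = frozenset()) -> bool:
    """Approximate whether a user may submit to a queue with this USERS value."""
    # The not operator (~) does not exclude LSF administrators
    if user in admins:
        return True
    included, excluded, allow_all = set(), set(), False
    for token in users_value.split():
        name = token.lstrip("~")
        members = set(groups.get(name, {name}))  # expand group names
        if token.startswith("~"):
            excluded |= members
        elif name == "all":
            allow_all = True
        else:
            included |= members
    return (allow_all or user in included) and user not in excluded


groups = {"groupA": {"user3", "user5"}, "ugroup1": {"user9"}}
allowed = can_submit("user5", "groupA ~user3 ~user4", groups)
```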

Automatic time-based configuration

Variable configuration is used to automatically change LSF configuration based on time windows.

You define automatic configuration changes in lsb.queues by using if-else constructs and time expressions. After you change the files, reconfigure the cluster with the badmin reconfig command.

The expressions are evaluated by LSF every 10 minutes based on mbatchd start time. When an expression evaluates true, LSF dynamically changes the configuration based on the associated configuration statements. Reconfiguration is done in real time without restarting mbatchd, providing continuous system availability.

Example

Begin Queue
... 
#if time(8:30-18:30 EDT)
INTERACTIVE  = ONLY  # interactive only during day shift
#endif
... 
End Queue

Specifying the time zone is optional. If you do not specify a time zone, LSF uses the local system time zone. LSF supports all standard time zone abbreviations.
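
To show what a time window such as 8:30-18:30 selects, the following Python sketch tests whether a clock time falls inside a window, including windows that cross midnight such as 20:00-08:00. It is an illustration only: the function name is hypothetical, treating both endpoints as inside the window is an assumption, and time zones and day-of-week fields are not handled.

```python
from datetime import time


def in_time_window(now: time, start: time, end: time) -> bool:
    """True if now falls within start-end; handles windows that wrap past midnight."""
    if start <= end:
        return start <= now <= end
    # Window crosses midnight, for example 20:00-08:00
    return now >= start or now <= end


day_shift = in_time_window(time(9, 0), time(8, 30), time(18, 30))
night_queue = in_time_window(time(23, 0), time(20, 0), time(8, 0))
```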