About job submission and execution controls

The job submission and execution controls feature uses the executable files esub and eexec to control job options and the job execution environment.

In addition, epsub executable files can use job submission information that becomes available only after submission, such as the job ID and queue name, to communicate with external components and to perform additional logic.

External submission (esub)

An esub is an executable file that you write to meet the job requirements at your site. The following are some of the things that you can use an esub to do:
  • Validate job options
  • Change the job options that are specified by a user
  • Change user environment variables on the submission host (at job submission only)
  • Reject jobs (at job submission only)
  • Pass data to stdin of eexec
  • Automate job resource requirements
  • Enable data provenance to trace job files

When a user submits a job by using bsub or modifies a job by using bmod, LSF runs the esub executable files on the submission host before the job is accepted. If the user submitted the job with options such as -R to specify required resources or -q to specify a queue, an esub can change the values of those options to conform to resource usage policies at your site.

Note: When compound resource requirements are used at any level, an esub can create job-level resource requirements, which overwrite most application-level and queue-level resource requirements.
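The following minimal Python sketch shows how an esub might redirect jobs to a permitted queue. It assumes three things: the file named by LSB_SUB_PARM_FILE holds one NAME=value line per option, with string values in double quotes; the -q value appears as LSB_SUB_QUEUE; and writing a NAME=value line to the file named by LSB_SUB_MODIFY_FILE changes that option. The allowed queue names are hypothetical.

```python
import os


def parse_params(text):
    """Parse NAME=value lines from an esub parameter file into a dict."""
    params = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition("=")
        if not sep:
            continue  # skip blank or malformed lines
        value = value.strip()
        # Assumed format: string values are wrapped in double quotes.
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        params[name] = value
    return params


def queue_changes(params, allowed, default):
    """Return modification lines that move a job submitted to a queue
    outside `allowed` into `default` (hypothetical site policy)."""
    queue = params.get("LSB_SUB_QUEUE")
    if queue is not None and queue not in allowed:
        return [f'LSB_SUB_QUEUE="{default}"']
    return []


def main():
    parm_file = os.environ.get("LSB_SUB_PARM_FILE")
    if not parm_file:
        return  # no parameter file available; nothing to do
    with open(parm_file) as f:
        params = parse_params(f.read())
    changes = queue_changes(params, allowed={"normal", "short"}, default="normal")
    if changes:
        with open(os.environ["LSB_SUB_MODIFY_FILE"], "a") as f:
            f.write("\n".join(changes) + "\n")


if __name__ == "__main__":
    main()
```

Because the sketch leaves jobs submitted without -q untouched, those jobs still go to the cluster's default queue.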

An esub can also change the user environment on the submission host before job submission so that when LSF copies the submission host environment to the execution host, the job runs on the execution host with the values specified by the esub. For example, an esub can add user environment variables to those environment variables already associated with the job.
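As a sketch of such an environment change, the following Python esub adds one variable. It assumes that NAME="value" lines written to the file named by LSB_SUB_MODIFY_ENVFILE are added to the job's environment. The variable SITE_SCRATCH and its /scratch path are hypothetical. As the limitations under Scope state, only an esub started by bsub can change the environment.

```python
import os


def env_lines(changes):
    """Format {name: value} pairs as NAME="value" lines for the env file."""
    lines = []
    for name, value in changes.items():
        if '"' in value:
            # Keep the sketch simple: quoting rules for embedded quotes
            # are not covered here.
            raise ValueError(f"value for {name} contains a double quote")
        lines.append(f'{name}="{value}"')
    return lines


def main():
    env_file = os.environ.get("LSB_SUB_MODIFY_ENVFILE")
    if not env_file:
        return  # no environment file available; nothing to do
    user = os.environ.get("USER", "unknown")
    # Hypothetical site variable that the job should see on the execution host.
    lines = env_lines({"SITE_SCRATCH": f"/scratch/{user}"})
    with open(env_file, "a") as f:
        f.write("\n".join(lines) + "\n")


if __name__ == "__main__":
    main()
```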

LSF runs the default executable file named "esub" if it exists in the LSF_SERVERDIR directory, followed by any mandatory esub executable files that are defined by LSB_ESUB_METHOD, followed by any application-specific esub executable files (with .application_name in the file name).

External post-submission (epsub)

An epsub is an executable file that you write to meet the post-submission job requirements at your site with information that is not available before job submission. The following are some of the things that you can use an epsub to do with the newly available job information:

  • Pass job information to an external entity
  • Post job information to a local log file
  • Perform general logic after a job is submitted to LSF

When a user submits a job by using bsub, modifies a job by using bmod, or restarts a job by using brestart, LSF runs the epsub executable files on the submission host immediately after the job is accepted. The job might already be running while epsub runs.

For interactive jobs, bsub or bmod runs epsub first and then resumes regular interactive job behavior by running the interactive job.

epsub does not pass information to eexec, nor does it get information from eexec. epsub can only read information from the temporary file that contains job submission options (as indicated by the LSB_SUB_PARM_FILE environment variable) and from the environment variables. The information that is available to the epsub after job submission includes the following:

  • A temporary file that contains job submission options, which is available through the LSB_SUB_PARM_FILE environment variable. The file that this environment variable specifies is a different file from the one that is initially created by esub before the job submission.
  • The LSF job ID, which is available through the LSB_SUB_JOB_ID environment variable. For job arrays, the job ID includes the job array index.
  • The name of the final queue to which the job is submitted (including any queue modifications made by esub), which is available through the LSB_SUB_JOB_QUEUE environment variable.
  • The LSF job error number if the job submission failed, which is available through the LSB_SUB_JOB_ERR environment variable.

Because epsub runs after job submission, epsub executable files cannot modify job submission parameters or job environment variables; LSB_SUB_MODIFY_FILE and LSB_SUB_MODIFY_ENVFILE are not available to epsub.
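A minimal epsub sketch in Python that posts job information to a local log file. It only reads the environment variables listed above and writes nothing back to LSF. The log path is hypothetical, and the sketch records LSB_SUB_JOB_ERR verbatim rather than assuming how LSF encodes a successful submission.

```python
import os
import time

LOG_PATH = "/var/log/lsf-site/epsub_jobs.log"  # hypothetical location


def log_record(env, now):
    """Build one tab-separated line: UTC time, job ID, queue, error number."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(now))
    fields = [
        stamp,
        env.get("LSB_SUB_JOB_ID", "-"),
        env.get("LSB_SUB_JOB_QUEUE", "-"),
        env.get("LSB_SUB_JOB_ERR") or "-",
    ]
    return "\t".join(fields)


def main():
    env = os.environ
    if "LSB_SUB_JOB_ID" not in env and "LSB_SUB_JOB_ERR" not in env:
        return  # no job information available; nothing to log
    # epsub can only read: there is no LSB_SUB_MODIFY_FILE to write to.
    with open(LOG_PATH, "a") as f:
        f.write(log_record(env, time.time()) + "\n")


if __name__ == "__main__":
    main()
```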

If the esub rejects a job, the corresponding epsub file does not run.

After job submission, bsub or bmod waits for the epsub scripts to finish before returning. If the bsub or bmod return time is crucial, do not use epsub to perform time-consuming activities. If epsub hangs, bsub or bmod waits indefinitely for it to finish. This matches esub behavior: bsub or bmod also hangs if an esub script hangs.
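When an epsub must contact an external service, a timeout keeps a slow or unreachable service from holding bsub or bmod. The following Python sketch uses subprocess.run with a timeout. The notifier command /usr/local/bin/site-notify and the 5-second limit are hypothetical.

```python
import os
import subprocess
import sys


def notify(cmd, timeout):
    """Run an external notifier; return True only if it exits 0 within
    `timeout` seconds. subprocess.run kills the child on timeout."""
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def main():
    job_id = os.environ.get("LSB_SUB_JOB_ID")
    if job_id is None:
        return  # no job ID available; nothing to report
    if not notify(["/usr/local/bin/site-notify", job_id], timeout=5):
        print(f"epsub: could not notify for job {job_id}", file=sys.stderr)


if __name__ == "__main__":
    main()
```

The timeout bounds how long bsub or bmod waits on this epsub, at the cost of occasionally dropping a notification.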

LSF runs the default executable file named "epsub" if it exists in the LSF_SERVERDIR directory, followed by any mandatory epsub executable files that are defined by LSB_ESUB_METHOD, followed by any application-specific epsub executable files (with .application_name in the file name).

If a mandatory program that is specified by the LSB_ESUB_METHOD parameter has a corresponding epsub executable file (epsub.application_name) but no corresponding esub executable file (esub.application_name), the job is still submitted through the normal external submission and post-submission mechanisms.

Except for these differences, epsub uses the same framework as esub.

Figure: Use of esub or epsub not enabled

Figure: With esub or epsub enabled

An esub executable file is typically used to enforce site-specific job submission policies and command line syntax by validating or pre-parsing the command line. The file that the LSB_SUB_PARM_FILE environment variable specifies stores the values that the user submitted. An esub reads this file and then accepts the option values, changes them, or rejects the job. Because an esub runs before job submission, using an esub to reject incorrect job submissions improves overall system performance by reducing the load on the management batch daemon (mbatchd).

An esub can be used for the following purposes:
  • Reject any job that requests more than a specified number of CPUs
  • Change the submission queue for specific user accounts to a higher priority queue
  • Check whether the job specifies an application and, if so, submit the job to the correct application profile
Note: If an esub executable file fails, the job is still submitted to LSF.
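The first purpose in the list can be sketched in Python as follows. The sketch assumes that the minimum and maximum processor counts appear in the parameter file as LSB_SUB_NUM_PROCESSORS and LSB_SUB_MAX_NUM_PROCESSORS, and that an esub rejects a job by exiting with the value of the LSB_SUB_ABORT_VALUE environment variable. The 64-CPU limit is hypothetical. The rejection message goes to standard error because esub standard output is passed to eexec.

```python
import os
import sys

MAX_CPUS = 64  # hypothetical site limit


def parse_params(text):
    """Parse NAME=value lines, stripping surrounding double quotes."""
    params = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep:
            params[name] = value.strip().strip('"')
    return params


def cpu_violation(params, limit):
    """Return a rejection message if more than `limit` CPUs are requested."""
    requested = 0
    for name in ("LSB_SUB_NUM_PROCESSORS", "LSB_SUB_MAX_NUM_PROCESSORS"):
        value = params.get(name, "")
        if value.isdigit():
            requested = max(requested, int(value))
    if requested > limit:
        return f"Job requests {requested} CPUs; the site limit is {limit}."
    return None


def main():
    parm_file = os.environ.get("LSB_SUB_PARM_FILE")
    if not parm_file:
        return  # no parameter file available; nothing to check
    with open(parm_file) as f:
        message = cpu_violation(parse_params(f.read()), MAX_CPUS)
    if message:
        print(message, file=sys.stderr)  # stdout would reach eexec
        sys.exit(int(os.environ["LSB_SUB_ABORT_VALUE"]))


if __name__ == "__main__":
    main()
```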

Multiple esub executable files

LSF provides a parent external submission executable file (LSF_SERVERDIR/mesub) that supports the use of application-specific esub executable files. Users can specify one or more esub executable files by using the -a option of bsub or bmod. When a user submits or modifies a job with the -a option, or restarts a job that was submitted or modified with the -a option, mesub runs the specified esub executable files.

An LSF administrator can specify one or more mandatory esub executable files by defining the parameter LSB_ESUB_METHOD in lsf.conf. If a mandatory esub is defined, mesub runs the mandatory esub for all jobs that are submitted to LSF in addition to any esub executable files specified with the -a option.

The naming convention is esub.application. LSF always runs the executable file that is named "esub" (without .application) if it exists in LSF_SERVERDIR.

Note: All esub executable files must be stored in the LSF_SERVERDIR directory that is defined in lsf.conf.

The mesub runs multiple esub executable files in the following order:
  1. Any executable file with the name "esub" in LSF_SERVERDIR
  2. The mandatory esub or esubs specified by LSB_ESUB_METHOD in lsf.conf
  3. One or more esubs in the order that is specified by bsub -a

Figure: Example of multiple esub execution

An esub runs only once, even if it is specified by both the bsub -a option and the parameter LSB_ESUB_METHOD.
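The ordering rules, including the run-once rule, can be modeled with a short Python function. It is a sketch of the documented order, not mesub itself. It assumes that application names with no matching file in LSF_SERVERDIR are skipped, and the application names in the example are hypothetical. Because epsub files follow the same order, the prefix is a parameter.

```python
def run_order(serverdir_files, mandatory, requested, prefix="esub"):
    """Return the executables in documented run order:
    1. plain `prefix` (for example, "esub") if it exists,
    2. mandatory programs from LSB_ESUB_METHOD,
    3. programs from bsub -a, in the order given.
    Each program runs at most once."""
    order = []
    if prefix in serverdir_files:
        order.append(prefix)
    for app in [*mandatory, *requested]:
        name = f"{prefix}.{app}"
        # Assumption: names without a matching file are skipped here.
        if name in serverdir_files and name not in order:
            order.append(name)
    return order


files = {"esub", "esub.fluent", "esub.licenses", "epsub.licenses"}
print(run_order(files, ["licenses"], ["fluent", "licenses"]))
# ['esub', 'esub.licenses', 'esub.fluent']
print(run_order(files, ["licenses"], [], prefix="epsub"))
# ['epsub.licenses']
```

In the first call, esub.licenses is requested both as mandatory and with -a but appears only once, in its mandatory position.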

External execution (eexec)

An eexec is an executable file that you write to control the job environment on the execution host.

Figure: Use of eexec not enabled

Figure: With eexec enabled

The following are some of the things that you can use an eexec to do:
  • Monitor job state or resource usage
  • Receive data from stdout of esub
  • Run a shell script to create and populate environment variables that are needed by jobs
  • Monitor the number of tasks that are running on a host and raise a flag when this number exceeds a pre-determined limit
  • Pass DCE credentials and AFS tokens by using a combination of esub and eexec executable files; LSF functions as a pipe for passing data from the stdout of esub to the stdin of eexec
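The esub-to-eexec pipe can be sketched as a pair of Python helpers: the esub side writes one line to standard output, and the eexec side reads it from standard input. The JSON payload format and the key names are hypothetical. Because this data is not encrypted in transit, do not place credentials in it without protecting them first.

```python
import json
import sys


def encode_payload(data):
    """esub side: serialize data as a single JSON line."""
    return json.dumps(data, sort_keys=True) + "\n"


def decode_payload(text):
    """eexec side: parse the esub payload; empty input means no data."""
    text = text.strip()
    return json.loads(text) if text else {}


def esub_send(data):
    # Only this payload may go to stdout; messages for users go to stderr.
    sys.stdout.write(encode_payload(data))


def eexec_receive():
    # Assumes a single esub writes to stdout (see the limitations under Scope).
    return decode_payload(sys.stdin.read())
```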

For example, if you have a mixed UNIX and Windows cluster, the submission and execution hosts might use different operating systems. In this case, the submission host environment might not meet the job requirements when the job runs on the execution host. You can use an eexec to set the correct user environment between the two operating systems.

Typically, an eexec executable file is a shell script that creates and populates the environment variables that are required by the job. An eexec can also monitor job execution and enforce site-specific resource usage policies.

If an eexec executable file exists in the directory that is specified by LSF_SERVERDIR, LSF starts that eexec for all jobs that are submitted to the cluster. By default, LSF runs eexec on the execution host before the job starts. The job process that starts eexec waits for eexec to finish before the job continues with job execution.

Unlike a pre-execution command that is defined at the job, queue, or application levels, an eexec:
  • Runs at job start, finish, or checkpoint
  • Allows the job to run without pending if eexec fails; eexec has no effect on the job state
  • Runs for all jobs, regardless of queue or application profile
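An eexec that behaves differently at job start, finish, and checkpoint might branch on the execution stage. This Python sketch assumes that LSF sets LS_EXEC_T to START, END, or CHKPNT for the eexec and that LSB_JOBID holds the job ID; the messages are illustrative. Because eexec has no effect on the job state, the sketch only reports and always exits normally.

```python
import os
import sys
import time

STAGE_LABELS = {
    "START": "starting",
    "END": "finished",
    "CHKPNT": "checkpointed",
}


def stage_message(stage, job_id, now):
    """Format a one-line status for an execution stage, or None if unknown."""
    label = STAGE_LABELS.get(stage)
    if label is None:
        return None
    stamp = time.strftime("%H:%M:%S", time.gmtime(now))
    return f"{stamp} job {job_id} {label}"


def main():
    stage = os.environ.get("LS_EXEC_T")
    if stage is None:
        return  # no execution stage available; nothing to do
    message = stage_message(stage, os.environ.get("LSB_JOBID", "?"), time.time())
    if message:
        print(message, file=sys.stderr)


if __name__ == "__main__":
    main()
```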

Scope

Applicability Details
Operating system
  • UNIX and Linux
  • Windows
Security
  • Data passing between esub on the submission host and eexec on the execution host is not encrypted.
Job types
  • Batch jobs that are submitted with the bsub command or modified by the bmod command.
  • Batch jobs that are restarted with the brestart command.
  • Interactive tasks that are executed remotely by the following commands:
    • lsrun
    • lsgrun
    • lsmake
Dependencies
  • UNIX and Windows user accounts must be valid on all hosts in the cluster, or the correct type of account mapping must be enabled.
    • For a mixed UNIX and Windows cluster, UNIX and Windows user account mapping must be enabled.
    • For a cluster with a non-uniform user name space, between-host account mapping must be enabled.
    • For a multicluster environment with a non-uniform user name space, cross-cluster user account mapping must be enabled.
  • User accounts must have the correct permissions to successfully run jobs.
  • An eexec that requires root privileges to run on UNIX must be configured to run as the root user.
Limitations
  • Only an esub started by bsub can change the job environment on the submission host. An esub started by bmod or brestart cannot change the environment.
  • Any esub messages that are provided to the user must be directed to standard error, not to standard output. Standard output from any esub is automatically passed to eexec.
  • An eexec can handle only one standard output stream from an esub as standard input to eexec. If any esub writes to standard output, make sure that your eexec handles standard output from that esub correctly.
  • The esub and eexec combination cannot handle daemon authentication. To configure daemon authentication, you must enable external authentication, which uses the eauth executable file.