About job submission and execution controls
The job submission and execution controls feature uses the executable files esub and eexec to control job options and the job execution environment.
In addition, epsub executable files can use job submission information, such as the job ID and queue name, to communicate with external components and to perform additional logic after job submission.
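External submission (esub)
An esub is an executable file that you write to meet the job submission requirements at your site. The following are some of the things that you can use an esub to do: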
- Validate job options
- Change the job options that are specified by a user
- Change user environment variables on the submission host (at job submission only)
- Reject jobs (at job submission only)
- Pass data to stdin of eexec
- Automate job resource requirements
- Enable data provenance to trace job files
When a user submits a job by using bsub or modifies a job by using bmod, LSF runs the esub executable files on the submission host before the job is accepted. If the user submitted the job with options such as -R to specify required resources or -q to specify a queue, an esub can change the values of those options to conform to resource usage policies at your site.
An esub can also change the user environment on the submission host before job submission so that when LSF copies the submission host environment to the execution host, the job runs on the execution host with the values specified by the esub. For example, an esub can add user environment variables to those environment variables already associated with the job.
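As a rough illustration of the environment change described above, the following Python sketch appends variables to the file named by LSB_SUB_MODIFY_ENVFILE. It assumes that the file accepts one NAME=value line per variable; this section does not define the file format, so verify it against the documentation for your LSF version. The helper name and the variable SITE_SCRATCH_DIR are hypothetical.

```python
import os


def append_env_changes(envfile_path, variables):
    """Append NAME=value lines to the esub environment-modification file."""
    with open(envfile_path, "a", encoding="utf-8") as handle:
        for name, value in variables.items():
            # Assumed format: one NAME=value assignment per line.
            handle.write(f"{name}={value}\n")


# Act only when LSF has supplied the modification file (that is, inside an esub).
if __name__ == "__main__" and "LSB_SUB_MODIFY_ENVFILE" in os.environ:
    # SITE_SCRATCH_DIR is a hypothetical variable used for illustration.
    append_env_changes(
        os.environ["LSB_SUB_MODIFY_ENVFILE"],
        {"SITE_SCRATCH_DIR": "/scratch/shared"},
    )
```

Because the sketch appends rather than overwrites, changes written earlier to the same file are preserved.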
LSF runs the default executable file named "esub" if it exists in the LSF_SERVERDIR directory, followed by any mandatory esub executable files that are defined by LSB_ESUB_METHOD, followed by any application-specific esub executable files (with .application_name in the file name).
External post-submission (epsub)
An epsub is an executable file that you write to meet the post-submission job requirements at your site with information that is not available before job submission. The following are some of the things that you can use an epsub to do with the newly available job information:
- Pass job information to an external entity
- Post job information to a local log file
- Perform general logic after a job is submitted to LSF
When a user submits a job by using bsub, modifies a job by using bmod, or restarts a job by using brestart, LSF runs the epsub executable files on the submission host immediately after the job is accepted. The job might or might not have started running while epsub is running.
When submitting interactive jobs, bsub or bmod runs epsub, then resumes regular interactive job behavior (that is, bsub or bmod runs epsub, then runs the interactive job).
epsub does not pass information to eexec, nor does it get information from eexec. epsub can only read information from the temporary file that contains job submission options (as indicated by the LSB_SUB_PARM_FILE environment variable) and from the environment variables. The information that is available to the epsub after job submission includes the following:
- A temporary file that contains job submission options, which is available through the LSB_SUB_PARM_FILE environment variable. The file that this environment variable specifies is a different file from the one that is initially created by esub before the job submission.
- The LSF job ID, which is available through the LSB_SUB_JOB_ID environment variable. For job arrays, the job ID includes the job array index.
- The name of the final queue to which the job is submitted (including any queue modifications made by esub), which is available through the LSB_SUB_JOB_QUEUE environment variable.
- The LSF job error number if the job submission failed, which is available through the LSB_SUB_JOB_ERR environment variable.
Since epsub is run after job submission, the epsub executable files cannot modify job submission parameters or job environment variables. That is, LSB_SUB_MODIFY_FILE and LSB_SUB_MODIFY_ENVFILE are not available to epsub.
If the esub rejects a job, the corresponding epsub file does not run.
After job submission, bsub or bmod waits for the epsub scripts to finish before returning. If the bsub or bmod return time is crucial, do not use epsub to perform time-consuming activities. In addition, if epsub hangs, bsub or bmod waits indefinitely for the epsub script to finish. This is similar to the esub behavior, because bsub or bmod hangs if an esub script hangs.
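A minimal epsub along these lines might post the job ID, final queue, and error number to a local log file. The following Python sketch reads only the environment variables that are named above; the log path and the record layout are hypothetical choices, not part of LSF.

```python
import os
import time


def build_log_record(env):
    """Format one log line from the variables that LSF exports to an epsub."""
    job_id = env.get("LSB_SUB_JOB_ID", "")
    queue = env.get("LSB_SUB_JOB_QUEUE", "")
    error = env.get("LSB_SUB_JOB_ERR", "")
    return f"job_id={job_id} queue={queue} err={error}"


def append_record(log_path, record):
    """Append a timestamped record to a local log file."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(f"{stamp} {record}\n")


# Act only when LSF has exported the job ID (that is, inside an epsub).
if __name__ == "__main__" and "LSB_SUB_JOB_ID" in os.environ:
    # Hypothetical log location; keep the work short because bsub waits for epsub.
    append_record("/tmp/epsub_jobs.log", build_log_record(os.environ))
```

The sketch does a single local file append and nothing else, which keeps the delay added to bsub or bmod small.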
LSF runs the default executable file named "epsub" if it exists in the LSF_SERVERDIR directory, followed by any mandatory epsub executable files that are defined by LSB_ESUB_METHOD, followed by any application-specific epsub executable files (with .application_name in the file name).
If a mandatory program that is specified by the LSB_ESUB_METHOD parameter does not have a corresponding esub executable file (esub.application_name) but does have a corresponding epsub executable file (epsub.application_name), the job is still submitted through the usual external job submission and post-submission mechanisms.
Except for these differences, epsub uses the same framework as esub.
[Figure: Use of esub or epsub not enabled]
[Figure: With esub or epsub enabled]
An esub executable file is typically used to enforce site-specific job submission policies and command line syntax by validating or pre-parsing the command line. The file indicated by the environment variable LSB_SUB_PARM_FILE stores the values that are submitted by the user. An esub reads the LSB_SUB_PARM_FILE and then accepts or changes the option values or rejects the job. Because an esub runs before job submission, using an esub to reject incorrect job submissions improves overall system performance by reducing the load on the management batch daemon (mbatchd).
For example, an esub can do the following:
- Reject any job that requests more than a specified number of CPUs
- Change the submission queue for specific user accounts to a higher priority queue
- Check whether the job specifies an application and, if so, submit the job to the correct application profile
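The first two policies in the list above can be sketched in Python as follows. The sketch assumes that the parameter file holds NAME=value lines, that LSB_SUB_NUM_PROCESSORS and LSB_SUB_QUEUE are the option names for the CPU count and the queue, and that an esub rejects a job by exiting with the value of LSB_SUB_ABORT_VALUE. These names come from LSF esub conventions that this section does not define, so confirm them for your LSF version. The CPU limit, the queue name priority, and the user alice are hypothetical.

```python
import os
import sys


def parse_parm_file(path):
    """Read NAME=value lines from the parameter file into a dict."""
    options = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or "=" not in line:
                continue
            name, value = line.split("=", 1)
            options[name] = value.strip('"')
    return options


def decide(options, user, max_cpus, priority_users):
    """Return ("reject", reason), ("modify", changes), or ("accept", {})."""
    cpus = options.get("LSB_SUB_NUM_PROCESSORS", "1")
    if cpus.isdigit() and int(cpus) > max_cpus:
        return "reject", f"jobs may request at most {max_cpus} CPUs"
    if user in priority_users and options.get("LSB_SUB_QUEUE") != "priority":
        return "modify", {"LSB_SUB_QUEUE": "priority"}
    return "accept", {}


# Act only when LSF has supplied the parameter file (that is, inside an esub).
if __name__ == "__main__" and "LSB_SUB_PARM_FILE" in os.environ:
    opts = parse_parm_file(os.environ["LSB_SUB_PARM_FILE"])
    action, detail = decide(opts, os.environ.get("USER", ""), 16, {"alice"})
    if action == "reject":
        print(detail, file=sys.stderr)
        # Assumed convention: exiting with LSB_SUB_ABORT_VALUE rejects the job.
        sys.exit(int(os.environ["LSB_SUB_ABORT_VALUE"]))
    if action == "modify":
        with open(os.environ["LSB_SUB_MODIFY_FILE"], "a", encoding="utf-8") as out:
            for name, value in detail.items():
                out.write(f'{name}="{value}"\n')
```

Keeping the policy in decide, separate from file handling, makes it possible to check the rules without submitting a job.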
Multiple esub executable files
LSF provides a parent external submission executable file (LSF_SERVERDIR/mesub) that supports the use of application-specific esub executable files. Users can specify one or more esub executable files by using the -a option of bsub or bmod. When a user submits or modifies a job or when a user restarts a job that was submitted or modified with the -a option included, mesub runs the specified esub executable files.
An LSF administrator can specify one or more mandatory esub executable files by defining the parameter LSB_ESUB_METHOD in lsf.conf. If a mandatory esub is defined, mesub runs the mandatory esub for all jobs that are submitted to LSF in addition to any esub executable files specified with the -a option.
The naming convention is esub.application. LSF always runs the executable file that is named "esub" (without .application) if it exists in LSF_SERVERDIR.
The mesub runs the esub executable files in the following order:
- Any executable file with the name "esub" in LSF_SERVERDIR
- The mandatory esub or esubs specified by LSB_ESUB_METHOD in lsf.conf
- One or more esubs in the order that is specified by bsub -a
[Figure: Example of multiple esub execution]
An esub runs only once, even if it is specified by both the bsub -a option and the parameter LSB_ESUB_METHOD.
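The ordering and run-once rules can be modeled with a short Python sketch. It is not how mesub is implemented; it only computes the list of file names that would run, given whether a default esub exists, the mandatory names from LSB_ESUB_METHOD, and the names passed with bsub -a. The function name and the application names dce and fluent are illustrative.

```python
def esub_run_order(default_exists, mandatory, requested):
    """Return the esub file names in run order, each listed at most once."""
    order = []
    if default_exists:
        # The file named "esub" runs first when it exists in LSF_SERVERDIR.
        order.append("esub")
    # Mandatory esubs come next, followed by the esubs requested with bsub -a.
    for name in list(mandatory) + list(requested):
        target = f"esub.{name}"
        if target not in order:
            order.append(target)
    return order


# "dce" is both mandatory and requested, so esub.dce appears only once.
example_order = esub_run_order(True, ["dce"], ["fluent", "dce"])
```

In this example, example_order is the default esub, then esub.dce, then esub.fluent.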
External execution (eexec)
An eexec is an executable file that you write to control the job environment on the execution host.
[Figure: Use of eexec not enabled]
[Figure: With eexec enabled]
The following are some of the things that you can use an eexec to do:
- Monitor job state or resource usage
- Receive data from stdout of esub
- Run a shell script to create and populate environment variables that are needed by jobs
- Monitor the number of tasks that are running on a host and raise a flag when this number exceeds a pre-determined limit
- Pass DCE credentials and AFS tokens by using a combination of esub and eexec executable files; LSF functions as a pipe for passing data from the stdout of esub to the stdin of eexec
For example, if you have a mixed UNIX and Windows cluster, the submission and execution hosts might use different operating systems. In this case, the submission host environment might not meet the job requirements when the job runs on the execution host. You can use an eexec to set the correct user environment between the two operating systems.
Typically, an eexec executable file is a shell script that creates and populates the environment variables that are required by the job. An eexec can also monitor job execution and enforce site-specific resource usage policies.
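The data hand-off from the stdout of esub to the stdin of eexec can be sketched in Python as follows. LSF only forwards the data, so the key=value line format used here is an assumption made for illustration, as are the function names and the sample keys. The sketch simulates the pipe with an in-memory buffer instead of running under LSF.

```python
import io


def esub_write(stream, data):
    """esub side: write key=value lines to stdout for LSF to forward."""
    for key, value in data.items():
        stream.write(f"{key}={value}\n")


def eexec_read(stream):
    """eexec side: parse the key=value lines that arrive on stdin."""
    data = {}
    for line in stream:
        key, sep, value = line.rstrip("\n").partition("=")
        if sep:
            # Lines without "=" are ignored rather than treated as errors.
            data[key] = value
    return data


# Simulate the pipe locally with an in-memory buffer.
pipe = io.StringIO()
esub_write(pipe, {"PROJECT": "demo", "TOKEN_FILE": "/tmp/token"})
pipe.seek(0)
received = eexec_read(pipe)
```

In a real deployment, the esub would pass sys.stdout to esub_write and the eexec would pass sys.stdin to eexec_read.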
If an eexec executable file exists in the directory that is specified by LSF_SERVERDIR, LSF starts that eexec for all jobs that are submitted to the cluster. By default, LSF runs eexec on the execution host before the job starts. The job process that starts eexec waits for eexec to finish before the job continues with job execution.
An eexec has the following characteristics:
- Runs at job start, finish, or checkpoint
- Allows the job to run without pending if eexec fails; eexec has no effect on the job state
- Runs for all jobs, regardless of queue or application profile
Scope
Applicability | Details |
---|---|
Operating system | |
Security | |
Job types | |
Dependencies | |
Limitations | |