Understand successful application exit values
Jobs that exit with one of the exit codes specified by SUCCESS_EXIT_VALUES in an application profile are marked as DONE. These exit values are not counted in the EXIT_RATE calculation.
An exit code of 0 always indicates application success, regardless of SUCCESS_EXIT_VALUES.
If both SUCCESS_EXIT_VALUES and REQUEUE_EXIT_VALUES are defined with the same exit code, REQUEUE_EXIT_VALUES takes precedence: the job is requeued and set to the PEND state. For example:
bapp -l test
APPLICATION NAME: test
-- Turns on absolute runlimit for this application
STATISTICS:
NJOBS PEND RUN SSUSP USUSP RSV
0 0 0 0 0 0
In this application profile, both REQUEUE_EXIT_VALUES and SUCCESS_EXIT_VALUES are set to 17.
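As an illustration, an application profile with these settings might look like the following fragment of lsb.applications. This is a minimal sketch, not the configuration used to produce the output in this example: only the profile name and the two exit-value parameters come from the text, and any other parameters the profile defines are omitted.

```
# Hypothetical fragment of lsb.applications for the "test" profile.
Begin Application
NAME                = test
SUCCESS_EXIT_VALUES = 17
REQUEUE_EXIT_VALUES = 17
End Application
```

Changes to lsb.applications normally take effect only after the configuration is reloaded, for example with badmin reconfig.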
bsub -app test ./non_zero.sh
Job <5583> is submitted to default queue <normal>
bhist -l 5583
Job <5583>, user <name>, Project <default>, Application <test>, Command <./non_zero.sh>
Fri Feb 1 10:52:20: Submitted from host <HostA>, to Queue <normal>, CWD <$HOME>;
Fri Feb 1 10:52:22: Dispatched to <intel4>, Effective RES_REQ <select[type == local] order[slots] >;
Fri Feb 1 10:52:22: Starting (Pid 31390);
Fri Feb 1 10:52:23: Running with execution home </home/dir>, Execution CWD </home/dir>, Execution Pid <31390>;
Fri Feb 1 10:52:23: Pending: Requeued job is waiting for rescheduling;(exit code 17)
Fri Feb 1 10:52:23: Dispatched to <intel4>, Effective RES_REQ <select[type == local] order[slots] >;
Fri Feb 1 10:52:23: Starting (Pid 31464);
Fri Feb 1 10:52:26: Running with execution home </home/dir>, Execution CWD </home/dir>, Execution Pid <31464>;
Fri Feb 1 10:52:27: Pending: Requeued job is waiting for rescheduling;(exit code 17)
Fri Feb 1 10:52:27: Dispatched to <intel4>, Effective RES_REQ <select[type == local] order[slots] >;
Fri Feb 1 10:52:27: Starting (Pid 31857);
Fri Feb 1 10:52:30: Running with execution home </home/dir>, Execution CWD </home/dir>, Execution Pid <31857>;
Fri Feb 1 10:52:30: Pending: Requeued job is waiting for rescheduling;(exit code 17)
Fri Feb 1 10:52:31: Dispatched to <intel4>, Effective RES_REQ <select[type == local] order[slots] >;
Fri Feb 1 10:52:31: Starting (Pid 32149);
Fri Feb 1 10:52:34: Running with execution home </home/dir>, Execution CWD </home/dir>, Execution Pid <32149>;
Fri Feb 1 10:52:34: Pending: Requeued job is waiting for rescheduling;(exit code 17)
Fri Feb 1 10:52:34: Dispatched to <intel4>, Effective RES_REQ <select[type == local] order[slots] >;
Fri Feb 1 10:52:34: Starting (Pid 32312);
Fri Feb 1 10:52:38: Running with exit code 17
SUCCESS_EXIT_VALUES has no effect on pre-exec and post-exec commands. The value is only used for user jobs.
If the job exit value falls into SUCCESS_EXIT_VALUES, the job will be marked as DONE. Job dependencies on done jobs behave normally.
For parallel jobs, the exit status refers to the job exit status and not the exit status of individual tasks.
Exit codes of jobs that are terminated by LSF are not treated as success exit values, even if they are listed in SUCCESS_EXIT_VALUES.
For example, if SUCCESS_EXIT_VALUES=2 is defined, jobs exiting with 2 are marked as DONE. However, if LSF cannot find the current working directory, LSF terminates the job with exit code 2, and the job is marked as EXIT. The appropriate termination reason is displayed by bacct.
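The rules described so far can be summarized as a small decision procedure. The following shell sketch models that logic only and is not how LSF implements it. The function names classify_exit and in_list and the yes/no "terminated by LSF" flag are hypothetical. The sketch treats both parameters as plain space-separated lists of numbers, and it makes no claim about a profile that lists 0 in REQUEUE_EXIT_VALUES, which this section does not cover.

```shell
#!/bin/sh
# Hypothetical stand-ins for the values defined in an application profile.
SUCCESS_EXIT_VALUES="2 17"
REQUEUE_EXIT_VALUES="17"

# in_list CODE LIST
# Succeeds if CODE is one of the space-separated values in LIST.
in_list() {
    for value in $2; do
        [ "$value" = "$1" ] && return 0
    done
    return 1
}

# classify_exit CODE [yes|no]
# Prints the modeled job state for a job exit code. The second argument
# says whether LSF itself terminated the job (assumed "no" if omitted).
classify_exit() {
    code=$1
    terminated_by_lsf=${2:-no}
    if in_list "$code" "$REQUEUE_EXIT_VALUES"; then
        # Requeue takes precedence over a matching success exit value.
        echo PEND
    elif [ "$code" -eq 0 ]; then
        # Exit code 0 is always a success.
        echo DONE
    elif [ "$terminated_by_lsf" = no ] && in_list "$code" "$SUCCESS_EXIT_VALUES"; then
        # A listed success value counts only if LSF did not terminate the job.
        echo DONE
    else
        echo EXIT
    fi
}

classify_exit 17 no    # PEND
classify_exit 2 no     # DONE
classify_exit 2 yes    # EXIT
classify_exit 0 no     # DONE
classify_exit 1 no     # EXIT
```

With the hypothetical lists above, exit code 17 is requeued even though it is also a success value, and exit code 2 counts as a success only when the job itself, rather than LSF, produced it.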
MultiCluster jobs
In the job forwarding model, a job that is sent to a remote cluster and exits with a success exit code defined in the remote cluster is considered to have completed successfully.
In the lease model, the parameters of lsb.applications apply to jobs running on remote leased hosts as if they were running on local hosts.