LSF_NIOS_JOBSTATUS_INTERVAL

Syntax

LSF_NIOS_JOBSTATUS_INTERVAL=time_minutes

Description

Time interval, in minutes, at which NIOS polls mbatchd to check whether a job is still running. Applies to interactive batch jobs and blocking jobs.

Use this parameter if you have scripts that depend on an exit code being returned.
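As an illustration of a script that depends on the returned exit code, the following is a minimal sketch. It assumes the job is submitted as a blocking job with bsub -K and that bsub exits with the job's exit code. The function names run_blocking_job and next_step, and the injectable runner argument, are hypothetical and exist only for this example.

```python
import subprocess


def run_blocking_job(command, runner=subprocess.run):
    """Submit `command` as a blocking job and return the exit code reported by bsub.

    `runner` is injectable so the function can be exercised without an LSF cluster.
    """
    # bsub -K waits for the job to finish before returning.
    result = runner(["bsub", "-K", *command])
    return result.returncode


def next_step(exit_code):
    """Hypothetical policy: continue the pipeline only if the job succeeded."""
    return "continue" if exit_code == 0 else "abort"
```

A calling script might run next_step(run_blocking_job(["./build.sh"])) and stop when the result is "abort". If the exit code is never retrieved, a script of this kind cannot make that decision, which is the situation this parameter is meant to avoid.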

If this parameter is not defined and a network connection is lost, mbatchd cannot communicate with NIOS and the return code of a job is not retrieved.

When LSF_NIOS_JOBSTATUS_INTERVAL is defined, NIOS polls mbatchd at the defined interval to check whether a job is still running (or pending). NIOS continues to poll mbatchd until it receives an exit code or mbatchd responds that the job does not exist (for example, because the job has already been cleaned from memory).

For interactive jobs, NIOS polls mbatchd to retrieve a job's exit status when this parameter is enabled and one of the following conditions applies:
  • The connection between NIOS and the job RES is broken. For example, a network failure occurs between the submission host and the execution host.
  • The job RES runs abnormally. For example, it is out of memory.
  • The job is waiting for dispatch.

For blocking jobs, NIOS always polls mbatchd to retrieve a job's exit status when this parameter is enabled.

If an exit code cannot be retrieved, NIOS generates an error message and the code -11.
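The polling behavior described above can be sketched as a simple loop. This is an illustrative model only, not the NIOS implementation: the query callable and its state names ("pending", "running", "done", "unknown") are hypothetical, and mapping the "job does not exist" response to the code -11 is an assumption drawn from the two preceding statements.

```python
import time

# Code the text says NIOS reports when an exit code cannot be retrieved.
EXIT_CODE_UNAVAILABLE = -11


def poll_job_status(query, interval_minutes, sleep=time.sleep):
    """Poll until `query()` yields an exit code or reports that the job is gone.

    `query` is a hypothetical stand-in for the request to mbatchd. It returns a
    (state, exit_code) pair where state is "pending", "running", "done", or
    "unknown" (the job no longer exists in memory).
    """
    if interval_minutes <= 0:
        raise ValueError("interval must be an integer greater than zero")
    while True:
        state, exit_code = query()
        if state == "done":
            return exit_code
        if state == "unknown":
            # Assumption: no exit code can be retrieved for a cleaned-up job.
            return EXIT_CODE_UNAVAILABLE
        # Job is still running or pending: wait one interval, then ask again.
        sleep(interval_minutes * 60)
```

Passing sleep as an argument keeps the sketch testable without waiting for real time to elapse.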

Valid values

Any integer greater than zero.
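A configuration check for this rule might look like the following sketch. The helper name parse_interval is hypothetical, and the NAME=value line format is taken from the Syntax section above.

```python
def parse_interval(line):
    """Return the interval in minutes from a NAME=value line, or raise ValueError."""
    name, sep, value = line.strip().partition("=")
    if name != "LSF_NIOS_JOBSTATUS_INTERVAL" or not sep:
        raise ValueError("expected LSF_NIOS_JOBSTATUS_INTERVAL=time_minutes")
    minutes = int(value)  # raises ValueError for non-integer text such as "1.5"
    if minutes <= 0:
        raise ValueError("interval must be an integer greater than zero")
    return minutes
```

For example, parse_interval("LSF_NIOS_JOBSTATUS_INTERVAL=15") returns 15, while a value of 0 or 1.5 raises ValueError.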

Default

Not defined

Notes

Set this parameter to a large interval, such as 15 minutes or more, so that performance is not degraded when interactive jobs remain pending for a long time. NIOS always calls mbatchd at the defined interval to confirm that a job is still pending, which can add load to mbatchd.
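To see why a larger interval reduces load, the following rough estimate may help. It assumes one status request per pending job per interval and ignores all other mbatchd traffic. The function name polls_per_hour and the job counts are illustrative only.

```python
def polls_per_hour(pending_jobs, interval_minutes):
    """Estimate status requests per hour, assuming one request per job per interval."""
    if interval_minutes <= 0:
        raise ValueError("interval must be greater than zero")
    return pending_jobs * 60 / interval_minutes


# 500 pending interactive jobs polled every minute versus every 15 minutes.
print(polls_per_hour(500, 1))   # 30000.0
print(polls_per_hour(500, 15))  # 2000.0
```

Under these assumptions, moving from a 1-minute to a 15-minute interval cuts the estimated request rate by a factor of 15.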

See also

Environment variable LSF_NIOS_PEND_TIMEOUT