LSB_BSUB_ERR_RETRY

Syntax

LSB_BSUB_ERR_RETRY=RETRY_CNT[integer] ERR_TYPE[error1 [error2] [...]]

Description

Some jobs benefit from being retried automatically when they fail with a particular error. When specified, LSB_BSUB_ERR_RETRY automatically retries jobs that fail with one of the error types listed in ERR_TYPE, up to the number of times specified by RETRY_CNT.

Only the following error types (ERR_TYPE) are supported:

  • BAD_XDR: An error occurred during XDR (External Data Representation) encoding or decoding.
  • MSG_SYS: Failed to send or receive a message.
  • INTERNAL: An internal library error occurred.

The number of retries (RETRY_CNT) must be an integer from 1 to 50.

Considerations when setting this parameter:
  • Users may notice an apparent delay during job submission while the job is retried automatically in the background.
  • Users may see a job submitted more than once without explanation. No error is communicated to the user; the job keeps being resubmitted until it succeeds or reaches its maximum retry count. The job ID also changes each time the job is retried.
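To make the retry behavior described above concrete, here is a minimal Python simulation. It is not LSF code: `submit_with_retry`, `SubmitError`, and `make_flaky_submitter` are hypothetical names. It assumes that RETRY_CNT counts retries after the first attempt (so at most RETRY_CNT + 1 attempts are made) and that every attempt is assigned a fresh job ID.

```python
import itertools
from typing import Callable, Iterable, List, Tuple


class SubmitError(Exception):
    """Hypothetical submission failure tagged with an ERR_TYPE name."""

    def __init__(self, err_type: str):
        super().__init__(err_type)
        self.err_type = err_type


def submit_with_retry(submit: Callable[[], int], retry_cnt: int,
                      err_types: Iterable[str]) -> Tuple[int, int]:
    """Return (job_id, attempts). Listed errors are retried silently;
    assumes at most retry_cnt retries after the first attempt."""
    retryable = set(err_types)
    attempts = 0
    while True:
        attempts += 1
        try:
            return submit(), attempts
        except SubmitError as err:
            # Unlisted errors, or a used-up retry budget, reach the caller.
            if err.err_type not in retryable or attempts > retry_cnt:
                raise


def make_flaky_submitter(failures: List[str]) -> Callable[[], int]:
    """Hypothetical submitter that fails with the given error types, in order."""
    job_ids = itertools.count(1001)  # illustrative job ID sequence
    remaining = list(failures)

    def submit() -> int:
        job_id = next(job_ids)  # assumption: every attempt gets a new ID
        if remaining:
            raise SubmitError(remaining.pop(0))
        return job_id

    return submit
```

With a submitter that fails once with MSG_SYS and once with BAD_XDR before succeeding, the caller receives the job ID from the third attempt and never sees the two intermediate errors, which mirrors both considerations above.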

Default

Not defined. If RETRY_CNT is not valid, the retry count defaults to 5.
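The rules in this entry (the bracketed syntax, the three supported error types, the 1 to 50 range, and the fallback to 5) can be captured in a small validator. This is a sketch, not how LSF itself parses lsf.conf: `parse_bsub_err_retry` and `RetryConfig` are hypothetical names, and rejecting unsupported ERR_TYPE names with ValueError is an assumption, since this entry does not say how they are handled.

```python
import re
from dataclasses import dataclass

# Values taken from this entry; the parser itself is a hypothetical sketch.
SUPPORTED_ERR_TYPES = frozenset({"BAD_XDR", "MSG_SYS", "INTERNAL"})
MIN_RETRY_CNT, MAX_RETRY_CNT = 1, 50
DEFAULT_RETRY_CNT = 5  # used when RETRY_CNT is missing or not valid


@dataclass(frozen=True)
class RetryConfig:
    retry_cnt: int
    err_types: frozenset


def parse_bsub_err_retry(value: str) -> RetryConfig:
    """Parse a value such as 'RETRY_CNT[10] ERR_TYPE[BAD_XDR MSG_SYS]'."""
    value = value.strip().strip('"')
    cnt_match = re.search(r"RETRY_CNT\[([^\]]*)\]", value)
    type_match = re.search(r"ERR_TYPE\[([^\]]*)\]", value)

    # Fall back to the default when the count is missing, non-numeric,
    # or outside the documented 1-50 range.
    retry_cnt = DEFAULT_RETRY_CNT
    if cnt_match:
        try:
            n = int(cnt_match.group(1))
        except ValueError:
            n = None
        if n is not None and MIN_RETRY_CNT <= n <= MAX_RETRY_CNT:
            retry_cnt = n

    names = type_match.group(1).split() if type_match else []
    unknown = sorted(set(names) - SUPPORTED_ERR_TYPES)
    if unknown:
        # Assumption: unsupported names are rejected rather than ignored.
        raise ValueError(f"unsupported ERR_TYPE: {', '.join(unknown)}")
    return RetryConfig(retry_cnt, frozenset(names))
```

For example, `RETRY_CNT[10] ERR_TYPE[BAD_XDR MSG_SYS]` yields a retry count of 10, while `RETRY_CNT[99] ERR_TYPE[INTERNAL]` falls back to 5.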