KILL_JOBS_OVER_RUNLIMIT

Syntax

KILL_JOBS_OVER_RUNLIMIT=interval[:wait_time]

Description

Enables the mbatchd daemon to kill jobs that have been running past the defined RUNLIMIT value for longer than a specified wait time.

  • interval specifies the checking interval, in minutes. This is how often the mbatchd daemon checks for jobs that are running over the defined RUNLIMIT value.
  • wait_time is optional, and specifies how long, in minutes, the mbatchd daemon waits after jobs reach the defined RUNLIMIT before killing them. The default value is 10 minutes.

During each checking interval, mbatchd checks whether any job has run over the defined RUNLIMIT by more than the wait time. If so, mbatchd kills those jobs directly. For any job that is killed because of this parameter setting, LSF logs an additional kill reason message, which is shown in the bjobs -l or bhist -l output.
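
The per-interval check described above can be sketched in Python as follows. This is only an illustration of the comparison, not how mbatchd is implemented; the Job class, its fields, and the jobs_to_kill function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job record; not an LSF data structure."""
    job_id: int
    run_minutes: float       # how long the job has been running
    runlimit_minutes: float  # the RUNLIMIT that applies to the job


def jobs_to_kill(jobs, wait_time=10):
    """Return the jobs that are over RUNLIMIT by more than wait_time minutes.

    A job that is over RUNLIMIT by exactly wait_time is not selected,
    because the description says "greater than the wait time".
    """
    return [
        job for job in jobs
        if job.run_minutes - job.runlimit_minutes > wait_time
    ]
```

With a wait time of 10 minutes, a job that has run 75 minutes against a 60-minute RUNLIMIT would be selected, while jobs at 65 or 70 minutes would be left for a later check.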

Normally, when a job reaches the defined RUNLIMIT, the sbatchd daemon kills the job. However, under some conditions, the job still shows RUN status in mbatchd. This parameter allows mbatchd to clean up or remove jobs that remain in this state long after reaching the RUNLIMIT value.

Valid values

For interval, specify an integer number of minutes that is greater than 30. For wait_time, specify an integer number of minutes.
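
As a rough sketch of how a value in the interval[:wait_time] form could be checked against these rules, the following Python function splits the value and applies the default wait time. The function name and error messages are hypothetical, and LSF's own parsing may differ (for example, in how it treats an invalid value).

```python
DEFAULT_WAIT_TIME = 10  # minutes, the documented default for wait_time


def parse_kill_jobs_over_runlimit(value):
    """Parse 'interval[:wait_time]' and return (interval, wait_time) in minutes.

    Hypothetical helper for illustration; raises ValueError on bad input.
    """
    interval_text, sep, wait_text = value.strip().partition(":")
    interval = int(interval_text)  # ValueError if not an integer
    if interval <= 30:
        raise ValueError("interval must be greater than 30 minutes")
    # wait_time is optional; fall back to the default when no ':' is present.
    wait_time = int(wait_text) if sep else DEFAULT_WAIT_TIME
    return interval, wait_time
```

Under these assumptions, an illustrative setting such as KILL_JOBS_OVER_RUNLIMIT=60:15 would mean a check every 60 minutes with a 15-minute wait time, and KILL_JOBS_OVER_RUNLIMIT=60 would use the default 10-minute wait time.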

Default

Not defined. If the KILL_JOBS_OVER_RUNLIMIT parameter is defined but the wait_time value is not defined, the default value for wait_time is 10 minutes.