bchkpnt

Checkpoints one or more checkpointable jobs.

Synopsis

bchkpnt [-f] [-k] [-app application_profile_name] [-p minutes | -p 0] job_ID | "job_ID[index_list]" ...
bchkpnt [-f] [-k] [-app application_profile_name] [-p minutes | -p 0] [-J job_name] [-m host_name | -m host_group] [-q queue_name] [-u "user_name" | -u all] [0]
bchkpnt -h | -V

Description

By default, checkpoints the most recently submitted running or suspended checkpointable job.

LSF administrators and root can checkpoint jobs that are submitted by other users.

Jobs continue to run after they are checkpointed.

LSF runs the echkpnt executable file that is found in the LSF_SERVERDIR directory to checkpoint the job.

Only running members of a chunk job can be checkpointed. For chunk jobs in WAIT state, the mbatchd daemon rejects the checkpoint request.

Options

0

(Zero). Checkpoints all of the jobs that satisfy other specified criteria.

-f

Forces a job to be checkpointed even if non-checkpointable conditions exist (these conditions are operating system-specific).

-app application_profile_name

Operates only on jobs that are associated with the specified application profile. You must specify an existing application profile. If neither job_ID nor 0 is specified, only the most recently submitted qualifying job is operated on.

-k

Kills a job after it is successfully checkpointed.

-p minutes | -p 0

Enables periodic checkpointing and specifies the checkpoint period, or modifies the checkpoint period of a checkpointed job. Specify -p 0 (zero) to disable periodic checkpointing.

Checkpointing is a resource-intensive operation. For your job to make progress while still providing fault tolerance, specify a checkpoint period of 30 minutes or longer.
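Because the -p option takes minutes, a period that you think of in hours must be converted first. The following sketch only prints the resulting command line; it does not run it. The period_cmd helper and job ID 1234 are hypothetical names used for illustration.

```shell
# Hypothetical helper: build a bchkpnt command from a period given in hours.
# It echoes the command instead of running it, so it is safe to try anywhere.
period_cmd() {
    hours=$1
    job_id=$2
    echo "bchkpnt -p $((hours * 60)) ${job_id}"
}

period_cmd 2 1234    # prints: bchkpnt -p 120 1234
period_cmd 0 1234    # prints: bchkpnt -p 0 1234 (disables periodic checkpointing)
```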

-J job_name

Checkpoints only jobs that have the specified job name.

The job name can be up to 4094 characters long. Job names are not unique.

The wildcard character (*) can be used anywhere within a job name, but it cannot appear within an array index. For example, the pattern job* returns jobA and jobarray[1]. The *AAA*[1] pattern returns the first element in job arrays with names that contain AAA. However, the pattern job1[*] does not return anything since the wildcard is within the array index.
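The array-index restriction can be checked before a pattern is passed to -J. The following is a rough sketch, not part of LSF: valid_job_pattern is a hypothetical helper that rejects a pattern when a wildcard appears between an opening and a closing bracket. On a UNIX command line, quote the pattern (for example, -J "job*") so that the shell does not expand it against file names.

```shell
# Hypothetical pre-check for -J patterns (not an LSF command).
# Returns 1 if a * appears after a "[" and before a later "]", otherwise 0.
valid_job_pattern() {
    case "$1" in
        *"["*"*"*"]"*) return 1 ;;   # wildcard inside an array index
        *)             return 0 ;;
    esac
}

for pattern in 'job*' '*AAA*[1]' 'job1[*]'; do
    if valid_job_pattern "$pattern"; then
        echo "ok:       $pattern"
    else
        echo "rejected: $pattern"
    fi
done
```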

-m host_name | -m host_group

Checkpoints only jobs that are dispatched to the specified hosts.

-q queue_name

Checkpoints only jobs that are dispatched from the specified queue.

-u "user_name" | -u all

Checkpoints only jobs that are submitted by the specified users. The keyword all specifies all users. Ignored if a job ID other than 0 (zero) is specified. To specify a Windows user account, include the domain name in uppercase letters and use a single backslash (DOMAIN_NAME\user_name) in a Windows command line or a double backslash (DOMAIN_NAME\\user_name) in a UNIX command line.
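The double backslash is needed on a UNIX command line because the shell treats an unquoted backslash as an escape character. The following sketch shows what the shell actually passes as the argument. The show_arg helper is hypothetical, and DOMAIN_NAME and user_name are placeholders.

```shell
# Hypothetical helper: print the argument exactly as the shell delivers it.
show_arg() {
    printf '%s\n' "$1"
}

# Unquoted: the shell reduces \\ to a single backslash.
show_arg DOMAIN_NAME\\user_name

# Single-quoted: the backslash is kept as written, so one is enough.
show_arg 'DOMAIN_NAME\user_name'
```

Both calls deliver the same argument, DOMAIN_NAME\user_name, to the command.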

job_ID | "job_ID[index_list]"

Checkpoints only the specified jobs.

-h

Prints command usage to stderr and exits.

-V

Prints LSF release version to stderr and exits.

Examples

bchkpnt 1234

Checkpoints the job with job ID 1234.

bchkpnt -p 120 1234

Enables periodic checkpointing or changes the checkpoint period to 120 minutes (2 hours) for a job with job ID 1234.

bchkpnt -m hostA -k -u all 0

When used by root or the LSF administrator, checkpoints and kills all checkpointable jobs on hostA. This command is useful when a host needs to be shut down or rebooted.
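When several hosts are to be shut down, the same command can be generated for each host. The following sketch only prints the commands for review; it does not run them. The drain_cmds helper and the host names are hypothetical.

```shell
# Hypothetical helper: print one checkpoint-and-kill command per host.
# Review the output before running any of the printed commands.
drain_cmds() {
    for host in "$@"; do
        echo "bchkpnt -m ${host} -k -u all 0"
    done
}

drain_cmds hostA hostB
```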

See also

bsub, bmod, brestart, bjobs, bqueues, bhosts, lsb.queues, mbatchd