Description
By default, displays information about your own pending, running, and suspended jobs.
bjobs displays output for condensed host groups and compute units. These host groups and compute units are defined by CONDENSE in the HostGroup or ComputeUnit section of lsb.hosts. These groups are displayed as a single entry with the name as defined by GROUP_NAME or NAME in lsb.hosts. The -l and -X options display noncondensed output.
If you defined the parameter LSB_SHORT_HOSTLIST=1 in the lsf.conf file, parallel jobs running in the same condensed host group or compute unit are displayed as an abbreviated list.
For resizable jobs, bjobs displays the automatically resizable attribute and the resize notification command.
To display older historical information, use bhist.
Output: Default display
Pending jobs are displayed in the order in which they are considered for dispatch. Jobs in higher priority queues are displayed before those in lower priority queues. Pending jobs in the same priority queues are displayed in the order in which they were submitted, but you can change this order with the btop or bbot commands. If more than one job is dispatched to a host, the jobs on that host are listed in the order in which they are considered for scheduling on that host, by queue priority and dispatch time. Finished jobs are displayed in the order in which they were completed.
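As a rough illustration of the default ordering for pending jobs, the following sketch sorts hypothetical job records by queue priority (higher first) and then by submission time. The PendingJob record and its queue_priority and submit_time fields are assumptions made for this example, not bjobs output fields, and the sketch does not model reordering by btop or bbot.

```python
from typing import NamedTuple


class PendingJob(NamedTuple):
    job_id: int
    queue_priority: int  # assumption: a larger number means a higher priority
    submit_time: int     # assumption: seconds since the epoch


def display_order(jobs):
    """Sort pending jobs: higher-priority queues first, then earlier submissions."""
    return sorted(jobs, key=lambda j: (-j.queue_priority, j.submit_time))


jobs = [
    PendingJob(101, 30, 1000),
    PendingJob(102, 40, 2000),
    PendingJob(103, 30, 500),
]
ordered_ids = [j.job_id for j in display_order(jobs)]
print(ordered_ids)  # [102, 103, 101]
```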
A listing of jobs is displayed with the following fields:
- JOBID
- The job ID that LSF assigned to the job.
- USER
- The user who submitted the job.
- STAT
- The current status of the job (see JOB STATUS for details).
- QUEUE
- The name of the job queue to which the job belongs. If the queue to which the job
belongs has been removed from the configuration, the queue name is displayed as
lost_and_found. Use bhist to get the original queue
name. Jobs in the lost_and_found queue remain pending until they are
switched with the bswitch command into another queue.
In an LSF multicluster capability resource leasing environment, jobs scheduled by the consumer cluster display the remote queue name in the format queue_name@cluster_name. By default, this field truncates at 10 characters, so you might not see the cluster name unless you use -w or -l.
- FROM_HOST
- The name of the host from which the job was submitted.
With the LSF multicluster capability, if the host is in a remote cluster, the cluster name and remote job ID are appended to the host name, in the format host_name@cluster_name:job_ID. By default, this field truncates at 11 characters; you might not see the cluster name and job ID unless you use -w or -l.
- EXEC_HOST
- The name of one or more hosts on which the job is executing (this field is empty
if the job has not been dispatched). If the host on which the job is running has been removed from
the configuration, the host name is displayed as lost_and_found. Use
bhist to get the original host name.
If the host is part of a condensed host group or compute unit, the host name is displayed as the name of the condensed group.
If you use wildcards to configure a host to belong to more than one condensed host group, bjobs can display any of the host groups as the execution host name.
- JOB_NAME
- The job name assigned by the user, or the command string assigned by default at
job submission with bsub. If the job name is too long to fit in this field, then
only the latter part of the job name is displayed.
The displayed job name or job command can contain up to 4094 characters for UNIX, or up to 255 characters for Windows.
- SUBMIT_TIME
- The submission time of the job.
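The QUEUE and FROM_HOST formats described above (queue_name@cluster_name and host_name@cluster_name:job_ID) can be split with a few lines of string handling. The following is a minimal sketch; the function names are hypothetical, and it assumes that the values come from untruncated output (for example, with -w), because a truncated field can lose the cluster name or part of the job ID.

```python
def parse_from_host(value):
    """Split 'host_name@cluster_name:job_ID' into (host, cluster, remote_job_id).

    The cluster and job ID are None for a plain local host name.
    """
    host, sep, rest = value.partition("@")
    if not sep:
        return host, None, None
    cluster, sep, job_id = rest.partition(":")
    return host, cluster, (int(job_id) if sep else None)


def parse_queue(value):
    """Split 'queue_name@cluster_name' into (queue, cluster)."""
    queue, sep, cluster = value.partition("@")
    return queue, (cluster if sep else None)


print(parse_from_host("hostA@cluster2:1234"))  # ('hostA', 'cluster2', 1234)
print(parse_queue("normal"))                   # ('normal', None)
```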
Output: Long format (-l)
The -l option displays a long format listing with the following additional fields:
- Job
- The job ID that LSF assigned to the job.
- User
- The ID of the user who submitted the job.
- Project
- The project the job was submitted from.
- Application Profile
- The application profile the job was submitted to.
- Command
- The job command.
- CWD
- The current working directory on the submission host.
- Data requirement requested
- Indicates that the job has data requirements.
- Execution CWD
- The actual CWD used when the job runs.
- Host file
- The path to a user-specified host file used when submitting or modifying a job.
- Initial checkpoint period
- The initial checkpoint period specified at the job level, by bsub -k, or in an application profile with CHKPNT_INITPERIOD.
- Checkpoint period
- The checkpoint period specified at the job level, by bsub -k, in the queue with CHKPNT, or in an application profile with CHKPNT_PERIOD.
- Checkpoint directory
- The checkpoint directory specified at the job level, by bsub -k, in the queue with CHKPNT, or in an application profile with CHKPNT_DIR.
- Migration threshold
- The migration threshold specified at the job level, by bsub -mig.
- Post-execute Command
- The post-execution command specified at the job-level, by bsub -Ep.
- PENDING REASONS
- The reason the job is in the PEND or PSUSP state. The names of the hosts associated with each reason are displayed when both -p and -l options are specified.
- SUSPENDING REASONS
- The reason the job is in the USUSP or SSUSP state.
- loadSched
- The load scheduling thresholds for the job.
- loadStop
- The load suspending thresholds for the job.
- JOB STATUS
- Possible values for the status of a job include:
- PEND
- The job is pending. That is, it has not yet been started.
- PROV
- The job has been dispatched to a power-saved host that is waking up. Before the job can be sent to the sbatchd, it is in a PROV state.
- PSUSP
- The job has been suspended, either by its owner or the LSF administrator, while pending.
- RUN
- The job is currently running.
- USUSP
- The job has been suspended, either by its owner or the LSF administrator, while running.
- SSUSP
- The job has been suspended by LSF. The following are examples of why LSF suspended the job:
- The load conditions on the execution host or hosts have exceeded a threshold according to the loadStop vector defined for the host or queue.
- The run window of the job's queue is closed. See bqueues(1), bhosts(1), and lsb.queues(5).
- DONE
- The job has terminated with status of 0.
- EXIT
- The job has terminated with a non-zero status – it may have been aborted due to an error
in its execution, or killed by its owner or the LSF administrator.
For example, exit code 131 means that the job exceeded a configured resource usage limit and LSF killed the job.
- UNKWN
- mbatchd has lost contact with the sbatchd on the host on which the job runs.
- WAIT
- For jobs submitted to a chunk job queue, members of a chunk job that are waiting to run.
- ZOMBI
- A job becomes ZOMBI if:
- A non-rerunnable job is killed by bkill while the sbatchd on the execution host is unreachable and the job is shown as UNKWN.
- After the execution host becomes available, LSF tries to kill the ZOMBI job. Upon successful
termination of the ZOMBI job, the job's status is changed to EXIT.
With the LSF multicluster capability, when a job running on a remote execution cluster becomes a ZOMBI job, the execution cluster treats the job the same way as local ZOMBI jobs. In addition, it notifies the submission cluster that the job is in ZOMBI state and the submission cluster requeues the job.
- RUNTIME
- Estimated run time for the job, specified by bsub -We or bmod -We, -We+, -Wep.
The following information is displayed when running bjobs -WL, -WF, or -WP.
- TIME_LEFT
- The estimated run time that the job has remaining. Along with the time if applicable, one of the
following symbols may also display.
- E: The job has an estimated run time that has not been exceeded.
- L: The job has a hard run time limit specified but either has no estimated run time or the estimated run time is more than the hard run time limit.
- X: The job has exceeded its estimated run time and the time displayed is the time remaining until the job reaches its hard run time limit.
- A dash indicates that the job has no estimated run time and no run limit, or that it has exceeded its run time but does not have a hard limit and therefore runs until completion.
If there is less than a minute remaining, 0:0 displays.
- FINISH_TIME
- The estimated finish time of the job. For done/exited jobs, this is the actual finish time. For
running jobs, the finish time is the start time plus the estimated run time (where set and not
exceeded) or the start time plus the hard run limit.
- E: The job has an estimated run time that has not been exceeded.
- L: The job has a hard run time limit specified but either has no estimated run time or the estimated run time is more than the hard run time limit.
- X: The job has exceeded its estimated run time and had no hard run time limit set. The finish time displayed is the estimated run time remaining plus the start time.
- A dash indicates that the job is pending or suspended, or has no run limit, and therefore has no estimated finish time.
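The finish time rule for running jobs can be sketched as follows. This is an illustration under stated assumptions, not the LSF implementation: the function name and parameters are hypothetical, an estimate larger than the hard run limit is treated as unusable (the L case), and a job that has exceeded its estimate with no hard limit (the X case) is simplified to "no estimate".

```python
from datetime import datetime, timedelta


def estimated_finish(start, elapsed, estimated_run=None, run_limit=None):
    """Estimate the finish time of a running job, or return None (shown as a dash).

    Uses the estimated run time when it is set, not yet exceeded, and not larger
    than the hard run limit; otherwise falls back to the hard run limit.
    """
    estimate_usable = (
        estimated_run is not None
        and elapsed <= estimated_run
        and (run_limit is None or estimated_run <= run_limit)
    )
    if estimate_usable:
        return start + estimated_run
    if run_limit is not None:
        return start + run_limit
    return None  # simplification: no usable estimate and no hard limit


start = datetime(2024, 1, 1, 12, 0)
print(estimated_finish(start, timedelta(minutes=10),
                       estimated_run=timedelta(minutes=30),
                       run_limit=timedelta(hours=1)))
```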
- %COMPLETE
- The estimated completion percentage of the job.
- E: The job has an estimated run time that has not been exceeded.
- L: The job has a hard run time limit specified but either has no estimated run time or the estimated run time is more than the hard run time limit.
- X: The job has exceeded its estimated run time and had no hard run time limit set.
- A dash indicates that the job is pending, or that it is running or suspended but has no run time limit specified.
Note: For jobs in the state UNKNOWN, the job run time estimate is based on internal counting by the job's mbatchd.
- RESOURCE USAGE
- For the LSF multicluster capability job forwarding model, this information is not shown if the LSF multicluster capability resource usage updating is disabled. Use LSF_HPC_EXTENSIONS="HOST_RUSAGE" in lsf.conf to specify host-based resource usage.
The values for the current usage of a job include:
- HOST
- For host-based resource usage, specifies the host.
- CPU time
- Cumulative total CPU time in seconds of all processes in a job. For host-based resource usage, the cumulative total CPU time in seconds of all processes in a job running on a host.
- IDLE_FACTOR
- Job idle information (CPU time/runtime) if JOB_IDLE is configured in the queue, and the job has triggered an idle exception.
- MEM
- Total resident memory usage of all processes in a job. For host-based resource usage, the total resident memory usage of all processes in a job running on a host. The sum of host-based rusage may not equal the total job rusage, since total job rusage is the maximum historical value.
Memory usage unit is scaled automatically based on the value. Use LSF_UNIT_FOR_LIMITS in lsf.conf to specify the smallest unit for display (KB, MB, GB, TB, PB, or EB).
- SWAP
- Total virtual memory and swap usage of all processes in a job. For host-based resource usage,
the total virtual memory usage of all processes in a job running on a host. The sum of host-based
usage may not equal the total job usage, since total job usage is the maximum historical
value.
Swap usage unit is scaled automatically based on the value. Use the LSF_UNIT_FOR_LIMITS parameter in the lsf.conf file to specify the smallest unit for display (KB, MB, GB, TB, PB, or EB).
By default, LSF collects both memory and swap usage through PIM:
- If the EGO_PIM_SWAP_REPORT=n parameter is set in the lsf.conf file (this is the default), swap usage is virtual memory (VSZ) of the entire job process.
- If the EGO_PIM_SWAP_REPORT=y parameter is set in the lsf.conf file, the resident set size (RSS) is subtracted from the virtual memory usage. RSS is the portion of memory occupied by a process that is held in main memory. Swap usage is collected as the VSZ - RSS.
If memory enforcement through the Linux cgroup memory subsystem is enabled with the LSF_LINUX_CGROUP_ACCT=y parameter in the lsf.conf file, LSF uses the cgroup memory subsystem to collect memory and swap usage of all processes in a job.
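The effect of EGO_PIM_SWAP_REPORT on the reported swap value can be shown with a small calculation. The following sketch is illustrative only: the function name is hypothetical, both inputs are assumed to use the same unit, and the cgroup-based collection path is not modeled.

```python
def swap_usage(vsz, rss, pim_swap_report="n"):
    """Return the reported swap usage for a job.

    vsz: virtual memory size of the job's processes.
    rss: resident set size of the job's processes (same unit as vsz).
    pim_swap_report: value of EGO_PIM_SWAP_REPORT ("n" is the default).
    """
    if pim_swap_report.lower() == "y":
        return vsz - rss  # swap usage is collected as VSZ - RSS
    return vsz            # swap usage is the VSZ of the entire job process


print(swap_usage(500, 120))       # 500
print(swap_usage(500, 120, "y"))  # 380
```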
- NTHREAD
- Number of currently active threads of a job.
- PGID
- Currently active process group ID in a job. For host-based resource usage, the currently active process group ID in a job running on a host.
- PIDs
- Currently active processes in a job. For host-based resource usage, the currently active processes in a job running on a host.
- RESOURCE LIMITS
- The hard resource usage limits that are imposed on the jobs in the queue (see
getrlimit(2) and lsb.queues(5)). These limits are imposed on
a per-job and a per-process basis. The possible per-job resource usage limits are:
- CPULIMIT
- TASKLIMIT
- MEMLIMIT
- SWAPLIMIT
- PROCESSLIMIT
- THREADLIMIT
- OPENFILELIMIT
- HOSTLIMIT_PER_JOB
The possible UNIX per-process resource usage limits are:
- RUNLIMIT
- FILELIMIT
- DATALIMIT
- STACKLIMIT
- CORELIMIT
If a job submitted to the queue has any of these limits specified (see bsub(1)), then the lower of the corresponding job limits and queue limits are used for the job.
If no resource limit is specified, the resource is assumed to be unlimited. User shell limits that are unlimited are not displayed.
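The rule that the lower of the job-level and queue-level limits applies, with an unspecified limit treated as unlimited, can be sketched as follows. The function name is hypothetical, and None stands in for "no limit specified".

```python
def effective_limit(job_limit=None, queue_limit=None):
    """Return the limit that applies to a job; None means unlimited."""
    specified = [v for v in (job_limit, queue_limit) if v is not None]
    return min(specified) if specified else None


print(effective_limit(job_limit=2048, queue_limit=4096))  # 2048
print(effective_limit(queue_limit=4096))                  # 4096
print(effective_limit())                                  # None
```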
- EXCEPTION STATUS
- Possible values for the exception status of a job include:
- idle
- The job is consuming less CPU time than expected. The job idle factor (CPU time/runtime) is less than the configured JOB_IDLE threshold for the queue and a job exception has been triggered.
- overrun
- The job is running longer than the number of minutes specified by the JOB_OVERRUN threshold for the queue and a job exception has been triggered.
- underrun
- The job finished sooner than the number of minutes specified by the JOB_UNDERRUN threshold for the queue and a job exception has been triggered.
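The three exception conditions can be expressed as simple comparisons. The following sketch only evaluates the conditions as described above; the function name and parameters are hypothetical, times are assumed to be in seconds, the JOB_OVERRUN and JOB_UNDERRUN thresholds are in minutes, and the sketch does not model when or how LSF actually triggers an exception.

```python
def job_exceptions(cpu_seconds, run_seconds, finished,
                   job_idle=None, job_overrun=None, job_underrun=None):
    """Return the exception names whose conditions hold for a job.

    job_idle: idle factor threshold (CPU time / run time), or None if not set.
    job_overrun, job_underrun: thresholds in minutes, or None if not set.
    """
    exceptions = []
    run_minutes = run_seconds / 60
    if job_idle is not None and run_seconds > 0:
        if cpu_seconds / run_seconds < job_idle:
            exceptions.append("idle")
    if job_overrun is not None and not finished and run_minutes > job_overrun:
        exceptions.append("overrun")
    if job_underrun is not None and finished and run_minutes < job_underrun:
        exceptions.append("underrun")
    return exceptions


# A job that has run for an hour but used only one minute of CPU time.
print(job_exceptions(60, 3600, finished=False, job_idle=0.1, job_overrun=30))
```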
- Requested resources
- Shows all the resource requirement strings you specified in the bsub command.
- Execution rusage
- This is shown if the combined RES_REQ has an rusage OR (||) construct. The chosen alternative will be denoted here.
- Synchronous Execution
- Job was submitted with the -K option. LSF submits the job and waits for the job to complete.
- JOB_DESCRIPTION
- The job description assigned by the user. This field is omitted if no job
description has been assigned.
The displayed job description can contain up to 4094 characters.
- MEMORY USAGE
- Displays peak memory usage and average memory usage. For example:
MEMORY USAGE: MAX MEM:11 Mbytes; AVG MEM:6 Mbytes
Starting in Fix Pack 14, displays peak memory usage, average memory usage, and memory usage efficiency. For example:
MEMORY USAGE: MAX MEM:11 Mbytes; AVG MEM:6 Mbytes; MEM Efficiency: 10.00%
where MEM Efficiency is calculated using the following formula:
MEM Efficiency = (MAX MEM / MEM requested in bsub -R "rusage[mem=]") * 100%
If no memory is requested in the bsub command, the MEM Efficiency value will be 0.
If the consumed memory is larger or smaller than the current rusage amount, you can adjust the rusage value accordingly the next time you submit the same job.
- CPU USAGE
- Available starting in Fix Pack 14: displays the maximum number of CPUs used while running the job (CPU peak), duration for CPU to peak (in seconds), CPU average efficiency, and CPU peak efficiency. For example:
CPU USAGE: CPU PEAK: 4.24; CPU PEAK DURATION: 54 second(s) CPU AVERAGE EFFICIENCY: 99.55%; CPU PEAK EFFICIENCY: 106.02%
CPU PEAK is the maximum number of CPUs used for running the job.
CPU PEAK DURATION is the duration, in seconds, to reach the CPU peak for the job.
CPU AVERAGE EFFICIENCY is calculated using the following formula:
CPU AVERAGE EFFICIENCY = (CPU_TIME / (JOB_RUN_TIME * CPU_REQUESTED)) * 100%
CPU AVERAGE EFFICIENCY is calculated periodically every time the CPU_PEAK_SAMPLE_DURATION value (defined in the lsb.params file) is reached during a job's run. The CPU_TIME and JOB_RUN_TIME values are used only since the last calculation; the job's CPU AVERAGE EFFICIENCY value is the average of all calculated CPU AVERAGE EFFICIENCY values in each cycle.
CPU PEAK EFFICIENCY is calculated using the following formula:
CPU PEAK EFFICIENCY = (CPU PEAK / CPU_REQUESTED) * 100%
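The memory and CPU efficiency formulas can be checked with a short calculation. The following sketch is an illustration, not LSF code: the function names are hypothetical, each sample is assumed to be a (CPU time, run time) pair measured since the previous cycle, and the inputs are chosen for the example. Because the displayed values are rounded, results computed from displayed numbers can differ slightly from the percentages that bjobs prints.

```python
def mem_efficiency(max_mem, mem_requested):
    """MEM Efficiency = (MAX MEM / requested rusage mem) * 100; 0 if none requested."""
    if not mem_requested:
        return 0.0
    return max_mem / mem_requested * 100


def cpu_average_efficiency(samples, cpu_requested):
    """Average the per-cycle efficiencies.

    samples: non-empty list of (cpu_time, run_time) pairs, each covering only
    the interval since the previous calculation.
    """
    per_cycle = [cpu / (run * cpu_requested) * 100 for cpu, run in samples]
    return sum(per_cycle) / len(per_cycle)


def cpu_peak_efficiency(cpu_peak, cpu_requested):
    """CPU PEAK EFFICIENCY = (CPU PEAK / CPU_REQUESTED) * 100."""
    return cpu_peak / cpu_requested * 100


print(round(mem_efficiency(11, 110), 2))               # 10.0
print(cpu_average_efficiency([(30, 10), (40, 10)], 4))  # 87.5
print(round(cpu_peak_efficiency(4.24, 4), 2))          # 106.0
```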
- RESOURCE REQUIREMENT DETAILS
- Displays the configured level of resource requirement details. The
BJOBS_RES_REQ_DISPLAY parameter in lsb.params controls the
level of detail that this column displays, which can be as follows:
- none - no resource requirements are displayed (this column is not displayed in the -l output).
- brief - displays the combined and effective resource requirements.
- full - displays the job, app, queue, combined and effective resource requirements.
- Requested Network
- Displays network resource information for IBM Parallel Edition (PE) jobs submitted
with the bsub -network option. It does not display network resource information
from the NETWORK_REQ parameter in lsb.queues or
lsb.applications. For example:
bjobs -l
Job <2106>, User <user1>, Project <default>, Status <RUN>, Queue <normal>, Command <my_pe_job>
Fri Jun 1 20:44:42: Submitted from host <hostA>, CWD <$HOME>, Requested Network <protocol=mpi: mode=US: type=sn_all: instance=1: usage=dedicated>
If mode=IP is specified for the PE job, instance is not displayed.
- DATA REQUIREMENTS
- When you use -data, displays a list of requested files for jobs with data requirements.
Output: Forwarded job information
- CLUSTER
- The name of the cluster to which the job was forwarded.
- FORWARD_TIME
- The time that the job was forwarded.
Output: Job array summary information
Use -A to display summary information about job arrays. The following fields are displayed:
- JOBID
- Job ID of the job array.
- ARRAY_SPEC
- Array specification in the format of name[index]. The array specification may be truncated. Use the -w option together with -A to show the full array specification.
- OWNER
- Owner of the job array.
- NJOBS
- Number of jobs in the job array.
- PEND
- Number of pending jobs of the job array.
- RUN
- Number of running jobs of the job array.
- DONE
- Number of successfully completed jobs of the job array.
- EXIT
- Number of unsuccessfully completed jobs of the job array.
- SSUSP
- Number of LSF system suspended jobs of the job array.
- USUSP
- Number of user suspended jobs of the job array.
- PSUSP
- Number of held jobs of the job array.
Output: LSF Session Scheduler job summary information
- JOBID
- Job ID of the Session Scheduler job.
- OWNER
- Owner of the Session Scheduler job.
- JOB_NAME
- The job name assigned by the user, or the command string assigned by default at
job submission with bsub. If the job name is too long to fit in this field, then
only the latter part of the job name is displayed.
The displayed job name or job command can contain up to 4094 characters for UNIX, or up to 255 characters for Windows.
- NTASKS
- The total number of tasks for this Session Scheduler job.
- PEND
- Number of pending tasks of the Session Scheduler job.
- RUN
- Number of running tasks of the Session Scheduler job.
- DONE
- Number of successfully completed tasks of the Session Scheduler job.
- EXIT
- Number of unsuccessfully completed tasks of the Session Scheduler job.
Output: Unfinished job summary information
- RUN
- The job is running.
- SSUSP
- The job has been suspended by LSF.
- USUSP
- The job has been suspended, either by its owner or the LSF administrator, while running.
- UNKNOWN
- mbatchd has lost contact with the sbatchd on the host where the job was running.
- PEND
- The job is pending, which may include PSUSP and chunk job WAIT. When -sum is used with -p in the LSF multicluster capability, WAIT jobs are not counted as PEND or FWD_PEND. When -sum is used with -r, WAIT jobs are counted as PEND or FWD_PEND.
- FWD_PEND
- The job is pending and forwarded to a remote cluster. The job has not yet started in the remote cluster.
Output: Affinity resource requirements information (-l -aff)
- HOST
- The host that the task is running on.
- TYPE
- Requested processor unit type for CPU binding. One of numa, socket, core, or thread.
- LEVEL
- Requested processor unit binding level for CPU binding. One of numa, socket, core, or thread. If no CPU binding level is requested, a dash (-) is displayed.
- EXCL
- Requested processor unit binding level for exclusive CPU binding. One of numa, socket, or core. If no exclusive binding level is requested, a dash (-) is displayed.
- IDS
- List of physical or logical IDs of the CPU allocation for the task.
The list consists of a set of paths, represented as a sequence of integers separated by slash characters (/), through the topology tree of the host. Each path identifies a unique processing unit allocated to the task. For example, a string of the form 3/0/5/12 represents an allocation to thread 12 in core 5 of socket 0 in NUMA node 3. A string of the form 2/1/4 represents an allocation to core 4 of socket 1 in NUMA node 2. The integers correspond to the node ID numbers displayed in the topology tree from bhosts -aff.
- POL
- Requested memory binding policy. Either local or pref. If no memory binding is requested, a dash (-) is displayed.
- NUMA
- ID of the NUMA node that the task memory is bound to. If no memory binding is requested, a dash (-) is displayed.
- SIZE
- Amount of memory allocated for the task on the NUMA node.
bsub -n 6 -R"span[hosts=1] rusage[mem=100]affinity[core(1,same=socket,
exclusive=(socket,injob)):cpubind=socket:membind=localonly:distribute=pack]" myjob
Job <6> is submitted to default queue <normal>.
bjobs -l -aff 6
Job <6>, User <user1>, Project <default>, Status <RUN>, Queue <normal>, Comman
d <myjob1>
Thu Feb 14 14:13:46: Submitted from host <hostA>, CWD <$HOME>, 6 Task(s),
Requested Resources <span[hosts=1] rusage[mem=10
0]affinity[core(1,same=socket,exclusive=(socket,injob)):cp
ubind=socket:membind=localonly:distribute=pack]>;
Thu Feb 14 14:15:07: Started 6 Task(s) on Hosts <hostA> <hostA> <hostA> <hostA>
<hostA> <hostA>, Allocated 6 Slot(s) on Hosts <hostA>
<hostA> <hostA> <hostA> <hostA> <hostA>, Execution Home
</home/user1>, Execution CWD </home/user1>;
SCHEDULING PARAMETERS:
r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - - - -
loadStop - - - - - - - - - - -
RESOURCE REQUIREMENT DETAILS:
Combined: select[type == local] order[r15s:pg] rusage[mem=100.00] span[hosts=1
] affinity[core(1,same=socket,exclusive=(socket,injob))*1:
cpubind=socket:membind=localonly:distribute=pack]
Effective: select[type == local] order[r15s:pg] rusage[mem=100.00] span[hosts=
1] affinity[core(1,same=socket,exclusive=(socket,injob))*1
:cpubind=socket:membind=localonly:distribute=pack]
AFFINITY:
CPU BINDING MEMORY BINDING
------------------------ --------------------
HOST TYPE LEVEL EXCL IDS POL NUMA SIZE
hostA core socket socket /0/0/0 local 0 16.7MB
hostA core socket socket /0/1/0 local 0 16.7MB
hostA core socket socket /0/2/0 local 0 16.7MB
hostA core socket socket /0/3/0 local 0 16.7MB
hostA core socket socket /0/4/0 local 0 16.7MB
hostA core socket socket /0/5/0 local 0 16.7MB
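The IDS paths described above can be mapped to topology levels with a short helper. The following is a minimal sketch: the function name is hypothetical, it assumes that every path starts at the NUMA node level and descends in the order NUMA node, socket, core, thread, and it simply strips a leading slash such as the one in the sample output.

```python
LEVELS = ("numa", "socket", "core", "thread")


def parse_ids_path(path):
    """Map an IDS path such as '3/0/5/12' to a dict of topology level IDs."""
    ids = [int(part) for part in path.strip("/").split("/")]
    if len(ids) > len(LEVELS):
        raise ValueError(f"too many levels in IDS path: {path!r}")
    return dict(zip(LEVELS, ids))


print(parse_ids_path("3/0/5/12"))  # {'numa': 3, 'socket': 0, 'core': 5, 'thread': 12}
print(parse_ids_path("2/1/4"))     # {'numa': 2, 'socket': 1, 'core': 4}
```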
Output: Data requirements information (-l -data)
Use -l -data to display detailed information about jobs with data requirements. The heading DATA REQUIREMENTS is displayed followed by a list of the files requested by the job.
bjobs -l -data 1962
Job <1962>, User <user1>, Project <default>, Status <PEND>, Queue
<normal>,Command <my_data_job>
Fri Sep 20 16:31:17: Submitted from host <hb05b10>, CWD
<$HOME/source/user1/work>, Data requirement requested;
PENDING REASONS:
Job is waiting for its data requirement to be satisfied;
SCHEDULING PARAMETERS:
r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - - - -
loadStop - - - - - - - - - - -
RESOURCE REQUIREMENT DETAILS:
Combined: select[type == local] order[r15s:pg]
Effective: -
DATA REQUIREMENTS:
FILE: hostA:/home/user1/data2
SIZE: 40 MB
MODIFIED: Thu Aug 14 17:01:57
FILE: hostA:/home/user1/data3
SIZE: 40 MB
MODIFIED: Fri Aug 15 16:32:45
FILE: hostA:/home/user1/data4
SIZE: 500 MB
MODIFIED: Mon Apr 14 17:15:56
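The DATA REQUIREMENTS block in the sample output can be collected into records with simple line parsing. The following sketch assumes the FILE, SIZE, and MODIFIED layout shown in the example above; the function name is hypothetical, and the layout of real output may differ between LSF versions.

```python
def parse_data_requirements(text):
    """Collect FILE/SIZE/MODIFIED entries that follow a DATA REQUIREMENTS heading."""
    files = []
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "DATA REQUIREMENTS:":
            in_section = True
            continue
        if not in_section or ":" not in line:
            continue
        # Split on the first colon only, so host:/path and timestamps stay intact.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "FILE":
            files.append({"FILE": value})
        elif key in ("SIZE", "MODIFIED") and files:
            files[-1][key] = value
    return files


sample = """\
DATA REQUIREMENTS:
FILE: hostA:/home/user1/data2
SIZE: 40 MB
MODIFIED: Thu Aug 14 17:01:57
FILE: hostA:/home/user1/data3
SIZE: 40 MB
MODIFIED: Fri Aug 15 16:32:45
"""
records = parse_data_requirements(sample)
for record in records:
    print(record)
```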
See also
bsub, bkill, bhosts, bmgroup, bclusters, bqueues, bhist, bresume, bsla, bstop, lsb.params, lsb.serviceclasses, mbatchd