bacct

Displays accounting statistics about finished jobs.

Synopsis

bacct [-b | -l [-aff] [-gpu]] [-d] [-E] [-e] [-UF] [-w] [-x] [-cname] [-app application_profile_name] [-C time0,time1] [-D time0,time1] [-f logfile_name | -f -] [-Lp ls_project_name ...] [-m host_name ... | -M host_list_file] [-N host_name | -N host_model | -N cpu_factor] [-P project_name ...] [-q queue_name ...] [-rcacct "all | rc_account_name ..."] [-rcalloc "all | rc_account_name ..."] [-sla service_class_name ...] [-S time0,time1] [-u user_name ... | -u all] [job_ID ...] [-U reservation_ID ... | -U all]
bacct [-h | -V]

Description

Displays a summary of accounting statistics for all finished jobs (with a DONE or EXIT status) submitted by the user who ran the command, on all hosts, projects, and queues in the LSF system.

By default, the bacct command displays statistics for all jobs that are logged in the current LSF accounting log file: LSB_SHAREDIR/cluster_name/logdir/lsb.acct.

CPU time is not normalized.

All times are in seconds.

Statistics not reported by the bacct command but of interest to individual system administrators can be generated by directly using awk or perl to process the lsb.acct file.
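For example, a minimal sketch that tallies the record types in the current log file (the path is the default location described above; field positions in lsb.acct vary by LSF release, so consult the lsb.acct reference before processing individual fields):

awk '{ count[$1]++ } END { for (type in count) print type, count[type] }' LSB_SHAREDIR/cluster_name/logdir/lsb.acct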

Throughput calculations

To calculate the throughput (T) of the LSF system, specific hosts, or queues, use the following formula:

T = N/(ET-BT)

Where:

  • N is the total number of jobs for which accounting statistics are reported
  • BT is the job start time, when the first job was logged
  • ET is the job end time, when the last job was logged

Use the -C time0,time1 option to limit the calculation to jobs that completed or exited between time0 and time1. In this way, you can examine throughput during a specific time period.
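
For example, the following command reports the overall throughput for jobs that completed or exited during June 2021 (the dates are placeholders; use the time format that is described for the bhist command):

bacct -u all -C 2021/6/1/0:0,2021/6/30/23:59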

  • To calculate the total throughput of the LSF system, specify the -u all option without any of the -m, -q, -S, -D, or job_ID options.
  • To calculate the throughput of hosts, specify the -u all option without the -q, -S, -D, or job_ID options.
  • To calculate the throughput of queues, specify the -u all option without the -m, -S, -D, or job_ID options.

Only logged jobs, that is, jobs that finished with a DONE or EXIT status, are included in the throughput calculation. Jobs that are running, suspended, or that were never dispatched after submission are not considered because they are still in the LSF system and are not logged in the lsb.acct file.

The bacct command does not show local pending batch jobs that were killed with the bkill -b command. The bacct command shows LSF multicluster capability jobs and local running jobs even if they are killed by using the bkill -b command.

Options

-aff
Displays information about jobs with CPU and memory affinity resource requirements for each task in the job. A table with the heading AFFINITY shows detailed memory and CPU binding information for each task in the job, one line for each allocated processor unit.

Use only with the -l option.

-b
Brief format.
-E
Displays accounting statistics that are calculated with eligible pending time instead of total pending time for the wait time, turnaround time, expansion factor (turnaround time divided by run time), and hog factor (CPU time divided by turnaround time).
-d
Displays accounting statistics for successfully completed jobs (with a DONE status).
-e
Displays accounting statistics for exited jobs (with an EXIT status).
-gpu

Use this option only with the -l option.

The bacct -l -gpu command shows the following information about GPU job allocation after the job finishes:

Host Name
Name of the host.
GPU IDs on the host
Each GPU is shown as a separate line.
TASK and ID
List of job tasks and IDs that use the GPU (comma-separated if the GPU is used by multiple tasks).
MODEL
Contains the GPU brand name and model type name.
MTOTAL
The total GPU memory size.
FACTOR
GPU compute capability.
MRSV
GPU memory reserved by the job
SOCKET
ID of the socket where the GPU is located.
NVLINK
Indicates whether the GPU has NVLink connections with the other GPUs that are allocated to the job (listed in GPU ID order and including itself). The connection flags are separated by slash characters (/), one flag per GPU:
A “Y” indicates a direct NVLink connection between the two GPUs.
An “N” indicates no direct NVLink connection with that GPU.
A “-” marks the position of the GPU itself.

If the job exited abnormally because of a GPU-related error or warning, the error or warning message is displayed. If LSF could not get GPU usage information from DCGM, a hyphen (-) is displayed.
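
For example, assuming that job 1234 is a finished GPU job (the job ID is a placeholder), the following command displays its GPU allocation details:

bacct -l -gpu 1234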

-l
Long format. Displays detailed information for each job in a multiline format.

If the job was submitted with the bsub -K command, the -l option displays Synchronous Execution.

-UF
Displays unformatted job detail information.

This option makes it easier to write scripts that parse keywords in the bacct output. No line length or width control is applied: each line starts at the beginning of the line, and all lines that start with a time stamp are displayed unformatted on a single line.

-w
Wide field format.
-x
Displays jobs that triggered a job exception (overrun, underrun, idle, runtime_est_exceeded). Use with the -l option to show the exception status for individual jobs.
-cname
In IBM® Spectrum LSF Advanced Edition, includes the cluster name for execution cluster hosts and host groups in output.
Note: This command option is deprecated and might be removed in a future version of LSF.
-app application_profile_name
Displays accounting information about jobs that are submitted to the specified application profile. You must specify an existing application profile that is configured in lsb.applications.
-C time0,time1
Displays accounting statistics for jobs that completed or exited during the specified time interval. Reads the lsb.acct file and all archived log files (lsb.acct.n) unless the -f option is used to specify a log file.

The time format is the same as in the bhist command.

-D time0,time1
Displays accounting statistics for jobs that are dispatched during the specified time interval. Reads the lsb.acct file and all archived log files (lsb.acct.n) unless the -f option is also used to specify a log file.

The time format is the same as in the bhist command.

-f logfile_name | -f -
Searches the specified job log file for accounting statistics, which is useful for offline analysis. Specify either an absolute or relative path.

The specified file path can contain up to 4094 characters for UNIX, or up to 512 characters for Windows.

Specify the -f - option to force the bacct command to use the lsb.acct log file for accounting statistics. If you are using IBM Spectrum LSF Explorer ("LSF Explorer") to load accounting log records, the -f - option (or any -f argument that specifies a log file) forces the bacct command to bypass LSF Explorer. For more details on using LSF Explorer to load accounting log records, refer to the LSF_QUERY_ES_SERVERS and LSF_QUERY_ES_FUNCTIONS parameters in the lsf.conf file.
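
For example, assuming that an archived log file was copied to /tmp/lsb.acct.1 for offline analysis (the path is a placeholder), the following command summarizes all jobs in that file:

bacct -u all -f /tmp/lsb.acct.1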

-Lp ls_project_name ...
Displays accounting statistics for jobs that belong to the specified LSF License Scheduler projects. If a list of projects is specified, project names must be separated by spaces and enclosed in quotation marks (") or (').
-M host_list_file
Displays accounting statistics for jobs that are dispatched to the hosts listed in a file (host_list_file) containing a list of hosts. The host list file has the following format:
  • Multiple lines are supported
  • Each line includes a list of hosts that are separated by spaces
  • The length of each line must be fewer than 512 characters
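For example, a host list file with the following content (the host names are placeholders) selects jobs that were dispatched to any of these five hosts:
hostA hostB hostC
hostD hostE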
-m host_name ...
Displays accounting statistics for jobs that are dispatched to the specified hosts.

If a list of hosts is specified, host names must be separated by spaces and enclosed in quotation marks (") or ('), and maximum length cannot exceed 1024 characters.

-N host_name | -N host_model | -N cpu_factor
Normalizes CPU time by the CPU factor of the specified host or host model, or by the specified CPU factor.

If you use the bacct command offline by specifying a job log file, you must specify a CPU factor.
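
For example, the following command normalizes the CPU time that is reported from an archived log file by a CPU factor of 2.0 (the file path and the CPU factor are placeholders):

bacct -u all -N 2.0 -f /tmp/lsb.acct.1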

-P project_name ...
Displays accounting statistics for jobs that belong to the specified projects. If a list of projects is specified, project names must be separated by spaces and enclosed in quotation marks (") or ('). You cannot use one double quotation mark (") and one single quotation mark (') to enclose the list.
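For example, to display accounting statistics for jobs in two projects (the project names are placeholders), enclose the list in quotation marks:
bacct -P "proj_aero proj_thermo"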
-q queue_name ...
Displays accounting statistics for jobs that are submitted to the specified queues.

If a list of queues is specified, queue names must be separated by spaces and enclosed in quotation marks (") or (').

-rcacct "all | rc_account_name ..."
Displays accounting statistics for jobs that are associated with the specified LSF resource connector account name.

If a list of account names is specified, account names must be separated by spaces.

-rcalloc "all | rc_account_name ..."
Displays accounting statistics for jobs that are associated with the specified LSF resource connector account name and actually ran on an LSF resource connector host.

If a list of account names is specified, account names must be separated by spaces.

-S time0,time1
Displays accounting statistics for jobs that are submitted during the specified time interval. Reads the lsb.acct file and all archived log files (lsb.acct.n) unless the -f option is also used to specify a log file.

The time format is the same as in the bhist command.

-sla service_class_name
Displays accounting statistics for jobs that ran under the specified service class.

If a default system service class is configured with the ENABLE_DEFAULT_EGO_SLA parameter in the lsb.params file, but not explicitly configured in the lsb.serviceclasses file, the bacct -sla service_class_name command displays accounting information for the specified default service class.

-U reservation_id ... | -U all
Displays accounting statistics for the specified advance reservation IDs, or for all reservation IDs if the keyword all is specified.

A list of reservation IDs must be separated by spaces and enclosed in quotation marks (") or (').

The -U option also displays historical information about reservation modifications.

When combined with the -U option, the -u option is interpreted as the user name of the reservation creator. The following command shows all the advance reservations that are created by user user2:
bacct -U all -u user2

Without the -u option, the bacct -U command shows all advance reservation information about jobs that are submitted by the user.

In an LSF multicluster capability environment, advance reservation information is only logged in the execution cluster, so bacct displays advance reservation information for local reservations only. You cannot see information about remote reservations. You cannot specify a remote reservation ID, and the keyword all displays only information about reservations in the local cluster.

-u user_name ...|-u all
Displays accounting statistics for jobs that are submitted by the specified users, or by all users if the keyword all is specified.

If a list of users is specified, user names must be separated by spaces and enclosed in quotation marks (") or ('). You can specify both user names and user IDs in the list of users.

job_ID ...
Displays accounting statistics for jobs with the specified job IDs.

If the reserved job ID 0 is used, it is ignored.

In LSF multicluster capability job forwarding mode, you can use the local job ID and cluster name to retrieve the job details from the remote cluster.

General queries have the following syntax:

bacct submission_job_id@submission_cluster_name
Job arrays have the following query syntax:
bacct "submission_job_id[index]"@submission_cluster_name"

The advantage of using submission_job_id@submission_cluster_name instead of bacct -l job_ID is that you can use submission_job_id@submission_cluster_name as an alias to query a local job in the execution cluster without knowing the local job ID in the execution cluster. The bacct output is identical no matter which job ID you use (local job ID or submission_job_id@submission_cluster_name).

You can use the bacct 0 command to find all finished jobs in your local cluster, but bacct 0@submission_cluster_name is not supported.
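
For example, assuming that job 123 and array element 123[4] were submitted from a cluster named cluster_east and forwarded to the local execution cluster (the job IDs and cluster name are placeholders):

bacct 123@cluster_east
bacct "123[4]"@cluster_east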

-h
Prints command usage to stderr and exits.
-V
Prints LSF release version to stderr and exits.

Default output format (SUMMARY)

Statistics on jobs. The following fields are displayed:

  • Total number of done jobs.
  • Total number of exited jobs.
  • Total CPU time consumed.
  • Average CPU time consumed.
  • Maximum CPU time of a job.
  • Minimum CPU time of a job.
  • Memory and CPU usage information, such as CPU efficiency, CPU peak usage, and memory efficiency values.
  • Total wait time in queues.
  • Average wait time in queue, which is the elapsed time from job submission to job dispatch.
  • Maximum wait time in queue.
  • Minimum wait time in queue.
  • Average turnaround time, which is the elapsed time from job submission to job completion (seconds per job).
  • Maximum turnaround time.
  • Minimum turnaround time.
  • Average hog factor of a job, which is the amount of CPU time that is consumed by a job divided by its turnaround time (CPU time divided by turnaround time).
  • Maximum hog factor of a job.
  • Minimum hog factor of a job.
  • Average expansion factor of a job, which is its turnaround time divided by its run time (turnaround time divided by run time).
  • Maximum expansion factor of a job.
  • Minimum expansion factor of a job.
  • Total run time consumed.
  • Average run time consumed.
  • Maximum run time of a job.
  • Minimum run time of a job.
  • Total throughput, which is the number of completed jobs divided by the time period to finish these jobs (jobs/hour).
  • Beginning time, which is the completion or exit time of the first job selected.
  • Ending time, which is the completion or exit time of the last job selected.
  • Scheduler efficiency for a set of finished jobs in the cluster. For each job, its scheduler efficiency is a job's run time divided by the total of its run time and the scheduler overhead time (run time/(run time + scheduler overhead)). The overall scheduler efficiency is the average scheduler efficiency of all jobs. The bacct command displays the scheduler efficiency for slot and memory resources.

The total, average, minimum, and maximum statistics apply to all specified jobs.
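
For example, a job with 30 seconds of CPU time, 60 seconds of run time, and a turnaround time of 90 seconds has a hog factor of 30/90 = 0.33 and an expansion factor of 90/60 = 1.5.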

Output: Brief format (-b)

In addition to the default format SUMMARY, displays the following fields:

U/UID
Name of the user who submitted the job. If LSF fails to get the user name by getpwuid, the user ID is displayed.
QUEUE
Queue to which the job was submitted.
SUBMIT_TIME
Time when the job was submitted.
CPU_T
CPU time that is consumed by the job.
WAIT
Wait time of the job.
TURNAROUND
Turnaround time of the job.
FROM
Host from which the job was submitted.
EXEC_ON
Host or hosts to which the job was dispatched to run.
JOB_NAME
The job name that is assigned by the user, or the command string assigned by default at job submission with the bsub command. If the job name is too long to fit in this field, then only the latter part of the job name is displayed.

The displayed job name or job command can contain up to 4094 characters.

Output: Long format (-l)

Also displays host-based accounting information (CPU_T, MEM, and SWAP) for completed jobs when the LSF_HPC_EXTENSIONS="HOST_RUSAGE" parameter is set in the lsf.conf file.

In addition to the fields displayed by default in SUMMARY and by the -b option, displays the following fields:

JOBID
Identifier that LSF assigned to the job.
PROJECT_NAME
Project name that is assigned to the job.
STATUS
Status that indicates the job was either successfully completed (DONE status) or exited (EXIT status).
DISPATCH_TIME
Time when the job was dispatched to run on the execution hosts.
COMPL_TIME
Time when the job exited or completed.
HOG_FACTOR
Average hog factor, equal to CPU_time / turnaround_time.
MEM
Maximum resident memory usage of all processes in a job. By default, memory usage is shown in MB. Use the LSF_UNIT_FOR_LIMITS parameter in the lsf.conf file to specify a larger unit for display (MB, GB, TB, PB, or EB).
CWD
Full path of the current working directory (CWD) for the job.
Specified CWD
User specified execution CWD.
SWAP
Maximum virtual memory usage of all processes in a job. By default, swap space is shown in MB. Use the LSF_UNIT_FOR_LIMITS parameter in the lsf.conf file to specify a larger unit for display (MB, GB, TB, PB, or EB).
CPU_PEAK
The peak number of CPUs that are used by the job.
CPU EFFICIENCY (starting in Fix Pack 14, called CPU PEAK EFFICIENCY)
The percentage of CPU efficiency (or CPU peak efficiency, for Fix Pack 14) is calculated by using the following formula:
CPU EFFICIENCY = (CPU_PEAK / CPU_REQUESTED) * 100%
CPU AVERAGE EFFICIENCY
The percentage of CPU average efficiency is calculated by using the formula:
CPU AVERAGE EFFICIENCY = (CPU_TIME / (JOB_RUN_TIME * CPU_REQUESTED)) * 100%
CPU PEAK DURATION
The duration, in seconds, for which the CPU usage of the running job stays at the CPU peak.
MEM_EFFICIENCY
The percentage of memory efficiency is calculated by using the following formula:
MEM_EFFICIENCY = (MAX_MEM / (REQUESTED_MEM * MAX_EXECUTION_HOSTS_NUMBER))* 100%
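For example, a job that requests 4 CPUs and reaches a CPU peak of 3.2 has a CPU peak efficiency of (3.2 / 4) * 100% = 80%. If the same job consumes 600 seconds of CPU time over a 300-second run, its CPU average efficiency is (600 / (300 * 4)) * 100% = 50%, and if its maximum memory usage is 2 GB against 4 GB requested on a single execution host, its memory efficiency is (2 / (4 * 1)) * 100% = 50%.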
INPUT_FILE
File from which the job reads its standard input (see bsub -i input_file).
OUTPUT_FILE
File to which the job writes its standard output (see bsub -o output_file).
ERR_FILE
File in which the job stores its standard error output (see bsub -e err_file).
EXCEPTION STATUS
The exception status of a job includes the following possible values:
idle
The job is consuming less CPU time than expected. The job idle factor (CPU_time/run_time) is less than the configured JOB_IDLE threshold for the queue and a job exception was triggered.
overrun
The job is running longer than the number of minutes specified by the JOB_OVERRUN threshold for the queue and a job exception was triggered.
underrun
The job finished sooner than the number of minutes specified by the JOB_UNDERRUN threshold for the queue and a job exception was triggered.
runtime_est_exceeded
The job is running longer than the number of minutes specified by the runtime estimation and a job exception was triggered.
SYNCHRONOUS_EXECUTION
Job was submitted with the -K option. LSF submits the job and waits for the job to complete.
JOB_DESCRIPTION
The job description that is assigned by the user at job submission with bsub. This field is omitted if no job description was assigned.

The displayed job description can contain up to 4094 characters.

Dispatched <number> Tasks on Hosts
The number of tasks in the job and the hosts to which those tasks were sent for processing. Displayed when the LSB_ENABLE_HPC_ALLOCATION parameter is set to Y or y in the lsf.conf file.
Allocated <number> Slot(s) on Host(s)
The number of slots that were allocated to the job based on the number of tasks, and the hosts on which the slots are allocated. Displayed when the LSB_ENABLE_HPC_ALLOCATION parameter is set to Y or y in the lsf.conf file.
Effective RES_REQ
Displays a job's effective resource requirement as seen by the scheduler after resolving any OR constructs.
PE Network ID
Displays network resource allocations for IBM Parallel Environment (PE) jobs that are submitted with the bsub -network option, or to a queue or an application profile with the NETWORK_REQ parameter defined:
bacct -l 210
Job <210>, User <user1>, Project <default>, Status <DONE>, Queue <normal>,
                     Command <my_pe_job>
Tue Jul 17 06:10:28: Submitted from host <hostA>, CWD </home/pe_jobs>;
Tue Jul 17 06:10:31: Dispatched to <hostA>, Effective RES_REQ <select[type 
                     == local] order[r15s:pg] rusage[mem=1.00] >, PE Network 
                     ID <1111111>  <2222222> used <1> window(s)
                     per network per task;
Tue Jul 17 06:11:31: Completed <done>.

Output: Advance reservations (-U)

Displays the following fields:

RSVID
Advance reservation ID assigned by brsvadd command.
TYPE
Type of reservation: user or system.
CREATOR
User name of the advance reservation creator, who submitted the brsvadd command.
USER
User name of the advance reservation user, who submitted the job with the bsub -U command.
NCPUS
Number of CPUs reserved.
RSV_HOSTS
List of hosts for which processors are reserved, and the number of processors reserved.
TIME_WINDOW
Time window for the reservation.
  • A one-time reservation displays fields that are separated by slashes (month/day/hour/minute).
    11/12/14/0-11/12/18/0
  • A recurring reservation displays fields that are separated by colons (day:hour:minute).
    5:18:0 5:20:0

Output: Affinity resource requirements information (-l -aff)

Use the -l -aff option to display accounting job information about CPU and memory affinity resource allocations for job tasks. A table with the heading AFFINITY is displayed containing the detailed affinity information for each task, one line for each allocated processor unit. CPU binding and memory binding information are shown in separate columns in the display.

HOST
The host the task is running on.
TYPE
Requested processor unit type for CPU binding. One of numa, socket, core, or thread.
LEVEL
Requested processor unit binding level for CPU binding. One of numa, socket, core, or thread. If no CPU binding level is requested, a dash (-) is displayed.
EXCL
Requested processor unit binding level for exclusive CPU binding. One of numa, socket, core, or thread. If no exclusive binding level is requested, a dash (-) is displayed.
IDS
List of physical or logical IDs of the CPU allocation for the task.

The list consists of a set of paths, represented as a sequence of integers separated by slash characters (/), through the topology tree of the host. Each path identifies a unique processing unit that is allocated to the task. For example, a string of the form 3/0/5/12 represents an allocation to thread 12 in core 5 of socket 0 in NUMA node 3. A string of the form 2/1/4 represents an allocation to core 4 of socket 1 in NUMA node 2. The integers correspond to the node ID numbers displayed in the topology tree from bhosts -aff.

POL
Requested memory binding policy. Either local or pref. If no memory binding is requested, a dash (-) is displayed.
NUMA
ID of the NUMA node that the task memory is bound to. If no memory binding is requested, a dash (-) is displayed.
SIZE
Amount of memory that is allocated for the task on the NUMA node.
For example, the following job starts 6 tasks with the following affinity resource requirements:
bsub -n 6 -R"span[hosts=1] rusage[mem=100]affinity[core(1,same=socket,
exclusive=(socket,injob)):cpubind=socket:membind=localonly:distribute=pack]" myjob
Job <6> is submitted to default queue <normal>.
bacct -l -aff 6

Accounting information about jobs that are:
  - submitted by all users.
  - accounted on all projects.
  - completed normally or exited
  - executed on all hosts.
  - submitted to all queues.
  - accounted on all service classes.
------------------------------------------------------------------------------

Job <6>, User <user1>, Project <default>, Status <DONE>, Queue <normal>, Comma
                     nd <myjob>
Thu Feb 14 14:13:46: Submitted from host <hostA>, CWD <$HOME>;
Thu Feb 14 14:15:07: Dispatched 6 Task(s) on Host(s) <hostA> <hostA> <hostA>
                     <hostA> <hostA> <hostA>; Allocated <6> Slot(s) on Host(s)
                     <hostA> <hostA> <hostA> <hostA> <hostA> <hostA>;
                     Effective RES_REQ <select[type == local] order[r15s:pg]
                     rusage[mem=100.00] span[hosts=1] affinity
                     [core(1,same=socket,exclusive=(socket,injob))*1:cpubind=
                     socket:membind=localonly:distribute=pack] >
                     ;
Thu Feb 14 14:16:47: Completed <done>.

AFFINITY:
                    CPU BINDING                          MEMORY BINDING
                    ------------------------             --------------------
HOST                TYPE   LEVEL  EXCL   IDS             POL   NUMA SIZE
hostA               core   socket socket /0/0/0          local 0    16.7MB
hostA               core   socket socket /0/1/0          local 0    16.7MB
hostA               core   socket socket /0/2/0          local 0    16.7MB
hostA               core   socket socket /0/3/0          local 0    16.7MB
hostA               core   socket socket /0/4/0          local 0    16.7MB
hostA               core   socket socket /0/5/0          local 0    16.7MB

Accounting information about this job:
     CPU_T     WAIT     TURNAROUND   STATUS     HOG_FACTOR    MEM    SWAP   
      0.01      81       181          done       0.0001        2M     137M
     CPU_PEAK    CPU_PEAK_DURATION    CPU_PEAK_EFFICIENCY
      4.24        54 second(s)         106.02%
     CPU_AVG_EFFICIENCY     MEM_EFFICIENCY
      99.55%                 0.00%
------------------------------------------------------------------------------

SUMMARY:      ( time unit: second )
 Total number of done jobs:       1      Total number of exited jobs:     0
 Total CPU time consumed:       0.0      Average CPU time consumed:     0.0
 Maximum CPU time of a job:     0.0      Minimum CPU time of a job:     0.0
 Total wait time in queues:    81.0
 Average wait time in queue:   81.0
 Maximum wait time in queue:   81.0      Minimum wait time in queue:   81.0
 Average turnaround time:       181 (seconds/job)
 Maximum turnaround time:       181      Minimum turnaround time:       181
 Average hog factor of a job:  0.00 ( cpu time / turnaround time )
 Maximum hog factor of a job:  0.00      Minimum hog factor of a job:  0.00
 Average expansion factor of a job:  1.00 (turnaround time/run time)
 Maximum expansion factor of a job:  1.00   Minimum expansion factor of a job:  1.00
...

Termination reasons displayed by bacct

When LSF detects that a job is terminated, bacct -l displays one of the following termination reasons. The corresponding exit code integer value that is logged to the JOB_FINISH record in the lsb.acct file is given in parentheses.

  • TERM_ADMIN: Job was killed by root or LSF administrator (15)
  • TERM_BUCKET_KILL: Job was killed with the bkill -b command (23)
  • TERM_CHKPNT: Job was killed after checkpointing (13)
  • TERM_CWD_NOTEXIST: Current working directory is not accessible or does not exist on the execution host (25)
  • TERM_CPULIMIT: Job was killed after it reached LSF CPU usage limit (12)
  • TERM_DEADLINE: Job was killed after deadline expires (6)
  • TERM_EXTERNAL_SIGNAL: Job was killed by a signal external to LSF (17)
  • TERM_FORCE_ADMIN: Job was killed by root or LSF administrator without time for cleanup (9)
  • TERM_FORCE_OWNER: Job was killed by owner without time for cleanup (8)
  • TERM_LOAD: Job was killed after load exceeds threshold (3)
  • TERM_MEMLIMIT: Job was killed after it reached LSF memory usage limit (16)
  • TERM_ORPHAN_SYSTEM: The orphan job was automatically terminated by LSF (27)
  • TERM_OWNER: Job was killed by owner (14)
  • TERM_PREEMPT: Job was killed after preemption (1)
  • TERM_PROCESSLIMIT: Job was killed after it reached LSF process limit (7)
  • TERM_REMOVE_HUNG_JOB: Job was removed from LSF system after it reached a job runtime limit (26)
  • TERM_REQUEUE_ADMIN: Job was killed and requeued by root or LSF administrator (11)
  • TERM_REQUEUE_OWNER: Job was killed and requeued by owner (10)
  • TERM_RUNLIMIT: Job was killed after it reached LSF runtime limit (5)
  • TERM_SWAP: Job was killed after it reached LSF swap usage limit (20)
  • TERM_THREADLIMIT: Job was killed after it reached LSF thread limit (21)
  • TERM_UNKNOWN: LSF cannot determine a termination reason. 0 is logged but TERM_UNKNOWN is not displayed (0)
  • TERM_WINDOW: Job was killed after queue run window closed (2)
  • TERM_ZOMBIE: Job exited while LSF is not available (19)
Tip: The integer values logged to the JOB_FINISH record in the lsb.acct file and termination reason keywords are mapped in the lsbatch.h header file.

Example: Default format

bacct 
Accounting information about jobs that are: 
  - submitted by users user1. 
  - accounted on all projects.
  - completed normally or exited.
  - executed on all hosts.
  - submitted to all queues.
  - accounted on all service classes.
--------------------------------------------------------------
SUMMARY:      ( time unit: second ) 
Total number of done jobs:     268      Total number of exited jobs:    31 
Total CPU time consumed:     566.4      Average CPU time consumed:     1.9 
Maximum CPU time of a job:   229.9      Minimum CPU time of a job:     0.0 
Total wait time in queues:   393.0 
Average wait time in queue:    1.3 
Maximum wait time in queue:   97.0      Minimum wait time in queue:    0.0 
Average turnaround time:        32 (seconds/job) 
Maximum turnaround time:       301      Minimum turnaround time:         0 
Average hog factor of a job:  0.16 ( cpu time / turnaround time ) 
Maximum hog factor of a job:  0.91      Minimum hog factor of a job:  0.00 
 Average expansion factor of a job:  1.13 (turnaround time/run time)
 Maximum expansion factor of a job:  2.04   Minimum expansion factor of a job:  1.00
Total Run time consumed:      9466      Average Run time consumed:      31 
 Maximum Run time of a job:     300      Minimum Run time of a job:       0 
Total throughput:           122.17 (jobs/hour)  during    2.45 hours 
Beginning time:       Oct 20 13:40      Ending time:          Oct 20 16:07

Example: Jobs with triggered job exceptions

bacct -x -l

Accounting information about jobs that are: 
  - submitted by users user1, 
  - accounted on all projects.
  - completed normally or exited
  - executed on all hosts.
  - submitted to all queues.
  - accounted on all service classes.
---------------------------------------------------------
Job <1743>, User <user1>, Project <default>, Status <DONE>, Queue <normal>, Command <sleep 30>
Mon Aug 11 18:16:17 2009: Submitted from host <hostB>, CWD <$HOME/jobs>, Output File </dev/null>;
Mon Aug 11 18:17:22 2009: Dispatched to <hostC>; Effective RES_REQ <select[(hname = delgpu3 ) &&
                          (type == any)] order[r15s:pg]>;
Mon Aug 11 18:18:54 2009: Completed <done>.
 EXCEPTION STATUS:  underrun 

Accounting information about this job:
     CPU_T     WAIT     TURNAROUND   STATUS     HOG_FACTOR    MEM    SWAP
      0.19      65       157          done       0.0012        4M     5M
     CPU_PEAK    CPU_PEAK_DURATION    CPU_PEAK_EFFICIENCY
      4.24        54 second(s)         106.02%
     CPU_AVG_EFFICIENCY     MEM_EFFICIENCY
      99.55%                 0.00%
------------------------------------------------------------
Job <1948>, User <user1>, Project <default>, Status <DONE>, Queue <normal>, Command <sleep 550>,
Job Description <This job is a test job.>
Tue Aug 12 14:15:03 2009: Submitted from host <hostB>, CWD <$HOME/jobs>, Output File </dev/null>;
Tue Aug 12 14:15:15 2009: Dispatched to <hostC>; Effective RES_REQ <select[(hname = delgpu3 ) &&
                          (type == any)] order[r15s:pg]>;
Tue Aug 12 14:25:08 2009: Completed <done>.
 EXCEPTION STATUS:  overrun  idle 

Accounting information about this job:
     CPU_T     WAIT     TURNAROUND   STATUS     HOG_FACTOR    MEM    SWAP
      0.20      12       605          done       0.0003        4M     5M
     CPU_PEAK    CPU_PEAK_DURATION    CPU_PEAK_EFFICIENCY
      4.24        54 second(s)         106.02%
     CPU_AVG_EFFICIENCY     MEM_EFFICIENCY
      99.55%                 0.00%
-------------------------------------------------------------
Job <1949>, User <user1>, Project <default>, Status <DONE>, Queue <normal>, Command <sleep 400>
Tue Aug 12 14:26:11 2009: Submitted from host <hostB>, CWD <$HOME/jobs>, Output File </dev/null>;
Tue Aug 12 14:26:18 2009: Dispatched to <hostC>; Effective RES_REQ <select[(hname = delgpu3 )
                          && (type == any)] order[r15s:pg]>;
Tue Aug 12 14:33:16 2009: Completed <done>.
 EXCEPTION STATUS:  idle 

Accounting information about this job:
     CPU_T     WAIT     TURNAROUND   STATUS     HOG_FACTOR    MEM    SWAP
      0.17       7       425          done       0.0004        4M     5M
     CPU_PEAK    CPU_PEAK_DURATION    CPU_PEAK_EFFICIENCY
      4.24        54 second(s)         106.02%
     CPU_AVG_EFFICIENCY     MEM_EFFICIENCY
      99.55%                 0.00%
Job <719[14]>, Job Name <test[14]>, User <user1>, Project <default>, Status <EXIT>, Queue <normal>, 
Command </home/user1/job1>, Job Description <This job is another test job.>
Mon Aug 18 20:27:44 2009: Submitted from host <hostB>, CWD <$HOME/jobs>, Output File </dev/null>;
Mon Aug 18 20:31:16 2009: [14] dispatched to <hostA>; Effective RES_REQ <select[(hname = delgpu3 )
                          && (type == any)] order[r15s:pg]>;
Mon Aug 18 20:31:18 2009: Completed <exit>.
 EXCEPTION STATUS:  underrun 

Accounting information about this job:
     CPU_T     WAIT     TURNAROUND   STATUS     HOG_FACTOR    MEM    SWAP
      0.19      212      214          exit       0.0009        2M     4M
     CPU_PEAK    CPU_PEAK_DURATION    CPU_PEAK_EFFICIENCY
      4.24        54 second(s)         106.02%
     CPU_AVG_EFFICIENCY     MEM_EFFICIENCY
      99.55%                 0.00%
--------------------------------------------------------------
SUMMARY:      ( time unit: second ) 
 Total number of done jobs:      45      Total number of exited jobs:    56
 Total CPU time consumed:    1009.1      Average CPU time consumed:    10.0
 Maximum CPU time of a job:   991.4      Minimum CPU time of a job:     0.1
 Total wait time in queues: 116864.0
 Average wait time in queue: 1157.1
 Maximum wait time in queue: 7069.0      Minimum wait time in queue:    7.0
 Average turnaround time:      1317 (seconds/job)
 Maximum turnaround time:      7070      Minimum turnaround time:        10
 Average hog factor of a job:  0.01 ( cpu time / turnaround time )
 Maximum hog factor of a job:  0.56      Minimum hog factor of a job:  0.00
 Average expansion factor of a job: 4.6 (turnaround time/run time)
 Maximum expansion factor of a job: 10.2 Minimum expansion factor of a job: 1.00
 Total Run time consumed:     28987      Average Run time consumed:     287 
 Maximum Run time of a job:    6743      Minimum Run time of a job:       2 
 Total throughput:             0.59 (jobs/hour)  during  170.21 hours
 Beginning time:       Aug 11 18:18      Ending time:          Aug 18 20:31

Example: Advance reservation accounting information

bacct -U user1#2
Accounting for:
  - advance reservation IDs: user1#2
  - advance reservations created by user1
--------------------------------------------------------------------
RSVID       TYPE      CREATOR    USER    NCPUS       RSV_HOSTS     TIME_WINDOW
user1#2     user      user1      user1   1           hostA:1       9/16/17/36-9/16/17/38
SUMMARY:
Total number of jobs:         4
Total CPU time consumed:      0.5 second
Maximum memory of a job:      4.2 MB
Maximum swap of a job:        5.2 MB
Total duration time:          0 hour    2 minute    0 second

Example: LSF job termination reason logging

When a job finishes, LSF reports the last job termination action that it took against the job and logs it to the lsb.acct file.

If a running job exits because of node failure, LSF sets the correct exit information in the lsb.acct, lsb.events, and the job output file.

Use the bacct -l command to view job exit information that is logged to the lsb.acct file:
bacct -l 7265

Accounting information about jobs that are: 
  - submitted by all users.
  - accounted on all projects.
  - completed normally or exited
  - executed on all hosts.
  - submitted to all queues.
  - accounted on all service classes.
--------------------------------------------------------------------
Job <7265>, User <lsfadmin>, Project <default>, Status <EXIT>, Queue <normal>, Command 
<srun sleep 100000>, Job Description <This job is also a test job.>
Thu Sep 16 15:22:09 2009: Submitted from host <hostA>, CWD <$HOME>;
Thu Sep 16 15:22:20 2009: Dispatched to 4 Hosts/Processors <4*hostA>;
Thu Sep 16 15:23:21 2009: Completed <exit>; TERM_RUNLIMIT: job killed after reaching LSF run time limit.

Accounting information about this job:
     Share group charged </lsfadmin>
     CPU_T     WAIT     TURNAROUND   STATUS     HOG_FACTOR    MEM    SWAP
      0.04      11        72          exit       0.0006        0K     0K
     CPU_PEAK    CPU_PEAK_DURATION    CPU_PEAK_EFFICIENCY
      4.24        54 second(s)         106.02%
     CPU_AVG_EFFICIENCY     MEM_EFFICIENCY
      99.55%                 0.00%
--------------------------------------------------------------------
SUMMARY:      ( time unit: second ) 
 Total number of done jobs:       0      Total number of exited jobs:     1
 Total CPU time consumed:       0.0      Average CPU time consumed:     0.0
 Maximum CPU time of a job:     0.0      Minimum CPU time of a job:     0.0
 Total wait time in queues:    11.0
 Average wait time in queue:   11.0
 Maximum wait time in queue:   11.0      Minimum wait time in queue:   11.0
 Average turnaround time:        72     (seconds/job)
 Maximum turnaround time:        72      Minimum turnaround time:        72
 Average hog factor of a job:  0.00     (cpu time / turnaround time )
 Maximum hog factor of a job:  0.00      Minimum hog factor of a job:  0.00

 ...

Example: Resizable job information

Use the bacct -l command to view resizable job information that is logged to the lsb.acct file:
  • The auto-resizable attribute of a job and the resize notification command, if the job was submitted with the bsub -ar and bsub -rnc resize_notification_command options.
  • Job allocation changes whenever a JOB_RESIZE event is logged to the lsb.acct file.
When an allocation grows, the bacct command shows:
Additional allocation on <num_hosts> Hosts/Processors <host_list>
When an allocation shrinks, the bacct command shows:
Release allocation on <num_hosts> Hosts/Processors <host_list> by user or
administrator <user_name>
Resize notification accepted;
For example, for the following job submission:
bsub -n 1,5 -ar myjob

The initial allocation is on hostA and hostB. The first resize request is allocated on hostC and hostD. A second resize request is allocated on hostE. The bacct -l command has the following output:

bacct -l 205

Accounting information about jobs that are:
  - submitted by all users.
  - accounted on all projects.
  - completed normally or exited
  - executed on all hosts.
  - submitted to all queues.
  - accounted on all service classes.
-------------------------------------------------
Job <1150>, User <user2>, Project <default>, Status <DONE>, Queue <normal>, Command 
<sleep 10>, Job Description <This job is a test job.>
Mon Jun  2 11:42:00 2009: Submitted from host <hostA>, CWD <$HOME>;
Mon Jun  2 11:43:00 2009: Dispatched 6 Task(s) on Host(s) <hostA> <hostB>, 
                          Allocated 6 Slot(s) on Host(s) <hostA> <hostB>,
                          Effective RES_REQ <select[(hname = delgpu3 ) && 
                          (type == any)] order[r15s:pg]>;
Mon Jun  2 11:43:52 2009: Added 2 Task(s) on Host(s) 2 Hosts/Processors 
                          <hostC> <hostD>, 2 additional Slot(s) allocated 
                          on Host(s) <hostC> <hostD>
Mon Jun  2 11:44:55 2009: Additional allocation on <hostC> <hostD>;
Mon Jun  2 11:51:40 2009: Completed <done>.
...

Files

Reads the lsb.acct and lsb.acct.n files.

See also

bhist, bsub, bjobs, lsb.acct, brsvadd, brsvs, bsla, lsb.serviceclasses