Monitoring the progress of an SLA (bsla)
The bsla command displays the properties of service classes configured in the lsb.serviceclasses file.
Procedure
Examples
- The guarantee SLA bigMemSLA has 10 slots guaranteed, limited to one slot per host.
bsla
SERVICE CLASS NAME: bigMemSLA
--
ACCESS CONTROL: QUEUES[normal]
AUTO ATTACH: Y
GOAL: GUARANTEE
POOL NAME TYPE GUARANTEED USED
bigMemPool slots 10 0
- One velocity goal of service class Tofino is active and on time. The other configured velocity goal is inactive.
bsla
SERVICE CLASS NAME: Tofino
-- day and night velocity
PRIORITY: 20
GOAL: VELOCITY 30
ACTIVE WINDOW: (17:30-8:30)
STATUS: Inactive
SLA THROUGHPUT: 0.00 JOBS/CLEAN_PERIOD
GOAL: VELOCITY 10
ACTIVE WINDOW: (9:00-17:00)
STATUS: Active:On time
SLA THROUGHPUT: 10.00 JOBS/CLEAN_PERIOD
NJOBS PEND RUN SSUSP USUSP FINISH
300 280 10 0 0 10
- The deadline goal of service class Sooke is not being met, and the bsla command displays status Active:Delayed:
bsla
SERVICE CLASS NAME: Sooke
-- working hours
PRIORITY: 20
GOAL: DEADLINE
ACTIVE WINDOW: (8:30-19:00)
STATUS: Active:Delayed
SLA THROUGHPUT: 0.00 JOBS/CLEAN_PERIOD
ESTIMATED FINISH TIME: (Tue Oct 28 06:17)
OPTIMUM NUMBER OF RUNNING JOBS: 6
NJOBS PEND RUN SSUSP USUSP FINISH
40 39 1 0 0 0
- The configured velocity goal of the service class Duncan is active and on time. The configured deadline goal of the service class is inactive.
bsla Duncan
SERVICE CLASS NAME: Duncan
-- Daytime/Nighttime SLA
PRIORITY: 23
USER_GROUP: user1 user2
GOAL: VELOCITY 8
ACTIVE WINDOW: (9:00-17:30)
STATUS: Active:On time
SLA THROUGHPUT: 0.00 JOBS/CLEAN_PERIOD
GOAL: DEADLINE
ACTIVE WINDOW: (17:30-9:00)
STATUS: Inactive
SLA THROUGHPUT: 0.00 JOBS/CLEAN_PERIOD
NJOBS PEND RUN SSUSP USUSP FINISH
0 0 0 0 0 0
- The throughput goal of service class Sidney is always active. The bsla command displays information about the service class:
  - Status as active and on time
  - An optimum number of 5 running jobs to meet the goal
  - Actual throughput of 10 jobs per hour based on the last CLEAN_PERIOD
bsla Sidney
SERVICE CLASS NAME: Sidney
-- constant throughput
PRIORITY: 20
GOAL: THROUGHPUT 6
ACTIVE WINDOW: Always Open
STATUS: Active:On time
SLA THROUGHPUT: 10.00 JOBs/CLEAN_PERIOD
OPTIMUM NUMBER OF RUNNING JOBS: 5
NJOBS PEND RUN SSUSP USUSP FINISH
110 95 5 0 0 10
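When many service classes are configured, it can help to scan bsla output for goals that are falling behind. The following is a minimal sketch in Python, assuming the output has already been captured as text and uses the field labels seen in the examples above (SERVICE CLASS NAME, GOAL, STATUS). The helper names parse_bsla_goals and delayed_classes are hypothetical and are not part of LSF.

```python
import re

# Sample text copied from the Sooke and Duncan examples in this section.
SAMPLE = """\
SERVICE CLASS NAME: Sooke
-- working hours
PRIORITY: 20
GOAL: DEADLINE
ACTIVE WINDOW: (8:30-19:00)
STATUS: Active:Delayed
SLA THROUGHPUT: 0.00 JOBS/CLEAN_PERIOD
ESTIMATED FINISH TIME: (Tue Oct 28 06:17)
OPTIMUM NUMBER OF RUNNING JOBS: 6
NJOBS PEND RUN SSUSP USUSP FINISH
40 39 1 0 0 0
----------------------------------------------
SERVICE CLASS NAME: Duncan
-- Daytime/Nighttime SLA
PRIORITY: 23
USER_GROUP: user1 user2
GOAL: VELOCITY 8
ACTIVE WINDOW: (9:00-17:30)
STATUS: Active:On time
SLA THROUGHPUT: 0.00 JOBS/CLEAN_PERIOD
GOAL: DEADLINE
ACTIVE WINDOW: (17:30-9:00)
STATUS: Inactive
SLA THROUGHPUT: 0.00 JOBS/CLEAN_PERIOD
NJOBS PEND RUN SSUSP USUSP FINISH
0 0 0 0 0 0
"""


def parse_bsla_goals(text):
    """Return {service_class: [(goal, status), ...]} from captured bsla text.

    Hypothetical helper: it pairs each GOAL line with the next STATUS line,
    which matches the layout of the samples but is an assumption otherwise.
    """
    result = {}
    current = None
    pending_goal = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"SERVICE CLASS NAME:\s*(\S+)", line)
        if m:
            current = m.group(1)
            result[current] = []
            pending_goal = None
            continue
        m = re.match(r"GOAL:\s*(.+)", line)
        if m and current is not None:
            pending_goal = m.group(1).strip()
            continue
        m = re.match(r"STATUS:\s*(.+)", line)
        if m and current is not None and pending_goal is not None:
            result[current].append((pending_goal, m.group(1).strip()))
            pending_goal = None
    return result


def delayed_classes(goals):
    """Names of service classes with at least one Active:Delayed goal."""
    return sorted(
        name
        for name, pairs in goals.items()
        if any(status == "Active:Delayed" for _, status in pairs)
    )


if __name__ == "__main__":
    goals = parse_bsla_goals(SAMPLE)
    for name, pairs in goals.items():
        print(name, pairs)
    print("Delayed:", delayed_classes(goals))
```

A guarantee SLA such as bigMemSLA has no STATUS line in the sample output, so this sketch would list it with no goal and status pairs.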
Viewing jobs running in an SLA (bjobs)
The bjobs -sla command shows jobs running in a service class.
Procedure
bjobs -sla Sidney
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
136 user1 RUN normal hostA hostA sleep 100 Sep 28 13:24
137 user1 RUN normal hostA hostB sleep 100 Sep 28 13:25
For time-based SLAs, use the -sla option with the -g option to display job groups attached to a service class. Once a job group is attached to a time-based service class, all jobs submitted to that group are subject to the SLA.
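The tabular bjobs -sla output shown above can be summarized per execution host. The following Python sketch is one possible way to do that, assuming every row has a value in the EXEC_HOST column (as for the running jobs in the sample) and that SUBMIT_TIME always consists of three tokens. The function name parse_bjobs is hypothetical.

```python
from collections import Counter

# Sample text copied from the bjobs -sla Sidney output in this section.
SAMPLE = """\
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
136 user1 RUN normal hostA hostA sleep 100 Sep 28 13:24
137 user1 RUN normal hostA hostB sleep 100 Sep 28 13:25
"""


def parse_bjobs(text):
    """Parse captured bjobs rows into dictionaries.

    Assumption: the first six columns are single tokens and the last three
    tokens form SUBMIT_TIME; whatever lies between is JOB_NAME, which may
    contain spaces (for example "sleep 100").
    """
    jobs = []
    for line in text.splitlines()[1:]:  # skip the header row
        tokens = line.split()
        if len(tokens) < 10:  # 6 fixed columns + job name + 3 time tokens
            continue
        jobs.append(
            {
                "jobid": int(tokens[0]),
                "user": tokens[1],
                "stat": tokens[2],
                "queue": tokens[3],
                "from_host": tokens[4],
                "exec_host": tokens[5],
                "job_name": " ".join(tokens[6:-3]),
                "submit_time": " ".join(tokens[-3:]),
            }
        )
    return jobs


if __name__ == "__main__":
    jobs = parse_bjobs(SAMPLE)
    per_host = Counter(job["exec_host"] for job in jobs)
    for host, count in sorted(per_host.items()):
        print(host, count)
```

Rows for pending jobs would need separate handling, because their EXEC_HOST column is empty and the token positions shift.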
Track historical behavior of an SLA (bacct)
The bacct command shows historical performance of a service class.
Procedure
bsla
SERVICE CLASS NAME: Sidney
-- throughput 6
PRIORITY: 20
GOAL: THROUGHPUT 6
ACTIVE WINDOW: Always Open
STATUS: Active:On time
SLA THROUGHPUT: 10.00 JOBs/CLEAN_PERIOD
OPTIMUM NUMBER OF RUNNING JOBS: 5
NJOBS PEND RUN SSUSP USUSP FINISH
111 94 5 0 0 12
----------------------------------------------
SERVICE CLASS NAME: Surrey
-- throughput 3
PRIORITY: 15
GOAL: THROUGHPUT 3
ACTIVE WINDOW: Always Open
STATUS: Active:On time
SLA THROUGHPUT: 4.00 JOBs/CLEAN_PERIOD
OPTIMUM NUMBER OF RUNNING JOBS: 4
NJOBS PEND RUN SSUSP USUSP FINISH
104 96 4 0 0 4
These two service classes have the following historical performance. For SLA Sidney, the bacct command shows a total throughput of 8.94 jobs per hour over a period of 20.58 hours:
bacct -sla Sidney
Accounting information about jobs that are:
- submitted by users user1,
- accounted on all projects.
- completed normally or exited
- executed on all hosts.
- submitted to all queues.
- accounted on service classes Sidney,
----------------------------------------------
SUMMARY: ( time unit: second )
Total number of done jobs: 183 Total number of exited jobs: 1
Total CPU time consumed: 40.0 Average CPU time consumed: 0.2
Maximum CPU time of a job: 0.3 Minimum CPU time of a job: 0.1
Total wait time in queues: 1947454.0
Average wait time in queue:10584.0
Maximum wait time in queue:18912.0 Minimum wait time in queue: 7.0
Average turnaround time: 12268 (seconds/job)
Maximum turnaround time: 22079 Minimum turnaround time: 1713
Average hog factor of a job: 0.00 ( cpu time / turnaround time )
Maximum hog factor of a job: 0.00 Minimum hog factor of a job: 0.00
Total throughput: 8.94 (jobs/hour) during 20.58 hours
Beginning time: Oct 11 20:23 Ending time: Oct 12 16:58
For SLA Surrey, the bacct command shows a total throughput of 4.36 jobs per hour over a period of 19.95 hours:
bacct -sla Surrey
Accounting information about jobs that are:
- submitted by users user1,
- accounted on all projects.
- completed normally or exited.
- executed on all hosts.
- submitted to all queues.
- accounted on service classes Surrey,
-----------------------------------------
SUMMARY: ( time unit: second )
Total number of done jobs: 87 Total number of exited jobs: 0
Total CPU time consumed: 18.0 Average CPU time consumed: 0.2
Maximum CPU time of a job: 0.3 Minimum CPU time of a job: 0.1
Total wait time in queues: 2371955.0
Average wait time in queue:27263.8
Maximum wait time in queue:39125.0 Minimum wait time in queue: 7.0
Average turnaround time: 30596 (seconds/job)
Maximum turnaround time: 44778 Minimum turnaround time: 3355
Average hog factor of a job: 0.00 ( cpu time / turnaround time )
Maximum hog factor of a job: 0.00 Minimum hog factor of a job: 0.00
Total throughput: 4.36 (jobs/hour) during 19.95 hours
Beginning time: Oct 11 20:50 Ending time: Oct 12 16:47
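Because the run times are not uniform, both service classes actually achieve higher throughput than configured: 8.94 jobs per hour against a goal of 6 for Sidney, and 4.36 jobs per hour against a goal of 3 for Surrey.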
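The total throughput figures in the two bacct reports are consistent with dividing the number of finished jobs (done plus exited) by the time between the beginning and ending timestamps. The Python sketch below reproduces that arithmetic from the values printed above. The formula is inferred from the sample numbers rather than taken from the bacct documentation, and the year is an assumption because bacct prints none here.

```python
from datetime import datetime

# bacct prints no year in this report; any single year works as long as
# both timestamps fall in the same year (assumption).
ASSUMED_YEAR = 2000


def elapsed_hours(begin, end):
    """Hours between two bacct-style timestamps such as 'Oct 11 20:23'."""
    fmt = "%Y %b %d %H:%M"
    start = datetime.strptime(f"{ASSUMED_YEAR} {begin}", fmt)
    stop = datetime.strptime(f"{ASSUMED_YEAR} {end}", fmt)
    return (stop - start).total_seconds() / 3600.0


def throughput(done, exited, hours):
    """Finished jobs per hour (inferred formula, see the lead-in)."""
    return (done + exited) / hours


# Values copied from the two bacct reports and the bsla goals above.
REPORTS = {
    "Sidney": {"done": 183, "exited": 1, "begin": "Oct 11 20:23",
               "end": "Oct 12 16:58", "goal": 6},
    "Surrey": {"done": 87, "exited": 0, "begin": "Oct 11 20:50",
               "end": "Oct 12 16:47", "goal": 3},
}

if __name__ == "__main__":
    for name, r in REPORTS.items():
        hours = elapsed_hours(r["begin"], r["end"])
        rate = throughput(r["done"], r["exited"], hours)
        print(f"{name}: {rate:.2f} jobs/hour over {hours:.2f} hours "
              f"(goal {r['goal']})")
```

Rounded to two decimals, the results match the reported 8.94 and 4.36 jobs per hour, both above the configured goals.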
View parallel jobs in an EGO-enabled SLA
The bsla -N command shows job counter information by job slots for a service class.
Procedure
user1@system-02-461: bsla -N SLA1
SERVICE CLASS NAME: SLA1
PRIORITY: 10
CONSUMER: sla1
EGO_RES_REQ: any host
MAX_HOST_IDLE_TIME: 120
EXCLUSIVE: N
GOAL: VELOCITY 1
ACTIVE WINDOW: Always Open
STATUS: Active:On time
SLA THROUGHPUT: 0.00 JOBS/CLEAN_PERIOD
NSLOTS PEND RUN SSUSP USUSP
42 28 14 0 0
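In every counter row shown in this section, the first column (NJOBS or NSLOTS) equals the sum of the remaining columns; here, 42 slots are 28 pending plus 14 running. The Python sketch below pairs a header row with its value row and checks that relationship. This is an observation from the samples, not a documented guarantee, and the helper names are hypothetical.

```python
def parse_counters(header_line, value_line):
    """Pair counter names with integer values from two captured lines."""
    names = header_line.split()
    values = [int(v) for v in value_line.split()]
    if len(names) != len(values):
        raise ValueError("header and value rows have different lengths")
    return dict(zip(names, values))


def counters_consistent(counters):
    """True if the first counter equals the sum of all the others.

    Assumption: the first column is the total (NJOBS or NSLOTS), as in
    the sample output in this section.
    """
    total_key = next(iter(counters))
    rest = sum(v for k, v in counters.items() if k != total_key)
    return counters[total_key] == rest


if __name__ == "__main__":
    # Rows copied from the bsla -N SLA1 output above.
    slots = parse_counters("NSLOTS PEND RUN SSUSP USUSP", "42 28 14 0 0")
    print(slots, counters_consistent(slots))
    pending_share = slots["PEND"] / slots["NSLOTS"]
    print(f"Pending share of slots: {pending_share:.0%}")
```

The same check applies to the job-based rows, for example the NJOBS row for Sidney in the earlier bsla output.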