Monitoring GPU jobs
For jobs submitted with the default GPU requirements (with the -gpu - option), the bjobs -l command shows the default job-level resource requirement as Requested GPU, without details such as <num=1...>.
About this task
If the -gpu option specifies GPU requirements (for example, -gpu num=3), the bjobs -l command shows the details as Requested GPU <num=3>.
The bjobs -l command displays an output section for GPU jobs that shows the combined and effective GPU requirements that are specified for the job. The GPU requirement string is in the format of the GPU_REQ parameter in the application profile or queue.
- The combined GPU requirement is the result of merging the job's current GPU requirements from the job, queue, application, or cluster level.
- The effective GPU requirement is the one used by a started job. It never changes after a job is started.
bjobs -l
Job <101>, User <user1>, Project <default>, Status <RUN>, Queue <normal>, Co
mmand <sleep 10000>, Share group charged </user1>
Wed Jul 12 04:51:00: Submitted from host <hosta>, CWD </home/user1/>,
Specified Hosts <hosta>, Requested GPU;
...
...
EXTERNAL MESSAGES:
MSG_ID FROM POST_TIME MESSAGE ATTACHMENT
0 user1 Jul 12 04:51 hosta:gpus=2,0,1; N
RESOURCE REQUIREMENT DETAILS:
Combined: select[type == any] order[r15s:pg] rusage[ngpus_physical=2.00]
Effective: select[type == any] order[r15s:pg] rusage[ngpus_physical=3.00]
GPU REQUIREMENT DETAILS:
Combined: num=2:mode=shared:mps=no:j_exclusive=no
Effective: num=3:mode=shared:mps=no:j_exclusive=no
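When the Combined and Effective lines differ, as they do in the sample above, it can help to compare them field by field. The following is a minimal sketch, not part of LSF: the function names parse_gpu_req and diff_gpu_req are hypothetical, and the code assumes that the requirement string consists only of colon-separated key=value pairs, as in the sample output.

```python
def parse_gpu_req(req: str) -> dict[str, str]:
    """Split a GPU requirement string such as 'num=2:mode=shared' into a dict.

    Assumption: fields are separated by ':' and each field is 'key=value'.
    """
    fields: dict[str, str] = {}
    for part in req.strip().split(":"):
        if not part:
            continue
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def diff_gpu_req(combined: str, effective: str) -> dict[str, tuple]:
    """Return the fields whose values differ, as {key: (combined, effective)}."""
    c = parse_gpu_req(combined)
    e = parse_gpu_req(effective)
    return {
        key: (c.get(key), e.get(key))
        for key in sorted(set(c) | set(e))
        if c.get(key) != e.get(key)
    }


if __name__ == "__main__":
    # Strings copied from the GPU REQUIREMENT DETAILS section of the sample.
    combined = "num=2:mode=shared:mps=no:j_exclusive=no"
    effective = "num=3:mode=shared:mps=no:j_exclusive=no"
    print(diff_gpu_req(combined, effective))
```

For the sample strings, only the num field differs between the combined and effective requirements.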
Use the bhist -l command to see the effective GPU requirements string for a GPU allocation, for example:
bhist -l
Job <204>, User <user1>, Project <default>, Command <blaunch sleep 60>
Wed Jul 12 22:40:54: Submitted from host <hosta>, to Queue <normal>, CWD </
scratch/user1>, 8 Task(s),Requested Resources <span[ptile=4]
.....................rusage[ngpus_physical=4]>,Specified Hosts <haswell05>,
<hosta!>, Requested GPU <num=4:mode=shared:j_exclusive=yes>;
Wed Jul 12 22:40:55: Dispatched 8 Task(s) on Host(s) <hosta> <hosta>
<hosta> <hosta> <hostb> <hostb> <hostb>
<hostb>, Allocated 8 Slot(s) on Host(s) <hosta>
<hosta> <hosta> <hosta> <hostb> <hostb>
<hostb> <hostb>, Effective RES_REQ <select[type ==
any] order[r15s:pg] rusage[ngpus_physical=4.00] span[ptil
e=4] >;
Wed Jul 12 22:40:56: Starting (Pid 116194);
Wed Jul 12 22:40:56: External Message "hostb:gpus=0,3,1,2;haswell03:gpus=0,1,2,3;
EFFECTIVE GPU REQ: num=4:mode=shared:mps=no:j_exclusive=yes;"
was posted from "user1" to message box 0;
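The external message in the sample above carries both the per-host GPU allocation and the effective GPU requirement string. The following is a minimal sketch of how such a message could be split into its parts. The function name parse_gpu_message is hypothetical, and the code assumes that the wrapped output lines have already been joined into one string and that the message follows the host:gpus=... and EFFECTIVE GPU REQ: layout shown in the sample.

```python
import re


def parse_gpu_message(message: str):
    """Return (allocations, effective_req) from an external message string.

    allocations maps each host name to the list of GPU IDs listed for it;
    effective_req is the text after 'EFFECTIVE GPU REQ:', or None if absent.
    Assumption: segments are separated by ';' as in the sample output.
    """
    allocations: dict[str, list[int]] = {}
    effective_req = None
    for segment in message.split(";"):
        segment = segment.strip()
        if not segment:
            continue
        if segment.upper().startswith("EFFECTIVE GPU REQ:"):
            # Keep everything after the first colon as the requirement string.
            effective_req = segment.split(":", 1)[1].strip()
            continue
        match = re.fullmatch(r"(?P<host>[^:\s]+):gpus=(?P<ids>[\d,]+)", segment)
        if match:
            ids = [int(i) for i in match["ids"].split(",") if i]
            allocations[match["host"]] = ids
    return allocations, effective_req


if __name__ == "__main__":
    # Message text copied from the bhist -l sample, joined onto one line.
    msg = (
        "hostb:gpus=0,3,1,2;haswell03:gpus=0,1,2,3; "
        "EFFECTIVE GPU REQ: num=4:mode=shared:mps=no:j_exclusive=yes;"
    )
    allocations, effective_req = parse_gpu_message(msg)
    print(allocations)
    print(effective_req)
```

The same function also accepts the shorter message form from the bjobs -l sample (hosta:gpus=2,0,1;), in which case no effective requirement string is returned.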