Example GPU job submissions
The following are examples of possible submissions for jobs that use GPU resources.
- The following job requests the default GPU resource requirement, num=1:mode=shared:mps=no:j_exclusive=no. The job requests one GPU in DEFAULT mode, does not start MPS, and allows other jobs to use the GPU because j_exclusive is set to no:
bsub -gpu - ./app
- The following job requires 2 EXCLUSIVE_PROCESS mode GPUs and starts MPS before the job runs:
bsub -gpu "num=2:mode=exclusive_process:mps=yes" ./app
- The following job requires 2 EXCLUSIVE_PROCESS mode GPUs, starts MPS before the job runs, and allows multiple jobs on the host to share the same MPS daemon if those jobs are submitted by the same user with the same GPU requirements:
bsub -gpu "num=2:mode=exclusive_process:mps=yes,share" ./app
- The following job requires 2 EXCLUSIVE_PROCESS mode GPUs and starts multiple MPS daemons (one MPS daemon per socket):
bsub -gpu "num=2:mode=exclusive_process:mps=per_socket" ./app
- The following job requires 2 EXCLUSIVE_PROCESS mode GPUs, starts multiple MPS daemons (one MPS daemon per socket), and allows multiple jobs on the same socket to share the same MPS daemon if those jobs are submitted by the same user with the same GPU requirements:
bsub -gpu "num=2:mode=exclusive_process:mps=per_socket,share" ./app
- The following job requires 2 EXCLUSIVE_PROCESS mode GPUs and starts multiple MPS daemons (one MPS daemon per GPU):
bsub -gpu "num=2:mode=exclusive_process:mps=per_gpu" ./app
- The following job requires 2 EXCLUSIVE_PROCESS mode GPUs, starts multiple MPS daemons (one MPS daemon per GPU), and allows multiple jobs on the same GPU to share the same MPS daemon if those jobs are submitted by the same user with the same GPU requirements:
bsub -gpu "num=2:mode=exclusive_process:mps=per_gpu,share" ./app
- The following job requires 2 DEFAULT mode GPUs and uses them exclusively. The two GPUs cannot be used by other jobs even though the mode is shared:
bsub -gpu "num=2:mode=shared:j_exclusive=yes" ./app
- The following job uses 3 DEFAULT mode GPUs and shares them with other jobs:
bsub -gpu "num=3:mode=shared:j_exclusive=no" ./app
- The following job requests 2 AMD GPUs:
bsub -gpu "num=2:gvendor=amd" ./app
- The following job requests 2 Vega GPUs with xGMI connections:
bsub -gpu "num=2:gmodel=Vega:glink=yes" ./app
- The following job requests 2 Nvidia GPUs:
bsub -gpu "num=2:gvendor=nvidia" ./app
- The following job requests 2 Tesla C2050/C2070 GPUs:
bsub -gpu "num=2:gmodel=C2050_C2070"
- The following job requests 2 Tesla GPUs of any model with a total memory size of 12 GB on each GPU:
bsub -gpu "num=2:gmodel=Tesla-12G"
- The following job requests 2 Tesla GPUs of any model with a total memory size of 12 GB on each GPU, but with relaxed GPU affinity enforcement:
bsub -gpu "num=2:gmodel=Tesla-12G:aff=no"
- The following job requests 2 Tesla GPUs of any model with a total memory size of 12 GB on each GPU and reserves 8 GB of GPU memory on each GPU:
bsub -gpu "num=2:gmodel=Tesla-12G:gmem=8G"
- The following job requests 4 Tesla K80 GPUs per host, with 2 GPUs on each socket:
bsub -gpu "num=4:gmodel=K80:gtile=2"
- The following job requests 4 Tesla K80 GPUs per host, with the GPUs spread evenly across the sockets:
bsub -gpu "num=4:gmodel=K80:gtile='!'"
- The following job requests 4 Tesla P100 GPUs per host with NVLink connections, with the GPUs spread evenly across the sockets:
bsub -gpu "num=4:gmodel=TeslaP100:gtile='!':glink=yes"
- The following job uses 2 Nvidia MIG devices with a GPU instance size of 3 and a compute instance size of 2:
bsub -gpu "num=2:mig=3/2" ./app
- The following job uses 4 EXCLUSIVE_PROCESS mode GPUs that cannot be used by other jobs. The j_exclusive option defaults to yes for this job:
bsub -gpu "num=4:mode=exclusive_process" ./app
- The following job requires two tasks, one on each of two hosts, and each task requires 2 EXCLUSIVE_PROCESS mode GPUs. The GPUs are allocated in the same NUMA node as the allocated CPU:
bsub -gpu "num=2:mode=exclusive_process" -n2 -R "span[ptile=1] affinity[core(1)]" ./app
- The following job ignores the simple GPU resource requirements that are specified in the -gpu option because the -R option specifies the ngpus_physical GPU resource:
bsub -gpu "num=2:mode=exclusive_process" -n2 -R "span[ptile=1] rusage[ngpus_physical=2:gmodel=TeslaP100:glink=yes]" ./app
Since you can only request EXCLUSIVE_PROCESS GPUs with the -gpu option, move the rusage[] string contents to the -gpu option arguments. The following corrected job submission requires two tasks, and each task requires 2 EXCLUSIVE_PROCESS Tesla P100 GPUs with NVLink connections on two hosts:
bsub -gpu "num=2:mode=exclusive_process:gmodel=TeslaP100:glink=yes" -n2 -R "span[ptile=1]" ./app
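When a submission script builds requirement strings like the ones above, composing the whole -gpu argument in one variable makes it easy to keep mode and model options together rather than splitting them between -gpu and rusage[]. The following is a minimal sketch, assuming hypothetical variable names (GPU_NUM, GPU_MODE, GPU_MODEL, GPU_REQ); only the num, mode, gmodel, and glink keywords come from the examples above. The sketch prints the command instead of running it, so it is safe on a machine without LSF installed:

```shell
#!/bin/sh
# Sketch: compose a -gpu requirement string from shell variables.
# Variable names are illustrative, not LSF conventions.
GPU_NUM=2
GPU_MODE=exclusive_process
GPU_MODEL=TeslaP100
GPU_REQ="num=${GPU_NUM}:mode=${GPU_MODE}:gmodel=${GPU_MODEL}:glink=yes"

# Print the submission command rather than executing bsub.
echo bsub -gpu "\"${GPU_REQ}\"" -n2 -R "\"span[ptile=1]\"" ./app
```

Replacing echo with an actual bsub invocation (and dropping the escaped quotes, since the shell already passes GPU_REQ as a single argument) turns the sketch into a real submission.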