Example GPU job submissions

The following example job submissions demonstrate ways to submit jobs that use GPU resources.

  • The following job requests the default GPU resource requirement num=1:mode=shared:mps=no:j_exclusive=no: one GPU in DEFAULT mode, without starting MPS, and sharable with other jobs because j_exclusive is set to no.
    bsub -gpu - ./app
  • The following job requires two EXCLUSIVE_PROCESS mode GPUs and starts MPS before the job runs:
    bsub -gpu "num=2:mode=exclusive_process:mps=yes" ./app
  • The following job requires two EXCLUSIVE_PROCESS mode GPUs, starts MPS before the job runs, and allows multiple jobs in the host to share the same MPS daemon if those jobs are submitted by the same user with the same GPU requirements:
    bsub -gpu "num=2:mode=exclusive_process:mps=yes,share" ./app
  • The following job requires two EXCLUSIVE_PROCESS mode GPUs and starts multiple MPS daemons (one MPS daemon per socket):
    bsub -gpu "num=2:mode=exclusive_process:mps=per_socket" ./app
  • The following job requires two EXCLUSIVE_PROCESS mode GPUs and starts multiple MPS daemons (one MPS daemon per socket), and allows multiple jobs in the socket to share the same MPS daemon if those jobs are submitted by the same user with the same GPU requirements:
    bsub -gpu "num=2:mode=exclusive_process:mps=per_socket,share" ./app
  • The following job requires two EXCLUSIVE_PROCESS mode GPUs and starts multiple MPS daemons (one MPS daemon per GPU):
    bsub -gpu "num=2:mode=exclusive_process:mps=per_gpu" ./app
  • The following job requires two EXCLUSIVE_PROCESS mode GPUs and starts multiple MPS daemons (one MPS daemon per GPU), and allows multiple jobs in the GPU to share the same MPS daemon if those jobs are submitted by the same user with the same GPU requirements:
    bsub -gpu "num=2:mode=exclusive_process:mps=per_gpu,share" ./app
  • The following job requires two DEFAULT mode GPUs and uses them exclusively. The two GPUs cannot be used by other jobs even though the mode is shared:
    bsub -gpu "num=2:mode=shared:j_exclusive=yes" ./app
  • The following job uses three DEFAULT mode GPUs and shares them with other jobs:
    bsub -gpu "num=3:mode=shared:j_exclusive=no" ./app
  • The following job requests two AMD GPUs:
    bsub -gpu "num=2:gvendor=amd" ./app
  • The following job requests two Vega GPUs with xGMI connections:
    bsub -gpu "num=2:gmodel=Vega:glink=yes" ./app
  • The following job requests two NVIDIA GPUs:
    bsub -gpu "num=2:gvendor=nvidia" ./app
  • The following job requests two Tesla C2050 or C2070 GPUs:
    bsub -gpu "num=2:gmodel=C2050_C2070" ./app
  • The following job requests two Tesla GPUs of any model with a total memory size of 12 GB on each GPU:
    bsub -gpu "num=2:gmodel=Tesla-12G" ./app
  • The following job requests two Tesla GPUs of any model with a total memory size of 12 GB on each GPU, but with relaxed GPU affinity enforcement:
    bsub -gpu "num=2:gmodel=Tesla-12G:aff=no" ./app
  • The following job requests two Tesla GPUs of any model with a total memory size of 12 GB on each GPU and reserves 8 GB of GPU memory on each GPU:
    bsub -gpu "num=2:gmodel=Tesla-12G:gmem=8G" ./app
  • The following job requests four Tesla K80 GPUs per host, with two GPUs on each socket:
    bsub -gpu "num=4:gmodel=K80:gtile=2" ./app
  • The following job requests four Tesla K80 GPUs per host, spread evenly across the sockets:
    bsub -gpu "num=4:gmodel=K80:gtile='!'" ./app
  • The following job requests four Tesla P100 GPUs per host with NVLink connections, spread evenly across the sockets:
    bsub -gpu "num=4:gmodel=TeslaP100:gtile='!':glink=yes" ./app
  • The following job uses two NVIDIA MIG devices with a GPU instance size of 3 and a compute instance size of 2:
    bsub -gpu "num=2:mig=3/2" ./app
  • The following job uses four EXCLUSIVE_PROCESS mode GPUs that cannot be used by other jobs, because j_exclusive defaults to yes when the mode is exclusive_process:
    bsub -gpu "num=4:mode=exclusive_process" ./app
  • The following job requires two tasks on two hosts, and each task requires two EXCLUSIVE_PROCESS mode GPUs. The GPUs are allocated in the same NUMA node as the allocated CPU:
    bsub -gpu "num=2:mode=exclusive_process" -n2 -R "span[ptile=1] affinity[core(1)]" ./app
  • The following job ignores the simple GPU resource requirements that are specified in the -gpu option because the -R option specifies the ngpus_physical GPU resource:
    bsub -gpu "num=2:mode=exclusive_process" -n2 -R "span[ptile=1] rusage[ngpus_physical=2:gmodel=TeslaP100:glink=yes]" ./app
    Because EXCLUSIVE_PROCESS mode GPUs can be requested only with the -gpu option, move the rusage[] string contents into the -gpu option arguments. The following corrected job submission requires two tasks, and each task requires two EXCLUSIVE_PROCESS Tesla P100 GPUs with NVLink connections on two hosts:
    bsub -gpu "num=2:mode=exclusive_process:gmodel=TeslaP100:glink=yes" -n2 -R "span[ptile=1]" ./app
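All of the examples above pass a single colon-separated requirement string to the -gpu option. When several of these options vary from run to run, a small wrapper can assemble the string before submission. The sketch below is illustrative only: the variable names are hypothetical, not LSF options, and it echoes the bsub command instead of executing it so that the assembled string can be checked without an LSF cluster.

```shell
#!/bin/sh
# Illustrative sketch: assemble a -gpu requirement string from parts.
# NUM, MODE, MPS, and GMODEL are hypothetical shell variables, not LSF names.
NUM=2
MODE=exclusive_process
MPS=per_socket,share
GMODEL=K80

# Join the parts with colons, matching the -gpu option syntax.
GPU_REQ="num=${NUM}:mode=${MODE}:mps=${MPS}:gmodel=${GMODEL}"

# Print the command rather than running it, so the sketch works without LSF.
echo bsub -gpu "\"${GPU_REQ}\"" ./app
```

Running the sketch prints the complete bsub command line, which can then be copied and submitted as-is.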