Manually configure and use GPU resources (legacy ELIM procedure)

Use this procedure to configure and use GPU resources with the legacy ELIM method (pre-LSF 10.1.0.5).

Procedure

  1. Binary files for the base elim.gpu are located under $LSF_SERVERDIR. The source files for the optional ELIMs (elim.gpu.ext.c and elim.gpu.topology.c), their makefiles, and README files are under LSF_TOP/10.1.0/misc/examples/elim.gpu.ext/. See the README file for steps to build, install, configure, and debug the ELIMs.

    Make sure elim executable files are in the LSF_SERVERDIR directory.
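
    For example, a quick way to confirm that the ELIM executables are in place (your $LSF_SERVERDIR value depends on your installation):

    # List the GPU ELIM executables installed under $LSF_SERVERDIR
    $ ls -l $LSF_SERVERDIR/elim.gpu*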

    For GPU support, make sure the following third-party software is installed correctly:
    • CUDA driver
    • CUDA toolkit
    • Tesla Deployment Kit
    • NVIDIA Management Library (NVML)
    • CUDA samples (optional)

    Note the following requirements:
    • The CUDA version must be 4.0 or later. Starting with CUDA 5.0, the CUDA driver, CUDA toolkit, and CUDA samples are delivered in one package.
    • Nodes must have at least one NVIDIA GPU from the Fermi or Kepler family. Earlier Tesla cards and desktop GPUs (8800 series and later) are also supported, but not all features are available for those cards. Cards earlier than the Fermi family do not report ECC errors, and some do not support temperature queries.
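
    As a quick sanity check of these prerequisites, you can confirm that the NVIDIA driver (which provides NVML) and the CUDA toolkit respond. These are standard NVIDIA tools, not part of LSF:

    # Shows the installed driver version and the GPUs that NVML can see
    $ nvidia-smi
    # Shows the installed CUDA toolkit version (must be 4.0 or later)
    $ nvcc --version
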
  2. Optionally, enable integration with NVIDIA Data Center GPU Manager (DCGM).

    The NVIDIA Data Center GPU Manager (DCGM) is a suite of data center management tools that allow you to manage and monitor GPU resources in an accelerated data center.

    Enable the DCGM integration by defining the LSF_DCGM_PORT parameter in the lsf.conf file.
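
    For example, assuming the DCGM host engine listens on its default port (5555; adjust the value to match your DCGM setup), add the following line to the lsf.conf file:

    LSF_DCGM_PORT=5555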

  3. Configure the LSF cluster that contains the GPU resources:
    • Configure lsf.shared.
      For GPU support, define the following resources in the Resource section, assuming that the maximum number of GPUs per host is three. The first four resources are provided by the base ELIM; the others are optional. The ngpus resource is not consumable. Remove any changes that are related to the old GPU solution before you define the new one:
      Begin Resource
      RESOURCENAME        TYPE      INTERVAL  INCREASING  CONSUMABLE  DESCRIPTION
      ngpus_prohibited    Numeric   60        N           N           (Number of GPUs in Prohibited Mode)
      ngpus               Numeric   60        N           N           (Number of GPUs)
      ngpus_shared        Numeric   60        N           Y           (Number of GPUs in Shared Mode)
      ngpus_excl_t        Numeric   60        N           Y           (Number of GPUs in Exclusive Thread Mode)
      ngpus_excl_p        Numeric   60        N           Y           (Number of GPUs in Exclusive Process Mode)
      ngpus_physical      Numeric   60        N           Y           (Number of physical GPUs)
      gpu_driver          String    60        ()          ()          (GPU driver version)
      gpu_mode0           String    60        ()          ()          (Mode of 1st GPU)
      gpu_temp0           Numeric   60        Y           ()          (Temperature of 1st GPU)
      gpu_ecc0            Numeric   60        N           ()          (ECC errors on 1st GPU)
      gpu_model0          String    60        ()          ()          (Model name of 1st GPU) 
      gpu_mode1           String    60        ()          ()          (Mode of 2nd GPU)
      gpu_temp1           Numeric   60        Y           ()          (Temperature of 2nd GPU)
      gpu_ecc1            Numeric   60        N           ()          (ECC errors on 2nd GPU)
      gpu_model1          String    60        ()          ()          (Model name of 2nd GPU)
      gpu_mode2           String    60        ()          ()          (Mode of 3rd GPU)
      gpu_temp2           Numeric   60        Y           ()          (Temperature of 3rd GPU)
      gpu_ecc2            Numeric   60        N           ()          (ECC errors on 3rd GPU)
      gpu_model2          String    60        ()          ()          (Model name of 3rd GPU)
      gpu_ut0             Numeric   60        Y           ()          (GPU utilization of 1st GPU)
      gpu_ut1             Numeric   60        Y           ()          (GPU utilization of 2nd GPU)
      gpu_ut2             Numeric   60        Y           ()          (GPU utilization of 3rd GPU)
      gpu_shared_avg_ut   Numeric   60        Y           ()          (Average of all shared mode GPUs utilization)
      gpu_topology        String    60        ()          ()          (GPU topology on host)
      gpu_mut0            Numeric   60        Y           ()          (GPU memory utilization of 1st GPU)
      gpu_mut1            Numeric   60        Y           ()          (GPU memory utilization of 2nd GPU)
      gpu_mut2            Numeric   60        Y           ()          (GPU memory utilization of 3rd GPU)
      gpu_mtotal0         Numeric   60        Y           ()          (Memory total of 1st GPU)
      gpu_mtotal1         Numeric   60        Y           ()          (Memory total of 2nd GPU)
      gpu_mtotal2         Numeric   60        Y           ()          (Memory total of 3rd GPU)
      gpu_mused0          Numeric   60        Y           ()          (Memory used of 1st GPU)
      gpu_mused1          Numeric   60        Y           ()          (Memory used of 2nd GPU)
      gpu_mused2          Numeric   60        Y           ()          (Memory used of 3rd GPU)
      gpu_pstate0         String    60        ()          ()          (Performance state of 1st GPU)
      gpu_pstate1         String    60        ()          ()          (Performance state of 2nd GPU)
      gpu_pstate2         String    60        ()          ()          (Performance state of 3rd GPU)
      gpu_shared_avg_mut  Numeric   60        Y           ()          (Average memory of all shared mode GPUs)
      gpu_status0         String    60        ()          ()          (GPU status)
      gpu_status1         String    60        ()          ()          (GPU status)
      gpu_status2         String    60        ()          ()          (GPU status)
      gpu_error0          String    60        ()          ()          (GPU error)
      gpu_error1          String    60        ()          ()          (GPU error)
      gpu_error2          String    60        ()          ()          (GPU error)
      ...
      End Resource
      

      The gpu_status* and gpu_error* resources are only available if you enabled the DCGM integration by defining the LSF_DCGM_PORT parameter in the lsf.conf file.

    • Configure the lsf.cluster.cluster_name file.
      For GPU support, define the following resources in the ResourceMap section. The first four resources are provided by the base elim.gpu ELIM; the others are optional. Remove any changes that are related to the old GPU solution before you define the new one:
      Begin ResourceMap
      RESOURCENAME             LOCATION
      ...
      
      ngpus_prohibited         ([default])
      ngpus                    ([default])
      ngpus_shared             ([default])
      ngpus_excl_t             ([default])
      ngpus_excl_p             ([default])
      ngpus_physical           ([hostA] [hostB])
      gpu_mode0                ([default])
      gpu_temp0                ([default])
      gpu_ecc0                 ([default])
      gpu_mode1                ([default])
      gpu_temp1                ([default])
      gpu_ecc1                 ([default])
      gpu_mode2                ([default])
      gpu_temp2                ([default])
      gpu_ecc2                 ([default])
      gpu_model0               ([default])
      gpu_model1               ([default])
      gpu_model2               ([default])
      gpu_driver               ([default])
      gpu_ut0                  ([default])
      gpu_ut1                  ([default])
      gpu_ut2                  ([default])
      gpu_shared_avg_ut        ([default])
      gpu_topology             ([default])
      gpu_mut0                 ([default])
      gpu_mut1                 ([default])
      gpu_mut2                 ([default])
      gpu_mtotal0              ([default])
      gpu_mtotal1              ([default])
      gpu_mtotal2              ([default])
      gpu_mused0               ([default])
      gpu_mused1               ([default])
      gpu_mused2               ([default])
      gpu_pstate0              ([default])
      gpu_pstate1              ([default])
      gpu_pstate2              ([default])
      gpu_shared_avg_mut       ([default])
      gpu_status0              ([default])
      gpu_status1              ([default])
      gpu_status2              ([default])
      gpu_error0               ([default])
      gpu_error1               ([default])
      gpu_error2               ([default])
      ...
      End ResourceMap
      

      The gpu_status* and gpu_error* resources are only available if you enabled the DCGM integration by defining the LSF_DCGM_PORT parameter in the lsf.conf file.

    • Optionally, configure lsb.resources.
      For the ngpus_shared, ngpus_excl_t, and ngpus_excl_p resources, you can set attributes in the ReservationUsage section with the following values:
      Begin ReservationUsage 
      RESOURCE         METHOD        RESERVE
      ngpus_shared     PER_HOST      N
      ngpus_excl_t     PER_HOST      N
      ngpus_excl_p     PER_HOST      N
      nmics            PER_TASK      N
      End ReservationUsage
      

      If this file has no configuration for GPU resources, by default LSF considers all resources as PER_HOST.

    • Run the lsadmin reconfig and badmin mbdrestart commands to make configuration changes take effect. If you configure the resource gpu_topology, run the badmin hrestart command too.
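
      For example, the full sequence looks like this (run as the LSF administrator; the last command is needed only when gpu_topology is configured):

      $ lsadmin reconfig        # LIM rereads lsf.shared and lsf.cluster.cluster_name
      $ badmin mbdrestart       # mbatchd rereads lsb.resources
      $ badmin hrestart         # only if gpu_topology is configured
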
  4. Use the lsload -I command to display the GPU resources:
    $ lsload -I ngpus:ngpus_shared:ngpus_excl_t:ngpus_excl_p
    HOST_NAME       status ngpus ngpus_shared ngpus_excl_t ngpus_excl_p
    hostA           ok     3.0   12.0         0.0          0.0
    hostB           ok     -     -            -            -
    hostC           ok     -     -            -            -
    hostD           ok     -     -            -            -
    hostE           ok     -     -            -            -
    hostF           ok     3.0    12.0        0.0          0.0
    hostG           ok     3.0    12.0        0.0          1.0
    hostH           ok     3.0    12.0        1.0          0.0
    hostI           ok     -      -           -            -
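
    You can also query individual per-GPU numeric load indices (as defined in lsf.shared) for a specific host; the host name and index list here are illustrative, and the output is omitted:

    $ lsload -I gpu_temp0:gpu_ecc0:gpu_ut0:gpu_mused0 hostA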
    
  5. Use the bhosts -l command to see how the LSF scheduler allocated the GPU resources. These resources are treated as normal host-based resources:
    $ bhosts -l hostA
    HOST  hostA
    STATUS   CPUF  JL/U   MAX  NJOBS  RUN  SSUSP  USUSP  RSV DISPATCH_WINDOW
    ok       60.00  -     12   2      2    0      0      0   -
     
    CURRENT LOAD USED FOR SCHEDULING:
             r15s  r1m  r15m  ut  pg   io  ls it tmp  swp   mem   slots nmics
    Total    0.0   0.0  0.0   0%  0.0  3   4  0  28G  3.9G  22.5G  10   0.0
    Reserved 0.0   0.0  0.0   0%  0.0  0   0  0  0M   0M    0M      -    - 
     
              ngpus ngpus_shared ngpus_excl_t ngpus_excl_p
    Total     3.0   10.0         0.0          0.0
    Reserved  0.0   2.0          0.0          0.0
     
    LOAD THRESHOLD USED FOR SCHEDULING:
               r15s  r1m  r15m  ut  pg  io  ls  it  tmp  swp  mem
    loadSched   -    -     -    -   -   -   -   -   -    -    -  
    loadStop    -    -     -    -   -   -   -   -   -    -    -  
     
                nmics ngpus ngpus_shared ngpus_excl_t ngpus_excl_p 
    loadSched   -     -     -            -            -  
    loadStop    -     -     -            -            -  
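
    To list only the hosts that currently report GPU capacity, you can also filter the bhosts output with a resource requirement; the expression below is illustrative:

    $ bhosts -R "ngpus>0"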
    
  6. Use the lshosts -l command to see the information for GPUs collected by the ELIM:
    $ lshosts -l hostA
     
    HOST_NAME:  hostA
    type    model        cpuf ncpus ndisks maxmem maxswp maxtmp rexpri server nprocs ncores nthreads
    X86_64  Intel_EM64T  60.0 12    1      23.9G  3.9G   40317M 0      Yes    2      6      1
     
    RESOURCES: (mg)
    RUN_WINDOWS:  (always open)
     
    LOAD_THRESHOLDS:
    r15s  r1m  r15m ut pg io ls it tmp swp mem nmics ngpus ngpus_shared ngpus_excl_t ngpus_excl_p
    -     3.5  -    -  -  -  -  -  -   -   -   -     -     -            -            -
    

    You can also use the bpost command to display which GPUs are allocated to the job.

  7. Submit jobs with GPU resources in resource requirements.

    Use the select[] string in a resource requirement (-R) to choose the hosts that have GPU resources. Use the rusage[] resource requirement to tell LSF how many GPU resources to use.

    Note: If the LSB_GPU_NEW_SYNTAX=Y parameter is specified in the lsf.conf file, you must submit your job with the bsub -gpu option. You cannot use the GPU resources ngpus_shared, ngpus_excl_t and ngpus_excl_p.
    Examples:
    • Use a GPU in shared mode:
      bsub -R "select[ngpus>0] rusage [ngpus_shared=2]" gpu_app
    • Use a GPU in exclusive thread mode for a PMPI job:
      bsub -n 2 -R 'select[ngpus>0] rusage[ngpus_excl_t=2]' mpirun -lsf gpu_app1
    • Use a GPU in exclusive process mode for a PMPI job:
      bsub -n 4 -R "select[ngpus>0] rusage[ngpus_excl_p=2]" mpirun -lsf gpu_app2
    • Run a job on 1 host with 8 tasks on it, using 2 ngpus_excl_p in total:
      bsub -n 8 -R "select[ngpus > 0] rusage[ngpus_excl_p=2] span[hosts=1]" mpirun -lsf gpu_app2
    • Run a job on 8 hosts with 1 task per host, where every task uses 2 ngpus_shared per host:
      bsub -n 8 -R "select[ngpus > 0] rusage[ngpus_shared=2] span[ptile=1]" mpirun -lsf gpu_app2
    • Run a job on 4 hosts with 2 tasks per host, where the tasks use a total of 2 ngpus_excl_t per host:
      bsub -n 8 -R "select[ngpus > 0] rusage[ngpus_excl_t=2] span[ptile=2]" mpirun -lsf gpu_app2
  8. Submit jobs with the bsub -gpu option.

    The LSB_GPU_NEW_SYNTAX=Y parameter must be specified in the lsf.conf file before you can submit your job with the bsub -gpu option.
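
    A minimal sketch of this submission mode, assuming that LSB_GPU_NEW_SYNTAX=Y is already set (the option values and the gpu_app program are illustrative):

    # In lsf.conf
    LSB_GPU_NEW_SYNTAX=Y

    # Request 2 GPUs in shared mode on each execution host
    $ bsub -gpu "num=2:mode=shared" gpu_app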