Configuring a Multi-Instance GPU (MIG) with CAS

Multi-Instance GPU (MIG) partitions a single NVIDIA GPU into multiple isolated instances. Each instance operates as an independent GPU, enabling resource sharing across multiple deployments.

Before you begin

MIG support requirements: Only specific NVIDIA GPUs support MIG. For more information on the supported models and their configurations, see the NVIDIA documentation.
Note: The NeMo Retriever Library NIMs support only a subset of the GPUs that support MIG.

Before you configure MIG, plan how to partition each GPU by using the supported profiles. Each profile defines the amount of memory and the number of compute engines that the slice can use.

For example, an A100 GPU with 80 GB of memory can be partitioned in the following way:

- One 40 GB slice with 3 compute engines
- Two 20 GB slices with 2 compute engines each

Each A100 GPU contains seven compute engines. Ensure that the total number of compute engines across all slices on a GPU does not exceed seven.
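The compute engine budget can be checked before you write any configuration. The following is a minimal sketch, not part of the product: the function names `mig_compute_total` and `mig_plan_fits` are hypothetical, and the sketch assumes that the leading number in a profile name (the "3" in 3g.40gb) is the number of compute engines, as in the example above.

```shell
#!/usr/bin/env bash
# Sum the compute engines requested by a list of MIG profile names.
# Assumption: the number before "g." in a profile name is its compute engine count.
mig_compute_total() {
  local total=0 profile
  for profile in "$@"; do
    total=$(( total + ${profile%%g.*} ))
  done
  echo "$total"
}

# Succeed only if the plan fits within the seven compute engines of one A100.
mig_plan_fits() {
  [ "$(mig_compute_total "$@")" -le 7 ]
}

# One 40 GB slice plus two 20 GB slices uses all seven engines.
mig_compute_total 3g.40gb 2g.20gb 2g.20gb
```

A plan such as two 3g.40gb slices plus one 2g.20gb slice would request eight engines, so `mig_plan_fits` returns a non-zero status for it.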

Note: For optimal ingestion performance, assign a 7g.80gb MIG profile to the embedqa NIM service to increase GPU compute capacity.

Example: Partitioning two A100 80 GB GPUs to support NeMo Retriever Library

The following example shows how to configure MIG on a Red Hat OpenShift Container Platform cluster with two A100 80 GB GPUs.

The A100 supports 40 GB, 20 GB, and 10 GB slices. The reranker NIM requires at least 22 GB of memory, which does not fit in a 20 GB slice, so you must assign it a 40 GB slice. The embedqa NIM also receives a 40 GB slice due to its footprint. Assign the remaining NIMs to 20 GB slices.

Use the following partitioning approach:

GPUs 0 and 1 (each):
- One 40 GB slice (3g.40gb)
- Two 20 GB slices (2g.20gb)

About this task

This task configures MIG on NVIDIA GPUs that support the MIG feature. The steps involve enabling MIG, defining GPU partitions, applying the MIG configuration to your nodes, and assigning deployments to specific GPU slices.

To configure MIG on NVIDIA GPUs, perform the following steps:

Procedure

Apply the following configmap in the nvidia-gpu-operator namespace:

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: nv-ingest-mig
  namespace: nvidia-gpu-operator
data:
  config.yaml: |
    version: v1
    mig-configs:
      all-disabled:
        - devices: all
          mig-enabled: false
      nv-ingest-mig-config:
        - devices: [0, 1]
          mig-enabled: true
          mig-devices:
            "2g.20gb": 2
            "3g.40gb": 1
```
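One way to apply the configmap is with `oc apply`. The following sketch assumes that you saved the configmap above to a local file. The function name `apply_mig_configmap` and the file name in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Apply the MIG configmap from a local file, then confirm that it exists.
# Assumption: the file holds the nv-ingest-mig configmap shown above.
apply_mig_configmap() {
  local file="$1"
  oc apply -n nvidia-gpu-operator -f "$file" &&
    oc get configmap nv-ingest-mig -n nvidia-gpu-operator
}

# Usage (hypothetical file name):
#   apply_mig_configmap nv-ingest-mig.yaml
```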

Temporarily stop any pods that use the GPUs. The MIG manager cannot reconfigure GPUs that are in use.
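If the GPU workloads run as deployments, scaling them to zero replicas is one way to release the GPUs. The following is a sketch only: the function name `stop_gpu_deployments` is hypothetical, and the namespace and deployment names in the usage comment are placeholders for your own workloads.

```shell
#!/usr/bin/env bash
# Scale each named deployment to zero so that its pods release the GPU.
# First argument: namespace. Remaining arguments: deployment names.
stop_gpu_deployments() {
  local namespace="$1"; shift
  local name
  for name in "$@"; do
    oc scale deployment "$name" -n "$namespace" --replicas=0 || return 1
  done
}

# Usage (placeholder names):
#   stop_gpu_deployments my-namespace my-reranker my-embedqa
```

Record the original replica counts before you scale down so that you can restore them at the end of the procedure.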

Patch the gpu-cluster-policy and set the MIG strategy value to "mixed".

```
kubectl patch clusterpolicies.nvidia.com/gpu-cluster-policy \
    --type='json' \
    -p='[{"op":"replace", "path":"/spec/mig/strategy", "value":"mixed"}]'
```

Label each node that hosts the GPUs that you want to partition, referencing the MIG configuration that is defined in the configmap. Set the NODE variable to the name of your node before you run the commands:
```
NODE=ip-10-0-9-79.us-west-2.compute.internal
MIG_CONFIGURATION=nv-ingest-mig-config
oc label node $NODE nvidia.com/mig.config=$MIG_CONFIGURATION --overwrite
```
Monitor the pods in the nvidia-gpu-operator namespace. Applying the configuration restarts several pods, including the gpu-feature-discovery pods.
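In addition to watching the pods, you can poll the node itself. The following sketch assumes that the GPU Operator reports progress in the `nvidia.com/mig.config.state` node label with values such as `success` and `failed`. The function name `wait_for_mig` and the default retry values are hypothetical.

```shell
#!/usr/bin/env bash
# Poll the node label that the MIG manager updates while it reconfigures GPUs.
# Arguments: node name, number of attempts (default 60), delay in seconds (default 10).
wait_for_mig() {
  local node="$1" tries="${2:-60}" delay="${3:-10}" state=""
  while [ "$tries" -gt 0 ]; do
    state=$(oc get node "$node" \
      -o jsonpath='{.metadata.labels.nvidia\.com/mig\.config\.state}')
    case "$state" in
      success) echo "MIG configuration applied on $node"; return 0 ;;
      failed)  echo "MIG configuration failed on $node" >&2; return 1 ;;
    esac
    tries=$(( tries - 1 ))
    sleep "$delay"
  done
  echo "Timed out waiting for MIG configuration on $node (last state: $state)" >&2
  return 1
}

# Usage:
#   wait_for_mig "$NODE"
```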

To verify the MIG configuration, run the following command:

```
nvidia-smi
```

You can see MIG slices with memory usage metrics after NeMo Retriever Library is activated. Initially, the memory usage appears as 1 MiB until the pods start to use the slices.


```
+-----------------------------------------------------------------------------------------+
| MIG devices:                                                                            |
+------------------+----------------------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |                     Memory-Usage |        Vol|        Shared         |
|      ID  ID  Dev |                       BAR1-Usage | SM     Unc| CE ENC  DEC  OFA  JPG |
|                  |                                  |        ECC|                       |
|==================+==================================+===========+=======================|
|  0    2   0   0  |            3590MiB / 40192MiB    | 42      0 |  3   0    2    0    0 |
|                  |                 2MiB / 32767MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
|  0    3   0   1  |            2959MiB / 19968MiB    | 28      0 |  2   0    1    0    0 |
|                  |                 2MiB / 16383MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
|  0    4   0   2  |            2971MiB / 19968MiB    | 28      0 |  2   0    1    0    0 |
|                  |                 2MiB / 16383MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
|  1    2   0   0  |           25386MiB / 40192MiB    | 42      0 |  3   0    2    0    0 |
|                  |                 2MiB / 32767MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
|  1    3   0   1  |            2955MiB / 19968MiB    | 28      0 |  2   0    1    0    0 |
|                  |                 2MiB / 16383MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
|  1    4   0   2  |            2281MiB / 19968MiB    | 28      0 |  2   0    1    0    0 |
|                  |                 2MiB / 16383MiB  |           |                       |
+------------------+----------------------------------+-----------+-----------------------+
```
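On a cluster, `nvidia-smi` is typically run inside the NVIDIA driver pod on the GPU node. The following sketch shows one way to do that. It assumes that the driver pods in the nvidia-gpu-operator namespace carry the `openshift.driver-toolkit=true` label. Check the labels in your cluster if no pod is found. The function name `run_nvidia_smi` is hypothetical.

```shell
#!/usr/bin/env bash
# Run nvidia-smi inside the NVIDIA driver pod that is scheduled on the given node.
# Assumption: driver pods are labeled openshift.driver-toolkit=true.
run_nvidia_smi() {
  local node="$1" pod
  pod=$(oc get pods -n nvidia-gpu-operator \
    -l openshift.driver-toolkit=true \
    --field-selector "spec.nodeName=$node" \
    -o jsonpath='{.items[0].metadata.name}') || return 1
  if [ -z "$pod" ]; then
    echo "No driver pod found on $node" >&2
    return 1
  fi
  oc exec -n nvidia-gpu-operator "$pod" -- nvidia-smi
}

# Usage:
#   run_nvidia_smi "$NODE"
```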

Note: If the MIG configuration fails to apply, retrigger the MIG manager by disabling and re-enabling it in the cluster policy configuration:

```
migManager:
    config:
      default: all-disabled
      name: nv-ingest-mig
    enabled: true   # set to false, save, then set back to true
```
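The same toggle can be scripted with two merge patches against the cluster policy. This is a sketch under assumptions: it reuses the gpu-cluster-policy name from the patch command earlier in this procedure, the function name `retrigger_mig_manager` is hypothetical, and the default pause of 30 seconds is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Disable the MIG manager, wait, then enable it again.
# Argument: pause in seconds between the two patches (default 30, arbitrary).
retrigger_mig_manager() {
  local pause="${1:-30}"
  oc patch clusterpolicies.nvidia.com/gpu-cluster-policy --type=merge \
    -p '{"spec":{"migManager":{"enabled":false}}}' || return 1
  sleep "$pause"
  oc patch clusterpolicies.nvidia.com/gpu-cluster-policy --type=merge \
    -p '{"spec":{"migManager":{"enabled":true}}}'
}

# Usage:
#   retrigger_mig_manager
```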

Before you scale up microservices that require GPUs, such as the NeMo Retriever Library NIMs, add the matching MIG resource limit to each deployment:
- For larger NIMs, such as the NVIDIA reranker:
```
containers:
  - resources:
      limits:
        nvidia.com/mig-3g.40gb: '1'
```
- For smaller NIMs:
```
containers:
  - resources:
      limits:
        nvidia.com/mig-2g.20gb: '1'
```
Scale up the deployments. The pods are then allocated to the correct GPU slice.
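Before you scale up, you can confirm that the node advertises the MIG resources that the deployments request. The following sketch lists the `nvidia.com/mig-` entries from the allocatable resources of a node. The function name `show_mig_resources` is hypothetical. In the two-GPU example, you would expect entries for the 3g.40gb and 2g.20gb profiles, provided that both GPUs are on the same node.

```shell
#!/usr/bin/env bash
# List the MIG resources that a node advertises as allocatable.
# With the mixed strategy, slices are expected to appear as nvidia.com/mig-<profile>.
show_mig_resources() {
  local node="$1"
  oc get node "$node" -o jsonpath='{.status.allocatable}' |
    tr -d '"' |
    grep -o 'nvidia\.com/mig-[^],} ]*'
}

# Usage:
#   show_mig_resources "$NODE"
```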