Multi-Instance GPU (MIG) partitions a single NVIDIA GPU into multiple isolated instances.
Each instance operates as an independent GPU, enabling resource sharing across multiple
deployments.
Before you begin
- MIG support requirements
- Only specific NVIDIA GPUs support MIG. For more information on the supported models and their
configurations, see the NVIDIA documentation.
Note: The NeMo Retriever Library NIMs support only a subset of the GPUs
that support MIG.
Before you configure MIG, plan how to partition each GPU by using the supported profiles. Each
profile defines the amount of memory and the number of compute engines that a slice can use.
For example, an A100 GPU with 80 GB of memory can be partitioned in the following way:
- One 40 GB slice with 3 compute engines
- Two 20 GB slices with 2 compute engines each
Each A100 GPU contains seven compute engines. Ensure that the total number of compute engines
across all slices on a GPU does not exceed seven.
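Note: For optimal ingestion performance, assign the 7g.80gb MIG profile to the embedqa NIM
service. This profile gives a single slice all seven compute engines and the full 80 GB of an
A100, which maximizes the GPU compute capacity available to the service.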
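You can check the seven-engine budget described above with a little shell arithmetic. The following is a minimal sketch and is not part of NeMo Retriever Library or the GPU Operator. The function names mig_compute_total and mig_layout_fits are hypothetical. The sketch assumes that the leading number of a profile name (the 3 in 3g.40gb) is the number of compute engines that the slice uses. It checks only the compute-engine total. It does not model memory slices or NVIDIA's placement rules, so the MIG manager can still reject a layout that passes this check.

```shell
# Hypothetical helper: sum the compute engines requested by a MIG layout.
# Each argument has the form PROFILE:COUNT, for example 3g.40gb:1.
# Assumption: the number before "g." in a profile name is the number of
# compute engines that one slice of that profile uses.
mig_compute_total() {
  local total=0 spec profile count engines
  for spec in "$@"; do
    profile="${spec%%:*}"      # text before the colon, e.g. 3g.40gb
    count="${spec##*:}"        # text after the colon, e.g. 1
    engines="${profile%%g.*}"  # text before "g.", e.g. 3
    total=$(( total + engines * count ))
  done
  echo "$total"
}

# Hypothetical helper: succeed only if a space-separated layout fits the
# compute-engine budget of one GPU (default 7, as on an A100).
mig_layout_fits() {
  local budget="${2:-7}" specs
  read -r -a specs <<< "$1"   # split the layout string into an array
  [ "$(mig_compute_total "${specs[@]}")" -le "$budget" ]
}
```

For the layout in the example that follows, mig_compute_total 3g.40gb:1 2g.20gb:2 prints 7, so mig_layout_fits "3g.40gb:1 2g.20gb:2" succeeds. Adding a second 3g.40gb slice raises the total to 10, and the check fails.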
- Example: Partitioning two A100 80 GB GPUs to support NeMo Retriever Library
- The following example shows how to configure MIG on a Red Hat OpenShift Container Platform
cluster with two A100 80 GB GPUs.
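The A100 80 GB supports slices of 10 GB, 20 GB, and 40 GB, among others. The reranker
NIM requires at least 22 GB of memory. This exceeds the size of a 20 GB slice, so you must
assign the reranker a 40 GB slice. The embedqa NIM also receives a 40 GB slice because of its
memory footprint. Assign the remaining NIMs to 20 GB slices.
Use the following partitioning approach, which matches the ConfigMap in the procedure. Partition
each GPU into one 3g.40gb slice and two 2g.20gb slices, which uses all seven compute engines
(3 + 2 + 2 = 7). Assign the reranker NIM to the 40 GB slice on one GPU and the embedqa NIM to
the 40 GB slice on the other GPU. Run the remaining NIMs on the four 20 GB slices.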
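The slice-size reasoning in this example works as follows. A NIM that needs 22 GB does not fit in a 20 GB slice, so it moves up to a 40 GB slice. In other words, you choose the smallest slice that is at least as large as the requirement. The following is a minimal sketch of that rule. The function name is hypothetical, and the sketch assumes whole-gigabyte requirements. By default, it uses the 10, 20, and 40 GB slice sizes named above.

```shell
# Hypothetical helper, not part of any NVIDIA tool: print the smallest slice
# size in GB that is at least as large as a NIM's memory requirement.
# Usage: mig_smallest_slice_gb REQUIRED_GB [SIZE_GB ...]
# Without explicit sizes, it uses the 10, 20, and 40 GB A100 slices.
mig_smallest_slice_gb() {
  local required_gb="$1"
  shift
  local sizes=("$@")
  if [ "${#sizes[@]}" -eq 0 ]; then
    sizes=(10 20 40)
  fi
  local best="" size
  for size in "${sizes[@]}"; do
    # Keep this size if it fits the requirement and is smaller than the current best.
    if [ "$size" -ge "$required_gb" ] && { [ -z "$best" ] || [ "$size" -lt "$best" ]; }; then
      best="$size"
    fi
  done
  if [ -z "$best" ]; then
    echo "No slice of at least ${required_gb} GB is available" >&2
    return 1
  fi
  echo "$best"
}
```

mig_smallest_slice_gb 22 prints 40, which matches the reranker assignment. The slice sizes are nominal. The nvidia-smi sample output in the procedure shows that a 20 GB slice exposes 19968 MiB, so leave some headroom when a requirement is close to a slice boundary.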
About this task
This task configures MIG on NVIDIA GPUs that support the MIG feature. The steps involve enabling
MIG, defining GPU partitions, applying the MIG configuration to your nodes, and assigning deployments
to specific GPU slices.
To configure MIG on NVIDIA GPUs, perform the following steps:
Procedure
- Apply the following ConfigMap in the nvidia-gpu-operator namespace:
apiVersion: v1
kind: ConfigMap
metadata:
  name: nv-ingest-mig
data:
  config.yaml: |
    version: v1
    mig-configs:
      all-disabled:
        - devices: all
          mig-enabled: false
      nv-ingest-mig-config:
        - devices: [0, 1]
          mig-enabled: true
          mig-devices:
            "2g.20gb": 2
            "3g.40gb": 1
- Temporarily stop any pods that use the GPUs. The MIG manager cannot reconfigure GPUs that
are in use.
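- Patch the gpu-cluster-policy cluster policy and set the MIG strategy to "mixed". The mixed
strategy is required because each GPU in this configuration is split into slices of different
sizes: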
kubectl patch clusterpolicies.nvidia.com/gpu-cluster-policy \
--type='json' \
-p='[{"op":"replace", "path":"/spec/mig/strategy", "value":"mixed"}]'
- Label each node whose GPUs you want to partition with the name of the MIG configuration that
is defined in the ConfigMap. Before you run the command, set the NODE variable to the name of
your node:
NODE=ip-10-0-9-79.us-west-2.compute.internal
MIG_CONFIGURATION=nv-ingest-mig-config
oc label node $NODE nvidia.com/mig.config=$MIG_CONFIGURATION --overwrite
- Monitor the pods in the nvidia-gpu-operator namespace. Applying the configuration restarts
several pods, including the gpu-feature-discovery pods.
- To verify the MIG configuration, run the following command:
The output lists each MIG slice with its memory usage. Until the NeMo Retriever Library pods
start to use a slice, its memory usage appears as 1 MiB. In the following sample output, the
slices are already in use:
+-----------------------------------------------------------------------------------------+
| MIG devices: |
+------------------+----------------------------------+-----------+-----------------------+
| GPU GI CI MIG | Memory-Usage | Vol| Shared |
| ID ID Dev | BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
| | | ECC| |
|==================+==================================+===========+=======================|
| 0 2 0 0 | 3590MiB / 40192MiB | 42 0 | 3 0 2 0 0 |
| | 2MiB / 32767MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
| 0 3 0 1 | 2959MiB / 19968MiB | 28 0 | 2 0 1 0 0 |
| | 2MiB / 16383MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
| 0 4 0 2 | 2971MiB / 19968MiB | 28 0 | 2 0 1 0 0 |
| | 2MiB / 16383MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
| 1 2 0 0 | 25386MiB / 40192MiB | 42 0 | 3 0 2 0 0 |
| | 2MiB / 32767MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
| 1 3 0 1 | 2955MiB / 19968MiB | 28 0 | 2 0 1 0 0 |
| | 2MiB / 16383MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
| 1 4 0 2 | 2281MiB / 19968MiB | 28 0 | 2 0 1 0 0 |
| | 2MiB / 16383MiB | | |
+------------------+----------------------------------+-----------+-----------------------+
Note: If the MIG configuration fails to apply, retrigger the MIG manager by disabling and then
re-enabling it in the cluster policy configuration. Set migManager.enabled to false, apply the
change, and then set it back to true:
migManager:
  config:
    default: all-disabled
    name: nv-ingest-mig
  enabled: true  # set to false, apply, then set back to true
- Before you scale up your microservices that require GPUs, such as NeMo Retriever Library NIMs, you must add the following
snippets to their deployments:
- Scale up the deployments. Each pod is then allocated to the GPU slice that its deployment
requests.
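For the monitoring step, you can poll the node until the MIG manager reports a result instead of watching individual pods. This is a sketch under stated assumptions. The helper name mig_wait_for_state is hypothetical. It assumes that the MIG manager records its progress in the node label nvidia.com/mig.config.state, with values such as pending, success, and failed. Confirm the label on your cluster with oc get node "$NODE" --show-labels.

```shell
# Hypothetical helper: wait until the MIG manager reports a final state for a node.
# Assumption: the MIG manager publishes its progress in the node label
# nvidia.com/mig.config.state (values such as pending, success, failed).
# Usage: mig_wait_for_state NODE [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
mig_wait_for_state() {
  local node="$1" timeout_s="${2:-600}" interval_s="${3:-10}"
  local waited=0 state=""
  # Guard against a zero interval, which would otherwise poll forever.
  [ "$interval_s" -ge 1 ] || interval_s=1
  while [ "$waited" -le "$timeout_s" ]; do
    # Read only the MIG state label; dots in the key are escaped for JSONPath.
    state="$(oc get node "$node" \
      -o jsonpath='{.metadata.labels.nvidia\.com/mig\.config\.state}')"
    case "$state" in
      success)
        echo "MIG configuration applied on $node"
        return 0 ;;
      failed)
        echo "MIG configuration failed on $node" >&2
        return 1 ;;
    esac
    sleep "$interval_s"
    waited=$(( waited + interval_s ))
  done
  echo "Timed out on $node; last state: ${state:-unset}" >&2
  return 1
}
```

For example, mig_wait_for_state "$NODE" 900 15 waits up to 15 minutes and checks every 15 seconds. Polling a node label keeps the check independent of pod names, which change each time the operator recycles pods.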
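To produce a MIG device table like the sample verification output, one option is to run nvidia-smi inside a GPU Operator driver pod on the node. The following is a minimal sketch. It assumes that the driver pods run in the nvidia-gpu-operator namespace and carry the label app=nvidia-driver-daemonset. Check both with oc get pods -n nvidia-gpu-operator --show-labels. The name mig_show_devices is hypothetical.

```shell
# Hypothetical helper: print the MIG device table for a node by running
# nvidia-smi in the GPU Operator driver pod that is scheduled on that node.
# Assumptions: driver pods carry the label app=nvidia-driver-daemonset and run
# in the nvidia-gpu-operator namespace unless another namespace is passed.
# Usage: mig_show_devices NODE [NAMESPACE]
mig_show_devices() {
  local node="$1" ns="${2:-nvidia-gpu-operator}" pod
  # -o name prints pod/<name>, which oc exec accepts directly.
  pod="$(oc get pods -n "$ns" -l app=nvidia-driver-daemonset \
    --field-selector "spec.nodeName=${node}" -o name | head -n 1)"
  if [ -z "$pod" ]; then
    echo "No driver pod found on $node in namespace $ns" >&2
    return 1
  fi
  oc exec -n "$ns" "$pod" -- nvidia-smi
}
```

If the driver pod has more than one container, add -c with the driver container name to the oc exec call.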
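You can also retrigger the MIG manager, as described in the note in the procedure, from the command line. Use the same kind of JSON patch that the procedure uses for the MIG strategy, applied to spec.migManager.enabled. This is a minimal sketch. The helper names are hypothetical, and the 30-second default pause between the two patches is an assumption, not a documented requirement.

```shell
# Hypothetical helper: set spec.migManager.enabled in the ClusterPolicy.
# $1 must be the JSON literal true or false.
mig_set_manager_enabled() {
  kubectl patch clusterpolicies.nvidia.com/gpu-cluster-policy \
    --type='json' \
    -p="[{\"op\":\"replace\", \"path\":\"/spec/migManager/enabled\", \"value\":$1}]"
}

# Hypothetical helper: disable the MIG manager, pause, and then enable it again.
# Usage: mig_retrigger_manager [PAUSE_SECONDS]
mig_retrigger_manager() {
  mig_set_manager_enabled false || return 1
  # Assumed pause so that the operator can remove the MIG manager pod first.
  sleep "${1:-30}"
  mig_set_manager_enabled true
}
```

After mig_retrigger_manager returns, check the node again, for example with oc get node "$NODE" --show-labels.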
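With the mixed strategy, the GPU Operator advertises each MIG profile as an extended resource named nvidia.com/mig-<profile>. A deployment therefore requests a slice through a container resource limit. The following hypothetical helper is a hedged sketch of that part of the deployment changes. It prints only such a limit and does not cover any other settings that your deployments might need.

```shell
# Hypothetical helper: print a container resources fragment that requests one
# MIG slice. With the mixed strategy, each MIG profile is advertised as the
# extended resource nvidia.com/mig-<profile>, for example nvidia.com/mig-3g.40gb.
# Usage: mig_resources_snippet PROFILE
mig_resources_snippet() {
  local profile="$1"
  printf 'resources:\n  limits:\n    nvidia.com/mig-%s: 1\n' "$profile"
}
```

mig_resources_snippet 3g.40gb prints a limit of nvidia.com/mig-3g.40gb: 1. This limit suits the reranker or embedqa NIM in the example layout. Use 2g.20gb for the other NIMs. Place the fragment under the NIM container in place of any nvidia.com/gpu limit. When MIG is enabled on both GPUs, no whole GPU is advertised under that name.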