Configuring Milvus to run on GPU

In an existing Milvus service, you can configure index nodes and query nodes to run on GPU hardware.

watsonx.data on IBM Software Hub

Before you begin

Ensure that your Red Hat OpenShift cluster has worker nodes with NVIDIA GPUs and that the NVIDIA GPU Operator is installed, so that the GPUs or MIG partitions are exposed as schedulable node resources.

About this task

Enabling GPU support in Milvus provides significant performance gains and allows you to take advantage of GPU indexes such as GPU_CAGRA, GPU_IVF_FLAT, or GPU_BRUTE_FORCE. GPU_CAGRA is the recommended index for most workloads.

Procedure

  1. Obtain the name of the Milvus custom resource (CR) that you want to enable GPU support for:
    oc get wxdengine -n ${PROJECT_CPD_INST_OPERANDS}
    Example output:
    NAME                  VERSION   TYPE     DISPLAY NAME   SIZE        RECONCILE    STATUS    AGE
    lakehouse-milvus496   2.1.0     milvus   milvus         small       Completed    RUNNING   23h
  2. Patch the CR with the desired settings. The following is an example of the configuration:
    oc patch wxdengine/lakehouse-milvus496 \
      --type=merge \
       -n ${PROJECT_CPD_INST_OPERANDS} \
       -p '{ "spec": { 
              "milvus_indexnode": {
                "replicas": <value>,
                "resources": {
                  "requests": {
                    "<setting>": "<value>"
                  },
                  "limits": {
                    "<setting>": "<value>"
                  }
                },
                "tolerations": [
                  {
                    "<setting>": "<value>"
                  }
                ]
              },
              "milvus_querynode": {
                "replicas": <value>,
                "resources": {
                  "requests": {
                    "<setting>": "<value>"
                  },
                  "limits": {
                    "<setting>": "<value>"
                  }
                },
                "tolerations": [
                  {
                    "<setting>": "<value>"
                  }
                ]
              }
          } }'
  3. Wait for the CR and the Milvus pods to finish updating. The following example shows how to confirm the update:
    # Milvus should move to "InProgress" status and then back to "Completed"
    oc get wxdengine -n ${PROJECT_CPD_INST_OPERANDS}
    NAME                  VERSION   TYPE     DISPLAY NAME   SIZE        RECONCILE    STATUS    AGE
    lakehouse-milvus496   2.1.0     milvus   milvus         small       Completed    RUNNING   23h
    
    # The index node and query node should be recreated to mount the GPU, and then change to "Running" status
    oc get po -l component=ibm-lh-milvus -n ${PROJECT_CPD_INST_OPERANDS}
    NAME                                                     READY   STATUS    RESTARTS   AGE
    ibm-lh-lakehouse-milvus496-datacoord-5977cc4888-mxrk5    2/2     Running   0          24h
    ibm-lh-lakehouse-milvus496-datanode-784d87c468-kk99l     2/2     Running   0          24h
    ibm-lh-lakehouse-milvus496-indexnode-9b45df454-hqjnh     2/2     Running   0          5m
    ibm-lh-lakehouse-milvus496-proxy-9bc5b7fcb-ssvph         2/2     Running   0          24h
    ibm-lh-lakehouse-milvus496-querycoord-7949994b74-wh99n   2/2     Running   0          24h
    ibm-lh-lakehouse-milvus496-querynode-778db548b6-v5cp5    2/2     Running   0          5m
    ibm-lh-lakehouse-milvus496-rootcoord-5fd9f7c995-n9ggp    2/2     Running   0          24h
  4. Confirm that the expected number of Milvus pods are running on the GPU by using the nvidia-smi command in the nvidia-driver-daemonset pod. There should be one Milvus process per query node or index node (to cross-check the GPU requests on the pods themselves, see the sketch after this procedure):
    oc get ds -A -o name | grep "nvidia-driver-daemonset"
    daemonset.apps/nvidia-driver-daemonset-415.92.202409241719-0
    
    oc exec -it -n nvidia-gpu-operator daemonset.apps/nvidia-driver-daemonset-415.92.202409241719-0 -- nvidia-smi
    ...
    +-----------------------------------------------------------------------------------------+
    | Processes:                                                                              |
    |  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
    |        ID   ID                                                               Usage      |
    |=========================================================================================|
    |    0    3    0     727884      C   /opt/milvus/bin/milvus                       1148MiB |
    |    1    3    0     727881      C   /opt/milvus/bin/milvus                       1148MiB |
    +-----------------------------------------------------------------------------------------+
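
To cross-check that the recreated index node and query node pods actually request GPU resources, you can inspect their container resource definitions. The following is a minimal sketch; it reuses the component=ibm-lh-milvus pod label from the previous step:

# Print the resource requests and limits of each index node and query node pod
for pod in $(oc get po -l component=ibm-lh-milvus -n ${PROJECT_CPD_INST_OPERANDS} -o name | grep -E "indexnode|querynode"); do
  echo "${pod}"
  oc get "${pod}" -n ${PROJECT_CPD_INST_OPERANDS} -o jsonpath='{.spec.containers[*].resources}{"\n"}'
done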

Troubleshooting

If the query node or index node pods remain in Pending status for a long time, you can check the reason by running:
oc describe po <pod-name> -n ${PROJECT_CPD_INST_OPERANDS}
If the reason is insufficient resources, such as not enough memory or not enough GPUs to satisfy the request, you can:
  • Adjust the patch to lower the number of replicas.
  • Lower the memory.
  • Fix any invalid syntax.
After that, run the patch again. Because the controller waits on the Pending pod, you can restart the controller so that the new patch is applied to the pods faster by running:
export OPERATOR_POD=$(oc get po -o name -n $PROJECT_CPD_INST_OPERATORS | grep "ibm-lakehouse-controller-manager-") && oc delete $OPERATOR_POD -n $PROJECT_CPD_INST_OPERATORS
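If the pods are Pending because no node can satisfy the GPU request, it can also help to check how many GPUs or MIG partitions each node advertises and how many are already allocated. The following is a minimal sketch; it only assumes that the NVIDIA GPU Operator exposes the devices as nvidia.com/ extended resources:
# Show the nvidia.com/ extended resources (capacity, allocatable, and allocated) per node
oc describe nodes | grep -E "^Name:|nvidia.com/"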

Patch examples

The following examples show how you can patch Milvus in different ways. The resource requests, resource limits, replicas, and pod tolerations are fully customizable to allow different kinds of hardware and support advanced tuning.
Dedicate an entire GPU for each query and index node
Use the existing replica counts, memory, and CPU settings:
oc patch wxdengine/lakehouse-milvus496 \
  --type=merge \
   -n ${PROJECT_CPD_INST_OPERANDS} \
   -p '{ "spec": { 
          "milvus_indexnode": {
            "resources": {
              "requests": {
                "nvidia.com/gpu": "1"
              },
              "limits": {
                "nvidia.com/gpu": "1"
              }
            }
          },
          "milvus_querynode": {
            "resources": {
              "requests": {
                "nvidia.com/gpu": "1"
              },
              "limits": {
                "nvidia.com/gpu": "1"
              }
            }
          }
      } }'
Dedicate a Multi-Instance GPU (MIG) partition to one query node and one index node
Update memory to match the MIG partition size:
oc patch wxdengine/lakehouse-milvus496 \
  --type=merge \
   -n ${PROJECT_CPD_INST_OPERANDS} \
   -p '{ "spec": { 
          "milvus_indexnode": {
            "replicas": 1,
            "resources": {
              "requests": {
                "nvidia.com/mig-2g.20gb": "1",
                "memory": "19G"
              },
              "limits": {
                "nvidia.com/mig-2g.20gb": "1",
                "memory": "19G"
              }
            }
          },
          "milvus_querynode": {
            "replicas": 1,
            "resources": {
              "requests": {
                "nvidia.com/mig-2g.20gb": "1",
                "memory": "19G"
              },
              "limits": {
                "nvidia.com/mig-2g.20gb": "1",
                "memory": "19G"
              }
            }
          }
      } }'
Add tolerations to query and index nodes
Allow the query node and index node pods to be scheduled on tainted Red Hat OpenShift nodes that have the NoSchedule effect with the key nvidia.com/gpu:
oc patch wxdengine/lakehouse-milvus496 \
  --type=merge \
   -n ${PROJECT_CPD_INST_OPERANDS} \
   -p '{ "spec": { 
          "milvus_indexnode": {
            "replicas": 1,
            "resources": {
              "requests": {
                "nvidia.com/gpu": "1"
              },
              "limits": {
                "nvidia.com/gpu": "1"
              }
            },
            "tolerations": [
              {
                "effect": "NoSchedule",
                "key": "nvidia.com/gpu",
                "operator": "Equal",
                "value": "present"
              }
            ]
          },
          "milvus_querynode": {
            "replicas": 1,
            "resources": {
              "requests": {
                "nvidia.com/gpu": "1"
              },
              "limits": {
                "nvidia.com/gpu": "1"
              }
            },
            "tolerations": [
              {
                "effect": "NoSchedule",
                "key": "nvidia.com/gpu",
                "operator": "Equal",
                "value": "present"
              }
            ]
          }
      } }'
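To find the taint key, value, and effect that your GPU nodes carry, so that the toleration in the patch matches them, you can list the node taints. This is a minimal sketch:
oc get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints'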
Run multiple replicas and customize CPU, memory, and ephemeral storage
oc patch wxdengine/lakehouse-milvus496 \
  --type=merge \
   -n ${PROJECT_CPD_INST_OPERANDS} \
   -p '{ "spec": {
          "milvus_indexnode": {
            "replicas": 4,
            "resources": {
              "requests": {
                "nvidia.com/mig-2g.20gb": "1",
                "memory": "19G",
                "cpu": "16",
                "ephemeral-storage": "10G"
              },
              "limits": {
                "nvidia.com/mig-2g.20gb": "1",
                "memory": "19G",
                "cpu": "16",
                "ephemeral-storage": "10G"
              }
            },
            "tolerations": [
              {
                "effect": "NoSchedule",
                "key": "nvidia.com/gpu",
                "operator": "Equal",
                "value": "present"
              }
            ]
          },
          "milvus_querynode": {
            "replicas": 4,
            "resources": {
              "requests": {
                "nvidia.com/mig-2g.20gb": "1",
                "memory": "19G",
                "cpu": "16",
                "ephemeral-storage": "10G"
              },
              "limits": {
                "nvidia.com/mig-2g.20gb": "1",
                "memory": "19G",
                "cpu": "16",
                "ephemeral-storage": "10G"
              }
            },
            "tolerations": [
              {
                "effect": "NoSchedule",
                "key": "nvidia.com/gpu",
                "operator": "Equal",
                "value": "present"
              }
            ]
          }
      } }'
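After applying a patch like this one, you can confirm that the expected number of index node and query node replicas were created by listing the pods again, for example:
oc get po -l component=ibm-lh-milvus -n ${PROJECT_CPD_INST_OPERANDS} | grep -E "indexnode|querynode"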
Reset Milvus to clear all customized settings and GPU requests
oc patch wxdengine/lakehouse-milvus496 \
       --type json \
       -n ${PROJECT_CPD_INST_OPERANDS} \
       -p '[ 
         { 
           "op": "remove", 
           "path": "/spec/milvus_indexnode"
         },
         { 
           "op": "remove", 
           "path": "/spec/milvus_querynode"
         }
]'
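
Note that a JSON patch remove operation fails if the target section does not exist in the spec. After the reset, you can confirm that the customized sections were removed; empty output from the following sketch (which assumes the CR name lakehouse-milvus496 from the earlier examples) means that both sections are gone:
oc get wxdengine/lakehouse-milvus496 -n ${PROJECT_CPD_INST_OPERANDS} \
  -o jsonpath='{.spec.milvus_indexnode}{"\n"}{.spec.milvus_querynode}{"\n"}'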