Enabling GPU time-slicing

This section describes how to configure the document ingestion service to ingest content using a graphics processing unit (GPU).

About this task

You can configure the document ingestion service to use an available GPU in your cluster. If all GPUs are currently occupied, you can enable GPU time-slicing so that the service shares a GPU with other workloads, with the workloads running interleaved on the same device. For more information on configuring GPU time-slicing, see Time-Slicing GPUs in Kubernetes.
Note: This procedure applies only to NVIDIA GPUs. It is not supported for Spyre GPUs.
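
To decide whether time-slicing is needed, it can help to see how many GPUs each node currently advertises. The following is a minimal sketch, not part of the official procedure. It assumes that GPU nodes carry the label nvidia.com/gpu.present=true, which the NVIDIA GPU Operator typically sets; adjust the selector if your nodes are labeled differently. The function name list_gpu_capacity is hypothetical.

```shell
# Hypothetical helper: list each GPU node together with the number of
# nvidia.com/gpu resources it advertises as allocatable.
# Assumption: GPU nodes carry the label nvidia.com/gpu.present=true.
list_gpu_capacity() {
  oc get nodes -l nvidia.com/gpu.present=true \
    -o custom-columns='NODE:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
}
```

After you log in to the cluster, run list_gpu_capacity to print the table. Running it again after you complete the procedure shows whether the advertised GPU count changed.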

Procedure

  1. (Only applicable when using GPU time-slicing) Add the following YAML content to the time-slicing-config-all.yaml file:
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: time-slicing-config-all
      namespace: nvidia-gpu-operator
    data:
      any: |-
        version: v1
        flags:
          migStrategy: none
        sharing:
          timeSlicing:
            resources:
            - name: nvidia.com/gpu
              replicas: 2
    
  2. (Only applicable when using GPU time-slicing) Deploy the ConfigMap to your cluster by running the following command:
    oc apply -f time-slicing-config-all.yaml
  3. (Only applicable when using GPU time-slicing) Run the following command to apply the GPU time-slicing configuration to your cluster:
    oc patch clusterpolicies.nvidia.com/gpu-cluster-policy \
    -n nvidia-gpu-operator --type merge \
    -p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config-all", "default": "any"}}}}'
    
  4. Run the following command to enable GPU usage for the document ingestion service:
    oc -n wxa4z-zad patch zassistantdeploy zassistantdeploy \
    --type='merge' \
    -p='{"spec":{"clientIngestion":{"resources":{"requests":{"nvidia.com/gpu":"1"},"limits":{"nvidia.com/gpu":"1"}}}}}'
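
After you complete the procedure, you might want to confirm the result. The following is a minimal sketch under stated assumptions, not an official verification step. It assumes default time-slicing behavior, where a node advertises its physical GPU count multiplied by the replicas value from step 1. The function names advertised_gpus and show_ingestion_resources are hypothetical; the namespace and resource names are taken from step 4.

```shell
# Hypothetical helper: expected number of nvidia.com/gpu resources that a
# node advertises once time-slicing is active.
# Assumption: advertised count = physical GPUs x replicas.
advertised_gpus() {
  physical_gpus=$1
  replicas=$2
  echo $((physical_gpus * replicas))
}

# Hypothetical helper: print the resources that are currently set for the
# document ingestion service (namespace and names as used in step 4).
show_ingestion_resources() {
  oc -n wxa4z-zad get zassistantdeploy zassistantdeploy \
    -o jsonpath='{.spec.clientIngestion.resources}'
}

# Example: a node with 1 physical GPU and replicas: 2, as in step 1.
advertised_gpus 1 2
```

If the patch in step 4 was applied, show_ingestion_resources is expected to print the requests and limits for nvidia.com/gpu. If a node reports fewer GPUs than advertised_gpus suggests, the device plugin might not have picked up the new configuration yet.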