This section outlines the steps for ingesting content using a Graphics Processing Unit
(GPU).
About this task
You can configure the document ingestion service to utilize an available GPU in your
cluster. If all GPUs are currently occupied, you can activate GPU time-slicing to allow the service
to share a GPU with other workloads, running in an interleaved manner. For more information on
configuring GPU time-slicing, see Time-Slicing GPUs in Kubernetes.
Note: This procedure applies only to NVIDIA GPUs. It is not supported for Spyre GPUs.
Procedure
- (Only applicable when using GPU time-slicing) Add the following YAML content to the
time-slicing-config-all.yaml file:
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config-all
  namespace: nvidia-gpu-operator
data:
  any: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 2
- (Only applicable when using GPU time-slicing) Deploy the ConfigMap to your cluster by
running the following command:
oc apply -f time-slicing-config-all.yaml
- (Only applicable when using GPU time-slicing) Run the following command to apply the GPU
time-slicing configuration to your cluster:
oc patch clusterpolicies.nvidia.com/gpu-cluster-policy \
-n nvidia-gpu-operator --type merge \
-p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config-all", "default": "any"}}}}'
- Run the following command to enable GPU usage for the document ingestion service:
oc -n wxa4z-zad patch zassistantdeploy zassistantdeploy --type='merge' -p='{"spec":{"clientIngestion":{"resources":{"requests":{"nvidia.com/gpu":"1"},"limits":{"nvidia.com/gpu":"1"}}}}}'
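The patch payload from the last step can be generated for other GPU counts, and the result can be checked afterwards. The following is a minimal sketch, not part of the product: the function names gpu_patch_payload, show_ingestion_resources, and show_node_gpu_capacity are hypothetical, and the two check functions assume a logged-in oc session with permission to read nodes and the zassistantdeploy resource.

```shell
#!/bin/sh
# Hypothetical helper functions for illustration only.

NAMESPACE="wxa4z-zad"        # namespace used in the procedure above
RESOURCE="zassistantdeploy"  # custom resource kind and name used in the procedure above

# Print the merge-patch payload that sets the GPU request and limit
# to the given count (defaults to 1, as in the procedure above).
gpu_patch_payload() {
  count="${1:-1}"
  printf '{"spec":{"clientIngestion":{"resources":{"requests":{"nvidia.com/gpu":"%s"},"limits":{"nvidia.com/gpu":"%s"}}}}}' "$count" "$count"
}

# Show the resources currently set for the ingestion service
# on the custom resource (requires cluster access).
show_ingestion_resources() {
  oc -n "$NAMESPACE" get "$RESOURCE" "$RESOURCE" \
    -o jsonpath='{.spec.clientIngestion.resources}'
}

# List how many nvidia.com/gpu resources each node advertises
# as allocatable (requires cluster access).
show_node_gpu_capacity() {
  oc get nodes \
    -o custom-columns='NODE:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
}
```

With these functions loaded, the following command issues the same patch as the last step:

oc -n wxa4z-zad patch zassistantdeploy zassistantdeploy --type=merge -p "$(gpu_patch_payload 1)"

If time-slicing is active with replicas: 2, each node is expected to advertise twice as many nvidia.com/gpu resources as it has physical GPUs, which show_node_gpu_capacity should reflect.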