Configuring Docling Multimodal GPUs

This section describes how to configure Docling Multimodal GPUs.

By default, the Docling services request the nvidia.com/gpu resource. If the GPUs are configured to use Multi-Instance GPU (MIG), a corresponding MIG profile must be assigned to each service, or the service does not start properly. You can override the IBM Docling Multimodal GPU resources in the CasInstall custom resource (CR). Each deployment that requires a GPU has its own override flag for use with alternative resource configurations.

To change the Docling Multimodal GPU targets from the command line, open the CasInstall CR for editing and add the following flags under the spec section:

spec:
  flags:
    - VLLM_VISION_GPU=<gpu target>
    - VLLM_EMBEDDING_GPU=<gpu target>
    - DOCLING_GPU=<gpu target>
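If the CasInstall CR already contains other entries under spec.flags, take care not to drop them when adding the GPU overrides. The following is a minimal sketch, not part of the product, of how the three flags might be merged into an existing flags list before the CR is saved. The helper name merge_flags and the SOME_OTHER_FLAG entry are hypothetical and used only for illustration.

```python
from typing import Dict, List


def merge_flags(existing: List[str], overrides: Dict[str, str]) -> List[str]:
    """Return a new flags list with KEY=VALUE overrides applied.

    Entries whose key matches an override are replaced in place,
    unrelated entries are kept unchanged, and overrides that were
    not present are appended at the end.
    """
    merged: List[str] = []
    seen = set()
    for entry in existing:
        key, sep, _ = entry.partition("=")
        if sep and key in overrides:
            # Replace the first occurrence; skip later duplicates of the key.
            if key not in seen:
                merged.append(f"{key}={overrides[key]}")
                seen.add(key)
        else:
            merged.append(entry)
    # Append overrides that had no existing entry.
    for key, value in overrides.items():
        if key not in seen:
            merged.append(f"{key}={value}")
    return merged


if __name__ == "__main__":
    # SOME_OTHER_FLAG is a made-up placeholder for a flag already in the CR.
    current = ["SOME_OTHER_FLAG=true", "DOCLING_GPU=nvidia.com/gpu"]
    target = "nvidia.com/mig-7g.80gb"
    wanted = {
        "VLLM_VISION_GPU": target,
        "VLLM_EMBEDDING_GPU": target,
        "DOCLING_GPU": target,
    }
    for flag in merge_flags(current, wanted):
        print(f"    - {flag}")
```

The printed lines can be pasted under spec.flags. Building the complete list first matters because some update methods replace the whole list rather than appending to it.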

For example, to specify an alternative MIG resource indicator for all three services, add the following flags under the spec section:

spec:
  flags:
    - VLLM_VISION_GPU=nvidia.com/mig-7g.80gb
    - VLLM_EMBEDDING_GPU=nvidia.com/mig-7g.80gb
    - DOCLING_GPU=nvidia.com/mig-7g.80gb
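A typo in the GPU target can leave a service unable to start, so a quick format check before editing the CR may help. The following sketch assumes that targets follow the pattern seen in this section: either nvidia.com/gpu or nvidia.com/mig-<N>g.<M>gb. That pattern is an assumption and may not cover every MIG profile name; the resources that a cluster actually offers are listed in each GPU node's allocatable resources. The function name is_plausible_gpu_target is hypothetical.

```python
import re

# Assumed naming pattern: the whole-GPU resource, or a MIG profile such as
# nvidia.com/mig-7g.80gb. Other profile names may exist on a given cluster.
_GPU_TARGET = re.compile(r"nvidia\.com/(gpu|mig-\d+g\.\d+gb)")


def is_plausible_gpu_target(target: str) -> bool:
    """Return True if the string looks like a GPU resource indicator."""
    return _GPU_TARGET.fullmatch(target) is not None


if __name__ == "__main__":
    for candidate in ("nvidia.com/gpu", "nvidia.com/mig-7g.80gb", "mig-7g.80gb"):
        status = "ok" if is_plausible_gpu_target(candidate) else "check spelling"
        print(f"{candidate}: {status}")
```

A passing check means only that the string is well formed. It does not confirm that any node in the cluster advertises that resource.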

Result: The updated GPU resource indicator is detected automatically, and a new deployment is rolled out.