Partitioning GPU processors in IBM watsonx.ai

You can optionally configure NVIDIA Multi-Instance GPU (MIG) to split a single GPU on a node into multiple GPU instances.

Partitioning the GPU is useful when you want to host many smaller foundation models that don't require dedicated GPUs.

Attention:

Do not partition GPU processors in a cluster that you plan to use for tuning foundation models. The NVIDIA Multi-Instance GPU (MIG) feature does not support GPU peer-to-peer communication. For example, GPU instances cannot use a Peripheral Component Interconnect Express (PCIe) bus or a dedicated NVLink connection (NVIDIA's proprietary high-speed interconnect) to directly access and transfer data in each other's memory without going through system RAM. MIG also limits the memory that is available to tuning tasks and restricts each tuning task to a single GPU instance.

To set up NVIDIA Multi-Instance GPUs, complete the following steps:

1. Determine which foundation models can run on partitioned GPUs.
   Only a subset of the supported foundation models can be installed on NVIDIA Multi-Instance GPUs. For details, see System requirements for foundation models in IBM watsonx.ai.
2. Calculate the amount of memory that the partition needs for the foundation model.
   To get a rough idea of the minimum memory requirement, multiply the number of model parameters, in billions, by 3 to get the memory in GB. For example, for a foundation model with 12 billion parameters, multiply 12 by 3: an initial estimate of the memory required for the partition that hosts the model is 36 GB. Treat this value as a starting point only, and verify that the partition has sufficient resources.
3. Follow the instructions on the MIG Support in OpenShift Container Platform page of the NVIDIA product documentation to configure NVIDIA Multi-Instance GPU.
Attention: You cannot use virtual GPUs (vGPUs) with the IBM watsonx.ai service.
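The following Python sketch ties the memory estimate to the choice of a MIG profile. It is illustrative only and rests on assumptions that are not part of this documentation: the profile table lists a subset of the MIG profiles of an NVIDIA A100 80GB GPU (other GPU models expose different profiles), and the generated `oc label` command assumes that MIG is managed by the NVIDIA GPU Operator through the `nvidia.com/mig.config` node label with its default `all-<profile>` configurations. The function names and the node name `gpu-worker-0` are hypothetical. The helper builds the command but does not run it, so you can review it before you apply it to the cluster.

```python
from typing import Dict, List, Optional

# Rule of thumb from this page: memory (GB) ~= billions of parameters x 3.
GB_PER_BILLION_PARAMS = 3

# ASSUMPTION: a subset of MIG profiles on an NVIDIA A100 80GB GPU,
# mapped to their memory in GB. Check the profiles that your GPU model exposes.
A100_80GB_PROFILES: Dict[str, int] = {
    "1g.10gb": 10,
    "2g.20gb": 20,
    "3g.40gb": 40,
    "4g.40gb": 40,
    "7g.80gb": 80,
}


def estimate_memory_gb(billion_params: float) -> float:
    """Initial (minimum) memory estimate for hosting a model on a partition."""
    return billion_params * GB_PER_BILLION_PARAMS


def smallest_fitting_profile(required_gb: float, profiles: Dict[str, int]) -> Optional[str]:
    """Return the profile with the least memory that still holds required_gb.

    Ties on memory (for example 3g.40gb and 4g.40gb) are broken by name,
    which for this table picks the profile with fewer compute slices.
    Returns None when no single profile is large enough.
    """
    candidates = [(mem, name) for name, mem in profiles.items() if mem >= required_gb]
    return min(candidates)[1] if candidates else None


def mig_label_command(node: str, profile: str) -> List[str]:
    """Build, but do not run, an `oc label` command for the GPU Operator.

    ASSUMPTION: MIG is managed by the NVIDIA GPU Operator, which watches the
    nvidia.com/mig.config node label; an "all-<profile>" value asks it to
    split every GPU on the node into instances of that profile.
    """
    return [
        "oc", "label", "node", node,
        f"nvidia.com/mig.config=all-{profile}",
        "--overwrite",
    ]


if __name__ == "__main__":
    required = estimate_memory_gb(12)  # the 12-billion-parameter example
    profile = smallest_fitting_profile(required, A100_80GB_PROFILES)
    print(f"Estimated memory: {required} GB -> profile: {profile}")
    if profile is not None:
        # "gpu-worker-0" is a hypothetical node name.
        print(" ".join(mig_label_command("gpu-worker-0", profile)))
```

For the 12-billion-parameter example, the sketch estimates 36 GB and selects the `3g.40gb` profile. Because the multiply-by-3 rule gives only an initial estimate, confirm that the partition has sufficient resources before you deploy the model.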

When you later add foundation models to your deployment, install each model on the node that has the MIG partition you configured. For more information, see Adding foundation models to IBM watsonx.ai.