AI workloads on IBM Fusion

IBM Fusion HCI uses its GPU nodes to run complex container AI and VM workloads. Combined with IBM watsonx, IBM Fusion HCI offers consistent high performance across a wide range of these workloads. Every release of IBM Fusion HCI is validated with IBM watsonx to ensure that the infrastructure is compatible.

You can select the most suitable GPU node based on your specific AI workload requirements. For details on supported GPU types and their hardware configurations, see GPU nodes.

Two techniques help you manage GPU resources efficiently for both AI container workloads and virtual machine (VM) workloads: node labeling, which controls which pods are scheduled on GPU nodes, and GPU virtualization, which gives VMs access to GPUs.

Node labeling for GPU reservation

IBM Fusion HCI uses labels to reserve GPU nodes for AI workloads.

For example, IBM Fusion Backup & Restore uses the gpu.isf.ibm.com label to control the placement of its pods. To keep Backup & Restore workloads off GPU nodes, run the following command against the OpenShift® cluster for each GPU node:
oc label node/<node name> gpu.isf.ibm.com=""
After a node is labeled, IBM Fusion service pods avoid it. You can use the same approach to control the placement of other workloads.
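As an illustration of how another workload could follow the same pattern, the sketch below uses standard Kubernetes node affinity to keep a pod off any node that carries the gpu.isf.ibm.com label. This is only one possible approach and is not taken from the IBM Fusion implementation; the pod name, container name, and image are hypothetical placeholders.

```yaml
# Sketch only: a non-GPU workload that avoids nodes labeled gpu.isf.ibm.com.
# The pod name, container name, and image are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: example-non-gpu-workload
spec:
  affinity:
    nodeAffinity:
      # Hard requirement: schedule only on nodes where the label key is absent.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: gpu.isf.ibm.com
            operator: DoesNotExist
  containers:
  - name: app
    image: registry.example.com/example/app:latest
```

Because the rule matches on the label key alone, it works with the empty label value that the oc label command sets. An AI workload that should run on the reserved nodes could instead select them with a nodeSelector on the same key.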


GPU virtualization options

IBM Fusion provides the following GPU virtualization options:
  • PCI passthrough for VMs to access GPUs
  • vGPU for VMs

The Peripheral Component Interconnect (PCI) passthrough feature enables virtual machines to interact with and manage hardware devices directly. When you configure this feature, the devices function as if they were physically connected to the guest operating system. This setup is particularly useful for applications that require direct hardware access, such as graphics-intensive tasks or specialized hardware control. For more information about this feature, see Red Hat documentation.
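For orientation, the fragment below sketches how a PCI device is typically exposed to VMs in OpenShift Virtualization through the permittedHostDevices section of the HyperConverged CR. It assumes the default CR name and namespace, and the vendor:device ID and resource name are placeholder values that you must replace with the ones for your GPU. Host preparation steps, such as enabling IOMMU and binding the device to the VFIO driver, are not shown; follow the Red Hat documentation for the exact procedure on your release.

```yaml
# Sketch only: expose a PCI GPU for passthrough.
# pciDeviceSelector and resourceName are placeholder values.
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  permittedHostDevices:
    pciHostDevices:
    - pciDeviceSelector: "10DE:1DB6"                  # vendor:device ID of the GPU
      resourceName: "nvidia.com/GV100GL_Tesla_V100"   # name that VMs request
```

A VM then requests the device by listing the same resource name as the deviceName of an entry under spec.template.spec.domain.devices.hostDevices in its VirtualMachine manifest.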

With vGPU for VMs, a single physical GPU can be shared among multiple virtual machines. Some graphics processing unit (GPU) cards support the creation of virtual GPUs (vGPUs). OpenShift Virtualization can automatically create vGPUs and other devices when an administrator provides configuration details in the HyperConverged custom resource (CR). This automation is especially useful for large clusters. For more information about this feature, see Red Hat documentation and NVIDIA documentation.
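The configuration details in the HyperConverged CR usually take the following shape. Treat this as a minimal sketch under assumptions rather than a validated IBM Fusion configuration: it assumes the default CR name and namespace, the mediated device type and name are placeholder values that depend on your GPU model and NVIDIA driver, and field names can differ between OpenShift Virtualization releases.

```yaml
# Sketch only: let OpenShift Virtualization create vGPUs (mediated devices).
# The device type, mdevNameSelector, and resourceName are placeholder values.
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  mediatedDevicesConfiguration:
    mediatedDeviceTypes:
    - nvidia-231                              # vGPU type to create on capable nodes
  permittedHostDevices:
    mediatedDevices:
    - mdevNameSelector: GRID T4-2Q            # must match the name of the type above
      resourceName: nvidia.com/GRID_T4-2Q     # name that VMs request
```

A VM requests a vGPU by listing the resource name as the deviceName of an entry under spec.template.spec.domain.devices.gpus in its VirtualMachine manifest.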