AI workloads on IBM Fusion
IBM Fusion HCI uses its GPU nodes to run complex container AI and VM workloads. Combined with IBM watsonx, IBM Fusion HCI offers consistent high performance across a wide range of these workloads. Every release of IBM Fusion HCI is validated with IBM watsonx to ensure that the infrastructure is compatible.
You can select the most suitable GPU node based on your specific AI workload requirements. For details on supported GPU types and their hardware configurations, see GPU nodes.
Node labeling and GPU virtualization improve performance and keep resource management efficient for both complex AI container workloads and virtual machine (VM) workloads.
Node labeling for GPU reservation
IBM Fusion HCI uses labels to reserve GPU nodes for AI workloads.
For example, IBM Fusion Backup & Restore uses the gpu.isf.ibm.com label to control the placement of its pods. To keep Backup & Restore workloads off GPU nodes, run the following command from the OpenShift® cluster to label each GPU node:
oc label node/<node name> gpu.isf.ibm.com=""
After the nodes are labeled, the IBM Fusion service pods, including the Backup & Restore pods, automatically avoid nodes that carry this label. You can adapt this approach for other workloads to control placement and optimize resource use.
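The labeling step could be scripted roughly as follows. This is a minimal sketch, not an IBM-provided tool: the function names gpu_label_cmd, gpu_unlabel_cmd, and label_gpu_nodes and the DRY_RUN variable are hypothetical, and the node names in the usage note are placeholders. Only the oc label syntax and the gpu.isf.ibm.com key come from the text. By default the script only prints the commands so that you can review them before anything changes on the cluster.

```shell
# Sketch: build (and optionally run) the oc commands that add or remove
# the gpu.isf.ibm.com label. All function names and DRY_RUN are hypothetical.

GPU_LABEL_KEY="gpu.isf.ibm.com"

# Print the command that labels one node with an empty-valued GPU label.
gpu_label_cmd() {
  printf 'oc label node/%s %s=""\n' "$1" "$GPU_LABEL_KEY"
}

# Print the command that removes the label again
# (in oc label, a trailing "-" after the key deletes that label).
gpu_unlabel_cmd() {
  printf 'oc label node/%s %s-\n' "$1" "$GPU_LABEL_KEY"
}

# For every node name passed as an argument, print the label command.
# The command is executed only when DRY_RUN=0 is set explicitly.
label_gpu_nodes() {
  for node in "$@"; do
    if [ "${DRY_RUN:-1}" = "0" ]; then
      oc label "node/${node}" "${GPU_LABEL_KEY}="
    else
      gpu_label_cmd "$node"
    fi
  done
}
```

For example, label_gpu_nodes gpu-node-0 gpu-node-1 prints one oc label command per node. After the label is applied, oc get nodes -l gpu.isf.ibm.com lists the nodes that carry it.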
GPU virtualization options
- PCI passthrough for VMs to access GPUs
- vGPU for VMs
The Peripheral Component Interconnect (PCI) passthrough feature enables virtual machines to interact directly with hardware devices and manage them. When this feature is configured, the devices function as if they were physically connected to the guest operating system. This setup is particularly useful for applications that require direct hardware access, such as graphics-intensive tasks or specialized hardware control. For more information about this feature, see Red Hat documentation.
With vGPU for VMs, some graphics processing unit (GPU) cards support the creation of virtual GPUs (vGPUs). OpenShift Virtualization can automatically create vGPUs and other devices when an administrator provides configuration details in the HyperConverged custom resource (CR). This automation is especially useful for large clusters. For more information about this feature, see Red Hat documentation and NVIDIA documentation.
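As an illustration of what such configuration details might look like, the following sketch builds a JSON merge patch for the HyperConverged CR. Treat everything in it as an assumption to check against the Red Hat documentation for your OpenShift Virtualization version: the field names (mediatedDevicesConfiguration, mediatedDeviceTypes, permittedHostDevices, mediatedDevices, mdevNameSelector, resourceName), the CR name kubevirt-hyperconverged, and the namespace openshift-cnv are not stated in this topic. The helper name vgpu_patch is hypothetical, and the example values are placeholders that might not match each other or your GPU hardware.

```shell
# Sketch: print a JSON merge patch that declares one vGPU (mediated device)
# type in the HyperConverged CR. vgpu_patch is a hypothetical helper.
#   $1: mediated device type to create on the nodes
#   $2: mdev name selector that identifies the device
#   $3: resource name under which VMs request the device
# Arguments are inserted as-is, so they must not contain double quotes.
vgpu_patch() {
  cat <<EOF
{"spec":{"mediatedDevicesConfiguration":{"mediatedDeviceTypes":["$1"]},"permittedHostDevices":{"mediatedDevices":[{"mdevNameSelector":"$2","resourceName":"$3"}]}}}
EOF
}

# Possible usage (not run here; requires cluster-admin access):
#   oc patch hyperconverged kubevirt-hyperconverged -n openshift-cnv \
#     --type=merge \
#     -p "$(vgpu_patch nvidia-231 'GRID T4-2Q' 'nvidia.com/GRID_T4-2Q')"
```

A merge patch replaces list fields as a whole, so device entries that already exist in the CR would be overwritten. To add to an existing configuration, edit the CR instead.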