GPU servers

Use this section to understand the GPU servers that are used in the IBM Fusion HCI appliance and their configuration details.

A GPU server is a specialized server node in the IBM Fusion HCI appliance that is equipped with one or more graphics processing units (GPUs). These nodes are designed to handle compute-intensive workloads that require massive parallel processing, such as AI and high-performance computing applications. The available GPU servers are:
  • Dell hardware - 9155-DG1, 9155-DG2, 9155-DG3, and 9155-DG4. For configuration details, see Dell hardware.
  • Lenovo hardware - 9155-G03 and 9155-G04. For configuration details, see Lenovo hardware.

Key capabilities

The following are the key capabilities of the GPU servers:
  • Provide massive parallel processing, which makes them well suited to high-performance computing workloads such as scientific simulations, data analytics, and AI model training.
  • Improve the performance of compute-intensive workloads, which reduces processing times and increases overall system efficiency.
  • Can be virtualized so that multiple virtual machines share the same GPU resources, which improves resource utilization and reduces costs.

Advantages

The following are the key benefits of the GPU servers:
Improved performance and efficiency
GPU servers can significantly improve the performance and efficiency of compute-intensive workloads.
Increased agility
GPU servers can be quickly provisioned and deployed, allowing organizations to rapidly respond to changing business needs.
Reduced costs
GPU servers can help reduce costs by improving resource utilization and reducing the need for specialized hardware.
Simplified management
GPU servers are fully integrated with the IBM Fusion HCI platform, providing a single, easy-to-manage interface for administrators.

Hardware configuration

Dell
Hardware configuration of 9155-DG1 GPU server:
  • Dell XE7745 4U 2-socket server
  • 2x 64-core AMD EPYC 5 9575 3.3 GHz 400 W processors
  • 2304 GB DDR5 6400 MT/s RAM (24x 64 GB DIMMs)
  • 4x NVIDIA L40S 48 GB GPU PCIe Gen 4
  • 2x 960 GB M.2 SED NVMe hot-swappable OS drives (RAID 1)
  • 4x NVIDIA ConnectX-6 DX dual-port 100 GbE PCIe NICs
  • Broadcom Quad Port 1 GbE PCIe Ethernet Adapter
  • 8x 3200 W Titanium power supplies
Important:
The 9155-DG2, 9155-DG3, and 9155-DG4 GPU servers have the same configuration as the 9155-DG1, except for the GPU cards:
  • 9155-DG2 - 8x NVIDIA L40S 48 GB GPU PCIe Gen 4 instead of 4x NVIDIA L40S 48 GB GPU PCIe Gen 4.
  • 9155-DG3 - 4x NVIDIA H200 NVL 141 GB GPU PCIe Gen 5 instead of 4x NVIDIA L40S 48 GB GPU PCIe Gen 4.
  • 9155-DG4 - 8x NVIDIA H200 NVL 141 GB GPU PCIe Gen 5 instead of 4x NVIDIA L40S 48 GB GPU PCIe Gen 4.
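The four Dell models differ only in GPU count and GPU type, so the aggregate GPU memory per server follows from simple multiplication. The following Python sketch illustrates that arithmetic by using the GPU counts and per-GPU memory sizes listed above. The dictionary and function names are illustrative assumptions and are not part of any IBM Fusion HCI tooling.

```python
# GPU count, GPU type, and per-GPU memory (GB) for each Dell model,
# copied from the hardware list above. The names are illustrative only.
DELL_GPU_CONFIGS = {
    "9155-DG1": (4, "NVIDIA L40S", 48),
    "9155-DG2": (8, "NVIDIA L40S", 48),
    "9155-DG3": (4, "NVIDIA H200 NVL", 141),
    "9155-DG4": (8, "NVIDIA H200 NVL", 141),
}


def total_gpu_memory_gb(model: str) -> int:
    """Return the aggregate GPU memory in GB for a Dell GPU server model."""
    count, _gpu_type, memory_gb = DELL_GPU_CONFIGS[model]
    return count * memory_gb


if __name__ == "__main__":
    for model, (count, gpu_type, _memory_gb) in DELL_GPU_CONFIGS.items():
        print(f"{model}: {count}x {gpu_type}, {total_gpu_memory_gb(model)} GB total")
```

The total is the sum of the memory on the individual cards. It does not imply that a single workload can address all of that memory as one pool.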
Lenovo
Hardware configuration of 9155-G03 GPU server:
  • SR675 V3 8DW PCIe GPU Base
  • 2x AMD EPYC 9254 24C 200W 2.9GHz
  • (1-8)x NVIDIA L40S 48GB GPU Gen4 PCIe adapter cards
  • NVIDIA H100 NVL 94GB PCIe Gen5 Passive GPU

    Two H100 NVL GPUs can be interconnected with NVLink to efficiently support models up to 188 GB.

  • AMD MI210 PCIe GPU Adapter Card
  • 768GB RAM (24x 32GB TruDDR5 4800MHz (2Rx8) RDIMM-A)
  • Intel I350 1GbE RJ45 4-port OCP Ethernet Adapter
  • ConnectX-6 Dx dual port 100GbE network interface card
  • ConnectX-6 Lx dual port 25GbE network interface card
  • 2x M.2 960GB NVMe operating system drives in a RAID 1 configuration
  • 3U height
Important: Each G03 server supports between 1 and 8 GPU PCIe cards, with options including NVIDIA L40S, NVIDIA H100 NVL, and AMD MI210. However, you cannot mix different GPU models within the same server.
Hardware configuration of 9155-G04 GPU server:
  • SR675 V3 PCIe GPU Base
  • 2x AMD EPYC 9555 64-Core CPU
  • (1-8)x NVIDIA H200 NVL 141GB GPU
  • (1-8)x NVIDIA RTX PRO 6000 Blackwell Server Edition 96GB GPU
  • 1152GB RAM (12x 96GB) and expandable to 2304 GB RAM (24x 96GB)
  • Intel I350 1GbE RJ45 4-port OCP Ethernet Adapter
  • ConnectX-7 Dx dual port 200GbE network interface card
  • ConnectX-6 Lx dual port 25GbE network interface card
  • 2x M.2 960GB NVMe operating system drives in a RAID 1 configuration
  • 2x 3.84TB NVMe SSD
  • 3U height
Important: Each G04 server supports between 1 and 8 GPU PCIe cards, with options including NVIDIA H200 NVL and NVIDIA RTX PRO 6000 Blackwell Server Edition. However, you cannot mix different GPU models within the same server.
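The GPU rules for the Lenovo G03 and G04 servers (1 to 8 GPU cards, a single GPU model per server, and a fixed set of supported models) can be expressed as a small validation check. The following Python sketch is a planning aid under those stated rules only. The function name, the data structure, and the GPU name strings are assumptions made for illustration and do not correspond to an IBM Fusion HCI API.

```python
# Supported GPU options per Lenovo server model, as listed above.
# The key and value strings are illustrative labels, not official identifiers.
SUPPORTED_GPUS = {
    "9155-G03": {"NVIDIA L40S", "NVIDIA H100 NVL", "AMD MI210"},
    "9155-G04": {
        "NVIDIA H200 NVL",
        "NVIDIA RTX PRO 6000 Blackwell Server Edition",
    },
}

MIN_GPUS = 1
MAX_GPUS = 8


def validate_gpu_config(server_model: str, gpus: list[str]) -> list[str]:
    """Return a list of rule violations; an empty list means the plan is valid."""
    supported = SUPPORTED_GPUS.get(server_model)
    if supported is None:
        return [f"unknown server model: {server_model}"]

    errors = []
    # Rule 1: each server holds between 1 and 8 GPU PCIe cards.
    if not MIN_GPUS <= len(gpus) <= MAX_GPUS:
        errors.append(
            f"GPU count {len(gpus)} is outside the range {MIN_GPUS}-{MAX_GPUS}"
        )

    # Rule 2: different GPU models cannot be mixed within the same server.
    models = set(gpus)
    if len(models) > 1:
        errors.append(f"mixed GPU models: {sorted(models)}")

    # Rule 3: every GPU must be one of the options listed for that server.
    unsupported = models - supported
    if unsupported:
        errors.append(f"unsupported for {server_model}: {sorted(unsupported)}")

    return errors


if __name__ == "__main__":
    # A valid plan: four identical GPUs in a G03 server.
    print(validate_gpu_config("9155-G03", ["NVIDIA L40S"] * 4))
    # An invalid plan: two different GPU models in the same server.
    print(validate_gpu_config("9155-G03", ["NVIDIA L40S", "AMD MI210"]))
```

Returning a list of violations instead of raising on the first one lets a planner see every problem with a proposed configuration in one pass.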