GPU servers
This section describes the GPU servers that are used in the IBM Fusion HCI appliance and their hardware configurations.
A GPU server is a specialized server node in the IBM Fusion HCI appliance that is equipped with one or more
graphics processing units. These nodes are designed to handle compute-intensive workloads that
require massive parallel processing, such as AI and high-performance computing applications. The
available GPU servers are:
- Dell hardware - 9155-DG1, 9155-DG2, 9155-DG3, and 9155-DG4. For configuration details, see Dell hardware.
- Lenovo hardware - 9155-G03 and 9155-G04. For configuration details, see Lenovo hardware.
Key capabilities
The following are the key capabilities of the GPU servers:
- Provide massively parallel processing, which makes them well suited to high-performance computing workloads such as scientific simulations, data analytics, and AI model training.
- Improve the performance of compute-intensive workloads, which reduces processing times and increases overall system efficiency.
- Support virtualization, so that multiple virtual machines can share the same GPU resources, which improves resource utilization and reduces costs.
Advantages
The following are the key benefits of the GPU servers:
- Improved performance and efficiency: GPU servers can significantly improve the performance and efficiency of compute-intensive workloads.
- Increased agility: GPU servers can be quickly provisioned and deployed, allowing organizations to rapidly respond to changing business needs.
- Reduced costs: GPU servers can help reduce costs by improving resource utilization and reducing the need for specialized hardware.
- Simplified management: GPU servers are fully integrated with the IBM Fusion HCI platform, providing a single, easy-to-manage interface for administrators.
Hardware configuration
- Dell
- Hardware configuration of 9155-DG1 GPU server:
- Dell XE7745 4U 2-socket server
- 2x 64-core AMD EPYC 5 9575 3.3 GHz 400 W processors
- 2304 GB DDR5 6400 MT/s RAM (24x 64 GB DIMMs)
- 4x NVIDIA L40S 48 GB GPU PCIe Gen 4
- 2x 960 GB M.2 SED NVMe hot-swappable OS drives (RAID 1)
- 4x NVIDIA ConnectX-6 DX dual-port 100 GbE PCIe NICs
- Broadcom Quad Port 1 GbE PCIe Ethernet Adapter
- 8x 3200 W Titanium power supplies
Important: The 9155-DG2, 9155-DG3, and 9155-DG4 GPU servers have the same configuration as the 9155-DG1, except for the GPU cards:
- 9155-DG2 - 8x NVIDIA L40S GPU PCIe Gen 4 instead of 4x NVIDIA L40S GPU PCIe Gen 4.
- 9155-DG3 - 4x NVIDIA H200 NVL 141 GB GPU PCIe Gen 5 instead of 4x NVIDIA L40S GPU PCIe Gen 4.
- 9155-DG4 - 8x NVIDIA H200 NVL 141 GB GPU PCIe Gen 5 instead of 4x NVIDIA L40S GPU PCIe Gen 4.
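To compare the four Dell models at a glance, the GPU counts and per-card memory sizes listed above can be multiplied out. The following is a minimal sketch, not an IBM tool: the table and the function name `total_gpu_memory_gb` are illustrative, and the 48 GB figure for the 9155-DG2 assumes it uses the same L40S card as the 9155-DG1. The result is a simple sum across cards and does not imply that the memory is addressable as a single pool.

```python
# GPU count, GPU model, and per-GPU memory in GB for each Dell model,
# taken from the hardware configuration list. The dictionary and the
# function name are illustrative, not part of any IBM or Dell API.
DELL_GPU_CONFIGS = {
    "9155-DG1": (4, "NVIDIA L40S", 48),
    "9155-DG2": (8, "NVIDIA L40S", 48),  # assumes the same 48 GB L40S card as DG1
    "9155-DG3": (4, "NVIDIA H200 NVL", 141),
    "9155-DG4": (8, "NVIDIA H200 NVL", 141),
}


def total_gpu_memory_gb(model: str) -> int:
    """Return the summed GPU memory (GB) across all cards in one server."""
    count, _gpu_name, per_gpu_gb = DELL_GPU_CONFIGS[model]
    return count * per_gpu_gb


for model, (count, gpu_name, per_gpu_gb) in DELL_GPU_CONFIGS.items():
    print(f"{model}: {count}x {gpu_name} ({per_gpu_gb} GB each) = "
          f"{total_gpu_memory_gb(model)} GB")
```

This gives 192 GB for the 9155-DG1, 384 GB for the 9155-DG2, 564 GB for the 9155-DG3, and 1128 GB for the 9155-DG4.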
- Lenovo
- Hardware configuration of 9155-G03 GPU server:
- SR675 V3 8DW PCIe GPU Base
- 2x AMD EPYC 9254 24C 200W 2.9GHz
- (1-8)x NVIDIA L40S 48GB GPU Gen4 PCIe adapter cards
- NVIDIA H100 NVL 94GB PCIe Gen5 Passive GPU
Two H100 NVL GPUs can be interconnected with NVLink to efficiently support models of up to 188 GB.
- AMD MI210 PCIe GPU Adapter Card
- 768GB RAM (24x 32GB TruDDR5 4800MHz (2Rx8) RDIMM-A)
- Intel I350 1GbE RJ45 4-port OCP Ethernet Adapter
- ConnectX-6 Dx dual port 100GbE network interface card
- ConnectX-6 Lx dual port 25GbE network interface card
- 2x M.2 960GB NVMe drives for the operating system (RAID 1)
- 3U height
Important: Each G03 server supports between 1 and 8 GPU PCIe cards, with options including NVIDIA L40S, NVIDIA H100 NVL, and AMD MI210. However, you cannot mix different GPU models within the same server.
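The population rules in the note above (1 to 8 cards, a supported model, no mixing) can be expressed as a small check when planning an order or an inventory. This is a hedged sketch only: the function name `validate_g03_gpus` and the model strings are hypothetical and do not correspond to any IBM Fusion HCI interface.

```python
# GPU options listed for the 9155-G03. The strings and the helper below
# are illustrative names chosen for this sketch, not an IBM API.
SUPPORTED_GPU_MODELS = {"NVIDIA L40S", "NVIDIA H100 NVL", "AMD MI210"}
MIN_GPUS, MAX_GPUS = 1, 8


def validate_g03_gpus(gpus):
    """Return a list of problems; an empty list means the selection is valid."""
    problems = []
    # Rule 1: a G03 server holds between 1 and 8 GPU PCIe cards.
    if not MIN_GPUS <= len(gpus) <= MAX_GPUS:
        problems.append(
            f"expected {MIN_GPUS} to {MAX_GPUS} GPU cards, got {len(gpus)}"
        )
    # Rule 2: every card must be one of the listed options.
    unknown = sorted(set(gpus) - SUPPORTED_GPU_MODELS)
    if unknown:
        problems.append(f"unsupported GPU models: {', '.join(unknown)}")
    # Rule 3: all cards in one server must be the same model.
    if len(set(gpus)) > 1:
        problems.append("cannot mix different GPU models in one server")
    return problems


print(validate_g03_gpus(["NVIDIA L40S"] * 8))
print(validate_g03_gpus(["NVIDIA L40S", "AMD MI210"]))
```

The first call returns an empty list because eight identical L40S cards satisfy all three rules. The second call reports the mixing violation.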