
IBM Power9 performance proof-points ARCHIVE

This content is no longer being updated or maintained and is provided "as is".

Big Data and Analytics

For the systems and workload compared:

  • IBM® Power® System LC922 server delivers superior performance running multiple TPC-DS query streams with Apache Spark SQL
  • It delivers 1.30x the queries per hour on IBM POWER9™ compared to Intel Xeon SP Gold 6140, and 1.59x better price-performance (a query-stream sketch follows below)
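As an illustration of how such a query stream is driven, the sketch below issues TPC-DS-style SQL through PySpark. It is a minimal sketch only: the query text, table names, and single-query stream are placeholders, not the TPC-DS kit or the published benchmark harness.

```python
# Minimal sketch of running a TPC-DS-style query stream with Spark SQL.
# Assumes the TPC-DS tables (store_sales, date_dim, item, ...) are already
# registered as Spark tables; the query below is a simplified stand-in.
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tpcds-stream-sketch").getOrCreate()

stream = [
    """SELECT d_year, i_brand, SUM(ss_ext_sales_price) AS revenue
       FROM store_sales
       JOIN date_dim ON ss_sold_date_sk = d_date_sk
       JOIN item ON ss_item_sk = i_item_sk
       GROUP BY d_year, i_brand
       ORDER BY revenue DESC
       LIMIT 100""",
]

start = time.time()
for q in stream:                  # one query stream, executed in order
    spark.sql(q).collect()        # materialize each result, as a benchmark run would
elapsed_hours = (time.time() - start) / 3600.0
print(f"Queries per hour for this stream: {len(stream) / elapsed_hours:.1f}")
```

Queries per hour (QpH) for the published result is the total number of queries completed across all concurrent streams divided by the elapsed time in hours.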

System configuration

Power System: Four nodes of IBM Power LC922 (two 20-core / 2.7 GHz / 512 GB memory), twelve 8 TB HDDs, 10 GbE two-port, RHEL 7.5 LE for IBM POWER9
Competitor: Four nodes of Intel Xeon Gold 6140 (36 cores, 2 x 18c chips, 2.3 GHz), 512 GB memory, twelve 8 TB HDDs, 10 Gbps NIC, Red Hat Enterprise Linux 7.5
Software (both systems): Apache Spark 2.3.0 (http://spark.apache.org/downloads.html) and open source Hadoop 2.7.5

Notes:

For the systems and workload compared:

  • The IBM® Power® System L922 costs less than the Intel 8168.
  • The Power L922 runs 2.54x more queries per hour per core than the Intel 8168.
  • The Power L922 cluster provides 2.44x better price performance than the Intel 8168 cluster.
  • The Power L922 solution enables 57% lower solution costs than using the Intel 8168.
 
IBM Power L922 (20-core, 512 GB) vs. Intel Xeon SP-based 2-socket server (48-core, 512 GB)
Total queries per hour (QpH) 1: 3064 QpH vs. 2891 QpH
Server price 2, 3, 4 (3-year warranty): $37,222 vs. $52,330
Solution cost 5 (three nodes; server + RHEL OS + virtualization + Db2 at $12,800* per core): $817,299 (per node: $13,341 + $12,077 + $256,000*) vs. $1,899,449 (per node: $30,126 + $3,919 + $614,400*)
QpH per $1000: 3.74 QpH/$1000 vs. 1.53 QpH/$1000
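The QpH-per-$1000 row follows directly from the two rows above it. A quick check of that arithmetic, using only the published totals (small differences from the printed 3.74 and 1.53 values come from rounding in the source figures):

```python
# Price-performance check from the published totals above.
systems = {
    "IBM Power L922 cluster": {"qph": 3064, "solution_cost_usd": 817_299},
    "Intel Xeon SP cluster":  {"qph": 2891, "solution_cost_usd": 1_899_449},
}

for name, s in systems.items():
    qph_per_1000_usd = s["qph"] / (s["solution_cost_usd"] / 1000)
    print(f"{name}: {qph_per_1000_usd:.2f} QpH per $1000")
```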

System configuration

Power System: 3x Power L922 servers with 20 cores and 512 GB RAM
Competitor: 3x Intel 8168 servers with 48 cores and 512 GB RAM

Notes:

For the systems and workload compared:

  • Improved application performance with Kinetica filtering Twitter Tweets
  • 80% more throughput on IBM Power System AC922 than IBM Power System S822LC for HPC
 

System configuration

IBM Power System AC922: 40 cores (2 x 20c chips), POWER9 with NVLink 2.0, 2.25 GHz, 1024 GB memory, (4) Tesla V100 GPUs; Red Hat Enterprise Linux 7.4 for Power Little Endian (POWER9) running Kinetica 6.1
IBM Power System S822LC for HPC: 20 cores (2 x 10c chips) / 160 threads, POWER8 with NVLink, 2.86 GHz, 1024 GB memory, (4) Tesla P100 GPUs; Red Hat Enterprise Linux 7.4 for POWER8 running Kinetica 6.1
Storage and network: (2) 6 Gb SSDs, 2-port 10 Gb Ethernet

Notes:

Cloud and Virtualization

For the systems and workload compared:

  • Power LC922 provides 2X price-performance versus Intel Xeon SP Gold 6150 based servers
  • Power LC922 enables 47% better system-level performance
  • Power LC922 allows one-third more virtual machines to be supported (four versus three)
IBM Power LC922 (44-core, 256 GB) vs. Intel Xeon SP based two-socket server (36-core, 256 GB)
Server price 2, 3, 4 (3-year warranty): $21,878 vs. $30,587
Operations per second (ops/s) per VM 1: four VMs at 118,232 vs. three VMs at 107,579
Ops/s 1 (performance): 472,927 vs. 322,738
Ops/s per USD (price-performance): 22 ops/s per USD vs. 11 ops/s per USD
 

System configuration

IBM Power System LC922: IBM POWER9™, 2x 22-core/2.6 GHz/256 GB memory; two internal HDDs; 10 GbE two-port; one 16 Gbps FCA running four VMs of MongoDB 3.6; RHEL 7.5 LE for POWER9
Intel Xeon SP Gold 6150: Two-socket Intel Xeon SP Gold 6150, 2x 18-core/2.7 GHz/256 GB memory; two 300 GB SATA 15K rpm HDDs; 10 GbE two-port; one 16 Gbps FCA running three VMs of MongoDB 3.6; RHEL 7.5

Notes:

For the systems and workload compared:

  • The IBM® Power® System S924 costs less than the Intel 8180.
  • The Power S924 runs 3.4x more transactions per second (TPS) per core than the Intel 8180.
  • The Power S924 provides 2.43x better price performance than the Intel 8180.
  • The Power S924 solution enables 39% lower solution cost than using Intel 8180.
 
IBM Power S924 (24-core, 1024 GB) vs. Intel Xeon SP based two-socket server (56-core, 768 GB)
Total transactions per second (TPS) 1: 32,221 TPS vs. 21,888 TPS
Server price 2, 3, 4 (3-year warranty): $94,697 vs. $77,203
Solution cost 5 (server + Linux OS + virtualization + WAS at $6,104 per core): $255,230 ($94,697 + $14,047 + $146,496) vs. $422,946 ($77,203 + $3,919 + $341,824)
TPS per $1000: 126.2 TPS/$1000 vs. 51.8 TPS/$1000

 

System configuration

Power System: Power S924 with 24 cores and 1024 GB RAM
Competitor: Intel 8180 with 56 cores and 768 GB RAM

Notes:

For the systems and workload compared:

  • IBM® Power® System L922 runs 1.86x more queries per hour per core than Intel 6130.
  • Power L922 solution enables 43% lower solution cost than Intel 6130.
  • Power L922 cluster provides 1.66x better price performance than Intel 6130.
 
IBM Power L922 (16-core, 256 GB, two VMs) vs. Intel Xeon SP based two-socket server (32-core, 256 GB, two VMs)
Server price 1, 2, 3 (3-year warranty): $25,932 vs. $29,100
Solution cost 4 (server + RHEL OS + virtualization + ICP Cloud Native VPC annual subscription at $250 per core per month x 36 months): $180,049 ($25,932 + $10,117 + $144,000) vs. $321,019 ($29,100 + $3,919 + $288,000)
Acme Air workload 5 (total transactions per second with two VMs): 36,566 TPS vs. 39,312 TPS
TPS/$1000: 203.1 TPS/$1000 vs. 122.5 TPS/$1000

 

Notes:

For the systems and workload compared:

  • Save over $2 million per 15 server rack with IBM® Power® System L922 running IBM Cloud Private compared to Intel Xeon SP.
  • IBM Power Systems™ designed for cognitive clouds:
    • Deliver more container throughput per core (1.86x better compared to Intel-based systems)
    • Deliver more price-performance value per rack unit when running container-based workloads
 
15 x IBM Power L922 (16-core, 256 GB, two VMs) vs. 15 x Intel Xeon SP based two-socket server (32-core, 256 GB, two VMs)
Rack solution cost 1, 2, 3, 4 (server + RHEL OS + virtualization + ICP Cloud Native VPC annual subscription at $250 per core per month x 36 months): $2,700,735 vs. $4,815,285
Acme Air workload 5 (total transactions per second with two VMs): 548,490 TPS vs. 589,680 TPS
TPS/$1000: 203.1 TPS/$1000 vs. 122.5 TPS/$1000

 

Notes:

Database, OLTP, ERP

For the systems and workload compared:

  • Power L922 enables 2.4X price-performance leadership over tested Intel Xeon SP Gold 6148 servers
  • Power L922 provides 40% better system-level performance at a 40% lower system cost
  • Power L922 offers superior cost efficiency for your EnterpriseDB workloads

System configuration

IBM Power System L922: 2x 10-core/2.9 GHz/256 GB memory; 2x 300 GB SATA 7.2K rpm LFF HDD; 10 Gb two-port; 1x 16 Gbps FCA; EDB Postgres Advanced Server 10; RHEL 7.5 with IBM PowerVM® (four partitions with five cores each)
Intel Xeon Skylake Gold 6148: Two-socket, 2x 20-core/2.4 GHz/256 GB memory; 2x 300 GB HDD; 1 Gb two-port; 1x 16 Gbps FCA; EDB Postgres Advanced Server 10; RHEL 7.5 KVM (four VMs with 10 cores each)

 

Notes:

IBM® Power® System LC922 running cassandra-stress delivers superior performance with ScyllaDB compared to Cassandra on tested x86 systems at a lower price

For the systems and workload compared:

  • Power LC922 provides 3.9x better price-performance compared to Intel® Xeon® SP Gold 6140 based servers
  • Power LC922 provides 216% more performance per system
  • Power LC922 enables 22% lower server cost
IBM Power LC922 (44-core, 256 GB) vs. Intel Xeon SP based two-socket server (36-core, 256 GB)
Operations per second 1 (performance): 906,463 vs. 286,627
Operations per second per $ (price-performance): 35 vs. 9
Server price 2, 3, 4 (includes 3-year warranty): $25,615 vs. $31,373

System configuration

Power LC922: IBM POWER9™, 2x 22-core/2.6 GHz/256 GB memory; two internal HDDs; 40GbE; 1.6 TB NVMe adapter running Scylla Enterprise 2018.1.0; RHEL 7.5 LE for POWER9
Intel Xeon SP Gold 6140: Two-socket Intel Xeon SP Gold 6140, 2x 18-core/2.3 GHz/256 GB memory; two internal HDDs; 40GbE; 1.6 TB NVMe adapter running open source Cassandra 3.11.2; RHEL 7.5

 

Notes:

High Performance Computing

With OpenMP 4.5 GPU offload, the IBM® Power® System AC922 server with four NVIDIA Tesla V100 GPUs can run the Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH) application 12x faster, as measured by its figure of merit, than the CPU-only implementation.

OpenMP 4.5 GPU offload enables:

  • Acceleration of CPU-only applications with minimal development effort using pragma directives (6% to 8% code addition)
  • Architecture-independent multi-GPU implementation with ease for accelerated computing

For the systems and workload compared:

  • System: IBM POWER9™ based Power System AC922
  • Workload: LULESH

System configuration

Power AC922 for HPC (with GPU)
System details: IBM POWER9 with NVLink, 2.8 GHz, 44 cores
Memory: 1 TB
Operating system: RHEL 7.6 for Power Little Endian (POWER9)
CUDA toolkit / driver: CUDA toolkit 10.1.152 / CUDA driver 418.67
GPU details: NVIDIA Tesla V100 with NVLink GPU
NVLink details: NVIDIA NVLink 2.0
Compiler details: IBM XL C/C++ for Linux, V16.1.1 (RC 3)
Multi-Process Service (MPS): On
 

Notes:

Achieve faster simulation using the Reaction-Field (RF) method on the IBM® Power® System AC922 server, which is based on IBM POWER9™ processor technology.

For the systems and workload compared:

  • IBM Power AC922 with four Tesla V100 GPUs is 1.76x faster than previous generation IBM Power System S822LC server with four Tesla P100 GPUs.

System configuration

Power AC922 for HPC (with GPU): IBM POWER9 with NVLink, 2.8 GHz, 44 cores; 1 TB memory; RHEL 7.5 for Power Little Endian (POWER9); CUDA toolkit 10.0 / CUDA driver 410.37; NVIDIA Tesla V100 with NVLink GPU; NVIDIA NVLink 2.0; GNU 7.3.1 (IBM Advance Toolchain 11)
Power S822LC for HPC (with GPU): IBM POWER8 with NVLink, 4.023 GHz, 20 cores, 40 threads; 256 GB memory; RHEL 7.3; CUDA 8.0; NVIDIA Tesla P100 with NVLink GPU; NVIDIA NVLink 1.0; GNU 4.8.5 (OS default)
 

Notes:

CPMD on IBM POWER9™ with NVLink 2.0 runs 2.12X faster than tested x86 Xeon Gold 6150 systems, providing reduced wait time and improved computational chemistry simulation execution time.

For the systems and workload compared:

  • AC922 delivers a 2.12X reduction in execution time compared to the tested x86 Xeon Gold 6150 system

System configuration

POWER9 AC922 with Tesla V100 GPU: POWER9 with NVLink 2.0, 2.8 GHz, 44 cores; 1 TB memory; RHEL 7.5 for Power Little Endian (POWER9); CUDA toolkit 9.2 / CUDA driver 396.31; NVIDIA Tesla V100-SXM2
Xeon Gold 6150 with Tesla V100: Xeon Gold 6150, 2.70 GHz, 36 cores; 384 GB memory; Ubuntu 16.04.3; CUDA 9.1 / CUDA driver 390.30; NVIDIA Tesla V100-PCIE
 

Software Stack

POWER9 AC922 with Tesla V100: CPMD version 4423; Spectrum MPI 10.2.0.01prpq; IBM XLC/XLF 16.1.0 (Beta 4) compiler; IBM ESSL 6.1 RC2, LAPACK 3.5.0
Xeon Gold 6150 with Tesla V100: CPMD version 4423; OpenMPI 3.0.0; GNU 5.4.0 compiler; OpenBLAS 0.2.18, LAPACK 3.5.0

Notes:

For the systems and workload compared:

  • IBM Power System AC922 delivers a 2.9X reduction in execution time compared to tested x86 systems
  • IBM Power System AC922 delivers a 2.0X reduction in execution time compared to the prior generation IBM Power System S822LC for HPC
  • POWER9 with NVLink 2.0 unlocks the performance of the GPU-accelerated version of CPMD by enabling fast CPU-GPU data transfers
    • 3.3 TB of data movement required between CPU and GPU
    • 70 seconds of transfer time over NVLink 2.0 vs. 300+ seconds over a traditional PCIe bus

System configuration

IBM Power System AC922: 40 cores (2 x 20c chips), POWER9 with NVLink 2.0, 2.25 GHz, 1024 GB memory; (4) Tesla V100 GPUs; Red Hat Enterprise Linux 7.4 for Power Little Endian (POWER9) with ESSL PRPQ; Spectrum MPI PRPQ release, XLF 15.16, CUDA 9.1
IBM Power System S822LC for HPC: 20 cores (2 x 10c chips) / 160 threads, POWER8 with NVLink, 2.86 GHz, 256 GB memory; (4) Tesla P100 GPUs; RHEL 7.4 with ESSL 5.3.2.0; PE 2.2, XLF 15.1, CUDA 8.0
2x Intel Xeon E5-2640 v4: 20 cores (2 x 10c chips) / 40 threads, 2.4 GHz, 256 GB memory; (4) Tesla P100 GPUs; Ubuntu 16.04 with OpenBLAS 0.2.18; OpenMPI 1.10.2, GNU 5.4.0, CUDA 8.0

Notes:

For the systems and workload compared:

  • The GPU-accelerated NAMD application runs 2x faster on an IBM® Power® AC922 system compared to an IBM Power System S822LC system.

System configuration

Power AC922 for HPC (with GPU): IBM POWER9 with NVLink, 2.8 GHz, 40 cores, 80 threads; 1 TB memory; RHEL 7.4 for Power Little Endian (POWER9); CUDA toolkit 9.1 / CUDA driver 390.31; NVIDIA Tesla V100 with NVLink GPU; NVIDIA NVLink 2.0
Power S822LC for HPC (with GPU): IBM POWER8 with NVLink, 4.023 GHz, 20 cores, 40 threads; 256 GB memory; RHEL 7.3; CUDA 8.0; NVIDIA Tesla P100 with NVLink GPU; NVIDIA NVLink 1.0

Notes:

According to ORNL, Summit is the next leap in leadership-class computing systems for open science.

  • ORNL reports 5-10X application performance with ¼ of the nodes vs Titan
  • Summit will deliver more than five times the computational performance of Titan’s 18,688 nodes, using only approximately 4,600 nodes.
  • Each Summit node will contain multiple IBM POWER9 CPUs and NVIDIA Volta GPUs all connected together with NVIDIA’s high-speed NVLink and a huge amount of memory.
  • Each node will have over half a terabyte of coherent memory (HBM “high bandwidth memory” + DDR4) addressable by all CPUs and GPUs, plus an additional 800 gigabytes of NVRAM.

System configuration

Feature comparison (Titan vs. Summit)
Application performance: baseline vs. 5-10x Titan
Number of nodes: 18,688 vs. ~4,600
Node performance: 1.4 TF/s vs. >40 TF/s
Memory per node: 32 GB DDR3 + 6 GB GDDR5 vs. 512 GB DDR4 + HBM
NV memory per node: 0 vs. 1600 GB
Total system memory: 710 TB vs. >10 PB (DDR4 + HBM + non-volatile)
System interconnect (node injection bandwidth): Gemini (6.4 GB/s) vs. dual-rail EDR-IB (23 GB/s)
Interconnect topology: 3D torus vs. non-blocking fat tree
Processors: 1 AMD Opteron™ + NVIDIA Kepler™ vs. 2 IBM POWER9™ + NVIDIA Volta™
File system: 32 PB, 1 TB/s, Lustre vs. 250 PB, 2.5 TB/s, GPFS™
Peak power consumption: 9 MW vs. 15 MW

Notes:

Resolve the PCI-E bottleneck for your code with IBM POWER9™ and NVLink 2.0: transfer data 5.6X faster than the CUDA host-to-device bandwidth of tested x86 platforms. POWER9 is the only processor with NVLink 2.0 from CPU to GPU.

For the systems and workload compared:

  • POWER9 delivers 5.6X host-to-device bandwidth versus Xeon E5-2640 v4 in the CUDA host-to-device (H2D) bandwidth test (an illustrative measurement sketch follows this list)
  • No code changes required to leverage NVLink capability
    • Application performance could be further increased with application code optimization
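The published comparison uses NVIDIA's CUDA bandwidthTest sample. For illustration only, the sketch below times repeated host-to-device copies of a pinned buffer from Python; PyCUDA is an assumption here (it is not part of the tested configuration), and a real measurement would use the CUDA sample itself.

```python
# Illustrative host-to-device (H2D) bandwidth measurement with PyCUDA.
# Assumption: PyCUDA and a CUDA-capable GPU are available; the published
# result was produced with NVIDIA's bandwidthTest sample, not this script.
import time
import numpy as np
import pycuda.autoinit            # creates a CUDA context on the default GPU
import pycuda.driver as cuda

size_bytes = 256 * 1024 * 1024                                 # 256 MiB transfer
host_buf = cuda.pagelocked_empty(size_bytes, dtype=np.uint8)   # pinned host memory
dev_buf = cuda.mem_alloc(size_bytes)

cuda.memcpy_htod(dev_buf, host_buf)       # warm-up copy
reps = 20
start = time.time()
for _ in range(reps):
    cuda.memcpy_htod(dev_buf, host_buf)   # synchronous host-to-device copies
cuda.Context.synchronize()
elapsed = time.time() - start

gbps = (size_bytes * reps) / elapsed / 1e9
print(f"Host-to-device bandwidth: {gbps:.1f} GB/s")
```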

System configuration

IBM Power System AC922: 40 cores (2 x 20c chips), POWER9 with NVLink 2.0, 2.25 GHz, 1024 GB memory; (4) Tesla V100 GPUs; RHEL 7.4 for Power LE (POWER9)
IBM Power System S822LC for HPC: 20 cores (2 x 10c chips), POWER8 with NVLink 1.0, 2.86 GHz, 1024 GB memory; (4) Tesla P100 GPUs; RHEL 7.3
Intel Xeon E5-2640 v4: 20 cores (2 x 10c chips) / 40 threads, 2.4 GHz, 512 GB memory; (4) Tesla P100 GPUs; Ubuntu 16.04

Notes:

Machine Learning/ Deep Learning

The Snap ML library on IBM Power System AC922 servers (based on the IBM POWER9™ processor technology) with NVIDIA Tesla V100 GPUs (NVLink 2.0) delivers up to 80x speedup (for example, Lasso on a large model/data set) when training machine learning algorithms to accuracy, compared to the tested scikit-learn on x86 combination.1 (A baseline-measurement sketch follows the list below.)

  • Snap ML works efficiently even when the model’s memory footprint or data set exceeds the GPU memory.
  • Scikit-learn must be used with x86 because stand-alone cuML on x86 fails if the data set or runtime artifacts exceed the GPU memory while testing.
  • Price Prediction data set (which is large, feature-rich, and sparse) from Kaggle is used.3
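The x86 baseline in this comparison is plain scikit-learn. The sketch below shows how a Lasso "time to accuracy" run on that baseline is typically timed; the synthetic data is only a stand-in for the Kaggle Price Prediction set, and the Snap ML side (not shown) goes through the pai4sk package listed in the configuration below.

```python
# Minimal scikit-learn Lasso "time to accuracy" baseline sketch.
# Synthetic dense data stands in for the (large, sparse) Price Prediction set.
import time
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100_000, n_features=100, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

start = time.time()
model = Lasso(alpha=0.1, max_iter=1000)   # CPU-only coordinate-descent solver
model.fit(X_train, y_train)
elapsed = time.time() - start

mse = mean_squared_error(y_test, model.predict(X_test))
print(f"Training time: {elapsed:.1f} s, test MSE: {mse:.4f}")
```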

System configuration

IBM Power System AC922: 40 cores (two 20c chips), POWER9 with NVLink 2.0, 3.8 GHz, 1 TB memory; single Tesla V100 GPU with 16 GB GPU memory; Red Hat Enterprise Linux 7.7 for Power Little Endian (POWER9) with CUDA 10.1.243; nvidia-driver 418.67; software: WML CE 1.6.2, pai4sk 1.5.0, NumPy 1.16.5
2x Intel Xeon Gold 6150: 36 cores (two 18c chips), 2.70 GHz, 512 GB memory; single Tesla V100 GPU with 16 GB GPU memory; Ubuntu 18.04.3 LTS (4.15.0-54-generic) with CUDA 10.1.243; nvidia-driver 418.67; software: scikit-learn 0.21.3, NumPy 1.17.3

Notes:

For the systems and workload compared, the Snap ML library combined with IBM® Power® System AC922 servers (based on IBM POWER9™ processor technology) with an NVIDIA Tesla V100 GPU (NVLink 2.0) provides a speedup when training machine learning algorithms (such as Ridge, Lasso, and Logistic regression) to accuracy, compared to tested cuML on x86 systems.1

Snap ML with Power AC922 outperforms tested cuML with x86 combinations on:

  • Feature-rich data sets such as Price Prediction and Epsilon (utilization of more features enables faster time to accuracy).
  • Sparse data sets such as Taxi and Price Prediction (handled natively by the Snap ML generalized linear models); a cuML-side sketch follows this list.
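The x86 comparison point here is RAPIDS cuML (0.10.0 in the configuration below). The sketch that follows is a hedged illustration of the cuML side of such a run, with synthetic data standing in for the published data sets; exact input handling can vary by cuML release.

```python
# Hedged sketch of a GPU Ridge-regression fit with RAPIDS cuML.
# Synthetic data replaces the Price Prediction / Epsilon / Taxi sets; requires
# a CUDA-capable GPU and a cuML installation (0.10.0 in the tested stack).
import time
import numpy as np
from cuml.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.standard_normal((200_000, 200), dtype=np.float32)
y = X @ rng.standard_normal(200, dtype=np.float32)

start = time.time()
model = Ridge(alpha=1.0)
model.fit(X, y)                   # cuML copies the host arrays to the GPU
print(f"cuML Ridge fit time: {time.time() - start:.2f} s")
```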

System configuration

IBM Power System AC922: 40 cores (two 20c chips), POWER9 with NVLink 2.0, 3.8 GHz, 1 TB memory; single Tesla V100 GPU with 16 GB GPU memory; Red Hat Enterprise Linux 7.7 for Power Little Endian (POWER9) with CUDA 10.1.243; nvidia-driver 418.67; software: WML CE 1.6.2, pai4sk 1.5.0, NumPy 1.16.5
2x Intel Xeon Gold 6150: 36 cores (two 18c chips), 2.70 GHz, 512 GB memory; single Tesla V100 GPU with 16 GB GPU memory; Ubuntu 18.04.3 LTS (4.15.0-54-generic) with CUDA 10.1.243; nvidia-driver 418.67; software: cuML 0.10.0, NumPy 1.17.3

Notes:

For the systems and workload compared, IBM® Power® System AC922 servers (based on the IBM POWER9 processor technology) with NVIDIA Tesla V100 GPUs connected through NVLink 2.0 along with WML-CE TensorFlow Large Model Support (LMS) can provide:

  • 3.13x increase in throughput compared to tested x86 systems with four GPUs
  • 2.77x increase in throughput compared to tested x86 systems with eight GPUs
    Training a DeepLabv3+ based model with distributed deep learning (DDL) and TensorFlow LMS on Power AC922 using the PASCAL Visual Object Classes (VOC) 2012 data set with an image resolution of 2100 x 2100 and batch size 1
  • Critical machine learning (ML) capabilities (regression, nearest neighbor, recommendation systems, clustering, and so on) can use system memory across NVLink 2.0
    • NVLink 2.0 enables enhanced host-to-GPU communication
    • IBM's LMS for deep learning enables seamless use of host and GPU memory for improved performance (a hedged enablement sketch follows this list)
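TensorFlow LMS is enabled from the training script rather than by a system setting. The sketch below is an assumption-laden illustration only: the tensorflow_large_model_support module and its use as a Keras callback reflect the tensorflow-large-model-support 2.0.1 package named in the configuration (not stock TensorFlow), and the tiny model is a placeholder, not DeepLabv3+.

```python
# Hedged sketch: enabling TensorFlow Large Model Support (LMS) for Keras training.
# Assumption: the tensorflow_large_model_support package from WML CE 1.6.x exposes
# an LMS Keras callback as shown; the model and data here are placeholders.
import tensorflow as tf
from tensorflow_large_model_support import LMS  # shipped with WML CE, not stock TF

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(2100, 2100, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(21, activation="softmax"),   # 21 PASCAL VOC classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

lms_callback = LMS()   # default settings; swaps activation tensors to host memory
# model.fit(train_images, train_labels, batch_size=1, callbacks=[lms_callback])
```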

System configuration

IBM Power System AC922: 40 cores (two 20c chips), POWER9 with NVLink 2.0, 3.8 GHz, 1 TB memory; four Tesla V100 GPUs with 16 GB GPU memory each; Red Hat Enterprise Linux (RHEL) 7.6 for Power Little Endian (POWER9) with CUDA 10.1.168 / CUDNN 7.5.1; nvidia-driver 418.67; software: IBM TFLMS (POWER9), TFLMSv2, WML-CE 1.6.1, tensorflow-large-model-support 2.0.1
2x Intel Xeon E5-2698: 40 cores (two 20c chips), 2.40 GHz, 768 GB memory; eight Tesla V100 GPUs with 16 GB GPU memory each; Ubuntu 16.04.5 with CUDA 10.1.168 / CUDNN 7.5.1; nvidia-driver 418.39; software: TFLMSv2, WML-CE 1.6.1, tensorflow-large-model-support 2.0.1
 

Notes:

For the systems and workload compared, IBM® Power® System AC922 servers (based on the IBM POWER9™ processor technology) with NVIDIA Tesla V100 GPUs connected through NVLink 2.0 along with WML-CE PyTorch Large Model Support (LMS) can provide:

  • 2.9x increase in throughput compared to tested x86 systems with four GPUs
  • 2.4x increase in throughput compared to tested x86 systems with eight GPUs
    Training a DeepLabv3+ based model with distributed deep learning (DDL) and PyTorch LMS on Power AC922 using the PASCAL Visual Object Classes (VOC) 2012 data set with an image resolution of 2200 x 2200 and batch size 2
  • Critical machine learning (ML) capabilities (regression, nearest neighbor, recommendation systems, clustering, and so on) can use system memory across NVLink 2.0
    • NVLink 2.0 enables enhanced host-to-GPU communication
    • IBM's LMS for deep learning enables seamless use of host and GPU memory for improved performance (a hedged enablement sketch follows this list)
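PyTorch LMS in WML CE is likewise switched on from the training script. In the sketch below, the torch.cuda.set_enabled_lms call is the WML CE extension (an assumption here; it does not exist in stock PyTorch), and the model is a placeholder rather than DeepLabv3+.

```python
# Hedged sketch: enabling PyTorch Large Model Support (LMS) from WML CE.
# Assumption: the WML CE PyTorch build adds torch.cuda.set_enabled_lms();
# stock PyTorch does not have this call. Model and data are placeholders.
import torch
import torch.nn as nn

torch.cuda.set_enabled_lms(True)   # WML CE extension: spill tensors to host memory

features = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
).cuda()
head = nn.Linear(32, 21).cuda()    # 21 PASCAL VOC classes

x = torch.randn(2, 3, 2200, 2200, device="cuda")   # batch size 2, 2200 x 2200
loss = head(features(x).view(2, 32)).sum()
loss.backward()                    # large activations can spill to system memory
```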

System configuration

IBM Power System AC922: 40 cores (two 20c chips), POWER9 with NVLink 2.0, 3.8 GHz, 1 TB memory; four Tesla V100 GPUs with 16 GB GPU memory each; Red Hat Enterprise Linux (RHEL) 7.6 for Power Little Endian (POWER9) with CUDA 10.1.168 / CUDNN 7.5.1; nvidia-driver 418.67; software: IBM PyTorch (POWER9), WML-CE 1.6.1, PyTorch 1.1.0
2x Intel Xeon E5-2698: 40 cores (two 20c chips), 2.40 GHz, 768 GB memory; eight Tesla V100 GPUs with 16 GB GPU memory each; Ubuntu 18.04.2 with CUDA 10.1.168 / CUDNN 7.5.1; nvidia-driver 418.67; software: WML-CE 1.6.1, PyTorch 1.1.0
 

Notes:

Accelerate Data Scientist productivity and drive faster insights with IBM DSX Local on IBM Power System AC922

For the systems and workload compared:

  • Power AC922 (based on the IBM POWER9™ processor technology with NVIDIA GPUs) completes a GPU-accelerated K-means clustering run over 15 GB of data in half the time of tested x86 systems (Skylake 6150 with NVIDIA GPUs); an illustrative K-means sketch follows this list.
  • Power AC922 delivers 2x faster insights for the GPU-accelerated K-means clustering workload than Intel® Xeon® SP Gold 6150-based servers.
  • An IBM Power Systems™ cluster with Power LC922 (CPU optimized) and Power AC922 (GPU accelerated) provides an optimized infrastructure for DSX Local.
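The workload is GPU-accelerated K-means clustering run from DSX Local notebooks. The sketch below is only an illustrative CPU stand-in using scikit-learn on synthetic data (not the GPU-accelerated libraries or the 15 GB data set used in the published runs); it shows the shape of the workload being timed.

```python
# Illustrative K-means clustering run, CPU-only via scikit-learn.
# The published comparison used GPU-accelerated K-means inside DSX Local;
# this stand-in only shows the shape of the workload, on synthetic data.
import time
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
X = rng.standard_normal((200_000, 20)).astype(np.float32)   # ~16 MB, not 15 GB

start = time.time()
km = KMeans(n_clusters=8, n_init=10, random_state=0)
labels = km.fit_predict(X)
print(f"K-means fit time: {time.time() - start:.1f} s, inertia: {km.inertia_:.3e}")
```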

System configuration

Power AC922: IBM POWER9, 2x 20 cores/3.78 GHz and 4x NVIDIA Tesla V100 GPUs with NVLink; 1 TB memory, each user assigned 180 GB in DSX Local; 2x 960 GB SSD; 10 GbE two-port; RHEL 7.5 for POWER9; Data Science Experience Local 1.2 fp3
Two-socket Intel Xeon Gold 6150: Gold 6150, 2x 18 cores/2.7 GHz and 4x NVIDIA Tesla V100 GPUs; 768 GB memory, each user assigned 180 GB in DSX Local; 2x 960 GB SSD; 10 GbE two-port; RHEL 7.5; Data Science Experience Local 1.2 fp3

Notes:

Accelerate Data Scientist productivity and drive faster insights with DSX Local on IBM Power System LC922

For the systems and workload compared:

  • Power LC922 running K-means clustering with 1 GB data scales to 2X more users than tested x86 systems
  • Power LC922 supports 2x more users at a faster response time than Intel® Xeon® SP Gold 6140-based servers.
  • Power LC922 delivers over 41% faster insights for the same number of users (four to eight).

System configuration

Power LC922: IBM POWER9™, 2x 20 cores/2.6 GHz/512 GB memory; 10x 4 TB HDD; 10 GbE two-port; RHEL 7.5 for POWER9; Data Science Experience Local 1.1.2
Two-socket Intel Xeon SP Gold 6140: Gold 6140, 2x 18 cores/2.4 GHz/512 GB memory; 10x 4 TB HDD; 10 GbE two-port; RHEL 7.5; Data Science Experience Local 1.1.2
 

Notes:

Large Model Support (LMS) uses system memory and GPU memory to support more complex and higher resolution data. Maximize research productivity by training on medical/satellite images using Caffe with LMS on POWER9 with NVIDIA V100 GPUs.

For the systems and workload compared:

  • 3.7X reduction in the runtime of 1000 training iterations on medical/satellite images compared to tested x86 systems
  • Critical machine learning (ML) capabilities such as regression, nearest neighbor, recommendation systems, clustering, etc., operate on more than just the GPU memory
    • NVLink 2.0 enables enhanced Host to GPU communication
    • LMS for deep learning from IBM enables seamless use of Host + GPU memory for improved performance

System configuration

IBM Power System AC922: POWER9 with NVLink 2.0, 40 cores (2 x 20c chips), 2.25 GHz, 1024 GB memory; (4) Tesla V100 GPUs; RHEL 7.4 Power LE (POWER9); CUDA 9.1/CUDNN 7
2x Intel Xeon E5-2640 v4: 20 cores (2 x 10c chips) / 40 threads, 2.4 GHz, 1024 GB memory; (4) Tesla V100 GPUs; Ubuntu 16.04; CUDA 9.0/CUDNN 7

Notes:

Large Model Support (LMS) uses system memory and GPU memory to support more complex and higher resolution data. Maximize research productivity by training on medical/satellite images using Caffe with LMS on POWER9 with NVIDIA V100 GPUs.

For the systems and workload compared:

  • 3.8X reduction in the runtime of 1000 training iterations on 2240 x 2240 images compared to tested x86 systems
  • Critical machine learning (ML) capabilities such as regression, nearest neighbor, recommendation systems, clustering, etc., operate on more than just the GPU memory
    • NVLink 2.0 enables enhanced Host to GPU communication
    • LMS for deep learning from IBM enables seamless use of Host + GPU memory for improved performance

System configuration

IBM Power System AC922: POWER9 with NVLink 2.0, 40 cores (2 x 20c chips), 2.25 GHz, 1024 GB memory; (4) Tesla V100 GPUs; RHEL 7.4 Power LE (POWER9); CUDA 9.1/CUDNN 7
2x Intel Xeon E5-2640 v4: 20 cores (2 x 10c chips) / 40 threads, 2.4 GHz, 1024 GB memory; (4) Tesla V100 GPUs; Ubuntu 16.04; CUDA 9.0/CUDNN 7

Notes:

Maximize research productivity by training on more images in the same time with TensorFlow 1.4.0 running on IBM Power System AC922 servers with Nvidia Tesla V100 GPUs connected via NVLink 2.0

For the systems and workload compared:

  • 35% more images processed per second vs tested x86 systems
  • ResNet50 testing on ILSVRC 2012 data set (aka Imagenet 2012)
    • Training on 1.2M images
    • Validation on 50K images

System configuration

IBM Power System AC922: POWER9 with NVLink 2.0, 40 cores (2 x 20c chips), 2.25 GHz, 1024 GB memory; (4) Tesla V100 GPUs; RHEL 7.4 Power LE (POWER9); TensorFlow 1.4.0 framework and HPM ResNet50
2x Intel Xeon E5-2640 v4: 20 cores (2 x 10c chips) / 40 threads, 2.4 GHz, 1024 GB memory; (4) Tesla V100 GPUs; Ubuntu 16.04; TensorFlow 1.4.0 framework and HPM ResNet50

Notes:

Maximize research productivity by training on more images in the same time with TensorFlow 1.4.0 running on a cluster of IBM Power System AC922 servers with Nvidia Tesla V100 GPUs connected via NVLink 2.0

For the systems and workload compared:

  • 2.3X more images processed per second vs tested x86 systems
  • PowerAI Distributed Deep Learning (DDL) library provides innovative distribution methods enabling AI frameworks to scale to multiple servers leveraging all attached GPUs
  • ResNet50 testing on ILSVRC 2012 data set (also known as Imagenet 2012)
    • Training on 1.2M images
    • Validation on 50K images

System configuration

4 nodes of IBM Power System AC922: POWER9 with NVLink 2.0, 40 cores (2 x 20c chips), 2.25 GHz, 1024 GB memory; (4) Tesla V100 GPUs; RHEL 7.4 Power LE (POWER9); TensorFlow 1.4.0 framework and HPM ResNet50
4 nodes of 2x Intel Xeon E5-2640 v4: 20 cores (2 x 10c chips) / 40 threads, 2.4 GHz, 1024 GB memory; (4) Tesla V100 GPUs; Ubuntu 16.04; TensorFlow 1.4.0 framework and HPM ResNet50

Notes: