PowerAI: The World’s Fastest Deep Learning Solution Among Leading Enterprise Servers

Over the past several weeks, my IBM colleagues have written about our progress porting and optimizing popular deep learning frameworks for the most advanced platform for accelerated computing in the enterprise, the IBM S822LC for HPC.

Today I am pleased to announce another major milestone: the creation of the world’s fastest deep learning solution among leading enterprise servers. This offering pairs the new IBM PowerAI software toolkit with NVIDIA NVLink and GPUDL libraries optimized for the IBM Power architecture. We call it PowerAI.

Foundations of PowerAI


PowerAI brings together a collection of the most popular open source frameworks for deep learning, along with supporting software and libraries, all in a single installable package. Our design goal was to simplify the acquisition, installation and system optimization required to bring up a deep learning infrastructure, allowing users to spend less time on implementation and more time training neural networks for results. More about those results soon.

At the core of the PowerAI solution is the Power Systems S822LC for High Performance Computing (HPC) server, incorporating two POWER8 CPUs, up to four NVIDIA Tesla P100 GPUs, and system-wide high-bandwidth NVLink connectivity that ties together GPU-GPU and GPU-CPU with multiple point-to-point connections.

This architecture is designed for the compute-intensive requirements of deep learning software, providing high-bandwidth connections between the GPUs and system memory and from GPU to GPU. With PowerAI and NVIDIA NVLink, deep learning workloads can use this bandwidth to move large training data sets from system memory to GPU memory; the intended outcome is a shorter training cycle and the ability to train with larger data sets for improved accuracy.

Optimizations and industry exclusives

Working closely with IBM Research in Tokyo, the PowerAI development team has integrated several performance enhancements into one of these frameworks. These optimizations, packaged in the IBM-Caffe binary, leverage NVIDIA NVLink bandwidth and reduce some of the redundant data movement within the Caffe framework. Together with the increased performance of the NVIDIA Tesla P100 GPUs, they enable a four-GPU S822LC for HPC system to outperform an eight-GPU Intel Broadwell system by 24 percent when running the VGGNet workload on the Caffe framework.[1]

Chart: S822LC for HPC with 4 Tesla P100 GPUs is 24 percent faster than 8 Tesla M40 GPUs
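To put the headline number in per-GPU terms, here is a rough back-of-the-envelope sketch. It assumes that "24 percent faster" means a throughput ratio of 1.24 between the two systems, which is one reading of the chart and not something the benchmark notes spell out. The function name `per_gpu_speedup` is made up for this illustration.

```python
def per_gpu_speedup(system_speedup, gpus_a, gpus_b):
    """Per-GPU throughput of system A relative to system B.

    system_speedup is (throughput of A) / (throughput of B); here we ASSUME
    1.24 for "24 percent faster". gpus_a and gpus_b are the GPU counts.
    """
    if gpus_a <= 0 or gpus_b <= 0:
        raise ValueError("GPU counts must be positive")
    # Divide each system's throughput by its GPU count, then take the ratio.
    return system_speedup * gpus_b / gpus_a


# Four P100 GPUs versus eight M40 GPUs, under the assumption above.
ratio = per_gpu_speedup(1.24, gpus_a=4, gpus_b=8)
print(f"{ratio:.2f}")  # prints 2.48
```

Under that assumption, each GPU in the four-GPU system delivers roughly 2.5 times the training throughput of each GPU in the eight-GPU system. Keep in mind that the two systems also differ in CPU, interconnect and Caffe build, so this is a whole-system comparison expressed per GPU, not a GPU-only measurement.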

We’re extremely excited about the promise of this optimization and look forward to seeing how our clients and partners incorporate it into their deep learning workflows.

The toolkit also leverages GPUDL libraries from the NVIDIA SDKs, including the deep neural network library (cuDNN), the basic linear algebra subroutines library (cuBLAS) and the collective communications library (NCCL), to deliver multi-GPU acceleration and optimized performance on IBM servers.

Over time, we intend to explore additional optimizations and unique capabilities integrated into future releases of PowerAI.

Getting started with PowerAI

The PowerAI packages are available now, linked from our PowerAI landing page. These images will install on an S822LC for HPC server running Ubuntu 16.04, NVIDIA CUDA 8 and NVIDIA cuDNN 5.1. If you were to build this infrastructure from scratch, it could take days; our design point is to have you running in an hour or less.
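Before installing, it can help to confirm that a server matches the versions listed above. The following is a minimal sketch of such a check, not part of PowerAI itself: the `REQUIRED` table and the helper names are hypothetical, and how you detect the installed CUDA and cuDNN versions is left to you. Only the operating system check is shown, using the standard `KEY=value` layout of `/etc/os-release`.

```python
# Versions named in this post. The table layout and key names are assumptions
# made for this sketch.
REQUIRED = {"ubuntu": "16.04", "cuda": "8", "cudnn": "5.1"}


def parse_os_release(text):
    """Parse the KEY=value lines of an /etc/os-release style file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"')
    return fields


def version_matches(found, required):
    """True if the leading dotted components of `found` equal `required`."""
    required_parts = required.split(".")
    return found.split(".")[:len(required_parts)] == required_parts


def check_prerequisites(found):
    """Return a list of problems; an empty list means every version matches."""
    problems = []
    for name, required in REQUIRED.items():
        actual = found.get(name)
        if actual is None:
            problems.append(f"{name}: not detected (need {required})")
        elif not version_matches(actual, required):
            problems.append(f"{name}: found {actual}, need {required}")
    return problems


# Example with hand-supplied values; on a real server you would read
# /etc/os-release and fill in the CUDA and cuDNN versions you detected.
os_fields = parse_os_release('NAME="Ubuntu"\nVERSION_ID="16.04"\n')
found = {"ubuntu": os_fields.get("VERSION_ID"), "cuda": "8.0.44", "cudnn": "5.1"}
print(check_prerequisites(found))
```

The check compares only the leading version components, so a point release such as CUDA 8.0.44 still counts as CUDA 8.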

If you would like to evaluate this solution in the cloud, we are excited to announce that IBM’s Power HPC cloud partner, Nimbix, has made the IBM Caffe framework available on their S822LC for HPC infrastructure as a service; instead of an hour, you could be training in minutes.

We’re truly excited about this offering and would welcome the chance to hear from you. As you and your organization get started with PowerAI, please share your results and comments.

[1] Test System: IBM S822LC 20-core 2.86 GHz 512GB memory / 4 NVIDIA Tesla P100 GPUs / Ubuntu 16.04 / CUDA 8.0.44 / cuDNN 5.1 / IBM Caffe 1.0.0-rc3 / ImageNet Data

Competitive System: Intel Broadwell E5-2640v4 20-core 2.6 GHz 512GB memory / 8 NVIDIA Tesla M40 GPUs / Ubuntu 16.04 / CUDA 8.0.44 / cuDNN 5.1 / BVLC Caffe 1.0.0-rc3 / ImageNet Data

Offering Manager, High Performance Computing and Deep Learning IBM Systems
