
PowerAI: The World’s Fastest Deep Learning Solution Among Leading Enterprise Servers


Over the past several weeks, my IBM colleagues have written about our progress porting and optimizing popular deep learning frameworks for the most advanced platform for accelerated computing in the enterprise, the IBM S822LC for HPC.

Today I am pleased to announce another major milestone: the creation of the world’s fastest deep learning solution among leading enterprise servers. The offering pairs the new IBM PowerAI software toolkit with NVIDIA NVLink and NVIDIA’s GPU deep learning (GPUDL) libraries, optimized for the IBM Power architecture. We call it PowerAI.

Foundations of PowerAI


PowerAI brings together a collection of the most popular open source frameworks for deep learning, along with supporting software and libraries, all in a single installable package. Our design goal was to simplify the acquisition, installation and system optimization required to bring up a deep learning infrastructure, allowing users to spend less time on implementation and more time training neural networks for results. More about those results soon.

At the core of the PowerAI solution is the high-performance Power Systems S822LC for High Performance Computing (HPC) server, which incorporates two POWER8 CPUs, up to four NVIDIA Tesla P100 GPUs, and high-bandwidth NVLink connectivity across the system, tying together GPU to GPU and CPU to GPU with multiple point-to-point connections.

This architecture is designed for the compute-intensive requirements of deep learning software, providing a high-bandwidth connection between the GPUs and system memory and between GPU and GPU. With PowerAI and NVIDIA NVLink, deep learning workloads can use this bandwidth to move large training data sets from system memory to GPU memory; the design goal is a shorter training cycle and the ability to train with larger data sets for improved accuracy.

Optimizations and industry exclusives

Working closely with IBM Research in Tokyo, the PowerAI development team has integrated several performance enhancements into one of these frameworks. These optimizations, packaged in the IBM-Caffe binary, leverage NVIDIA NVLink bandwidth and reduce some of the redundant data movement within the framework. Together with the increased performance of the NVIDIA Tesla P100, this enables a four-GPU S822LC for HPC system to outperform an eight-GPU Intel Broadwell-based system running the VGGNet workload on the Caffe framework by 24 percent.[1]

Chart: S822LC for HPC with 4 Tesla P100 GPUs is 24 percent faster than 8 Tesla M40 GPUs
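
To make the training workflow concrete, here is a minimal sketch of starting GPU training through Caffe’s standard Python interface (pycaffe). It assumes IBM-Caffe exposes the same pycaffe API as BVLC Caffe and that a solver definition already exists; the solver path below is a hypothetical placeholder, not part of the PowerAI distribution.

# Minimal sketch: GPU training via the standard pycaffe interface.
# Assumes IBM-Caffe provides the usual 'caffe' Python module;
# the solver path is a hypothetical placeholder.
import caffe

caffe.set_mode_gpu()   # run computation on the GPU rather than the CPU
caffe.set_device(0)    # select the first Tesla P100 in the system

solver = caffe.SGDSolver('models/vggnet/solver.prototxt')  # placeholder path
solver.solve()         # run the training loop defined by the solver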

We’re extremely excited about the promise of this optimization and look forward to seeing how our clients and partners incorporate it into their deep learning workflows.

The toolkit also leverages NVIDIA’s GPUDL libraries, including the CUDA Deep Neural Network library (cuDNN), the CUDA Basic Linear Algebra Subroutines (cuBLAS) and the NVIDIA Collective Communications Library (NCCL), delivered as part of the NVIDIA SDK, to provide multi-GPU acceleration optimized for IBM servers.

Over time, we intend to explore additional optimizations and unique capabilities integrated into future releases of PowerAI.

Getting started with PowerAI

The PowerAI packages are available now from our PowerAI landing page. These images install on an S822LC for HPC server running Ubuntu 16.04 with NVIDIA CUDA 8 and NVIDIA cuDNN 5.1. Building this software stack from scratch could easily take days; our design point is to have you up and running in an hour or less.
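
As a quick post-install sanity check, a short script along these lines can confirm that the GPUs are visible and that the bundled Caffe Python bindings load cleanly. This is only an illustrative sketch: nvidia-smi ships with the NVIDIA driver, and the import assumes the PowerAI packages place the pycaffe module on your Python path.

# Illustrative post-install check; not part of the PowerAI packages.
import subprocess

# List the GPUs and driver version reported by the NVIDIA driver.
print(subprocess.check_output(
    ['nvidia-smi', '--query-gpu=name,driver_version', '--format=csv']
).decode())

# Confirm the Caffe Python bindings import and can select a GPU.
import caffe  # assumes the PowerAI packages add pycaffe to the Python path
caffe.set_mode_gpu()
caffe.set_device(0)
print('pycaffe loaded; GPU mode enabled')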

If you would like to evaluate this solution in the cloud, we are excited to announce that IBM’s Power HPC cloud partner, Nimbix, has made the IBM-Caffe framework available on its S822LC for HPC infrastructure as a service; instead of an hour, you could be training in minutes.

We’re truly excited about this offering and would welcome the chance to hear from you. As you and your organization get started with PowerAI, please share your results and comments.

[1] Test system: IBM S822LC, 20 cores, 2.86 GHz, 512 GB memory / 4 NVIDIA Tesla P100 GPUs / Ubuntu 16.04 / CUDA 8.0.44 / cuDNN 5.1 / IBM Caffe 1.0.0-rc3 / ImageNet data

Competitive system: Intel Broadwell E5-2640 v4, 20 cores, 2.6 GHz, 512 GB memory / 8 NVIDIA Tesla M40 GPUs / Ubuntu 16.04 / CUDA 8.0.44 / cuDNN 5.1 / BVLC Caffe 1.0.0-rc3 / ImageNet data

Offering Manager, High Performance Computing and Deep Learning, IBM Systems
