October 7, 2016 | Written by: Sumit Gupta
Categorized: OpenPOWER | Power servers | Power Systems
With POWER8 and NVIDIA NVLink
What do you get when you combine NVIDIA’s most advanced data center accelerator with what we think is the best CPU for big data, in a server that was built to unleash their combined performance for deep learning and HPC applications?
Two weeks ago we announced the world’s first server built with NVIDIA’s new Tesla P100 GPU accelerators, connected with the high-speed NVIDIA NVLink interface to IBM’s POWER8 processor. Hundreds of these servers, called the IBM Power System S822LC for HPC, have already been shipped worldwide as part of our early shipment program for key high-performance computing (HPC) and high-performance data analytics (HPDA) applications and workloads.
Today I am excited to share with you a new result from the fast-growing world of deep learning, one that highlights the capabilities of the POWER8 CPU, the NVIDIA Tesla P100 GPU and NVIDIA NVLink.
4-Tesla P100 GPU Server is 2.2 times faster than the 4-Tesla M40 GPU Server
Deep learning training speed measures how quickly and efficiently a deep neural network can be trained to identify and categorize information in a given training set. Deep learning software frameworks scale well with GPU accelerators and system bandwidth. The combination of NVIDIA Tesla P100 GPUs connected to each other and to the POWER8 CPU by NVIDIA NVLink makes the new IBM S822LC for HPC a very powerful platform for deep-learning training.
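To make the metric concrete, here is a minimal, generic sketch of a time-to-accuracy measurement harness. This is illustrative only, not IBM's benchmark code; the `train_step` and `evaluate` callables are hypothetical placeholders for whatever framework you use:

```python
import time

def time_to_accuracy(train_step, evaluate, target=0.50, max_steps=100_000):
    """Measure wall-clock time for a model to reach a target accuracy.

    Illustrative harness, not IBM's benchmark code.
    train_step: callable that runs one training iteration (placeholder)
    evaluate:   callable that returns current top-1 validation accuracy in [0, 1]
    Returns elapsed seconds to reach `target`, or None if never reached.
    """
    start = time.time()
    for step in range(1, max_steps + 1):
        train_step()
        # Evaluating every iteration would dominate the run, so check periodically.
        if step % 100 == 0 and evaluate() >= target:
            return time.time() - start
    return None
```

The "time to 50 percent accuracy" numbers discussed below are exactly this kind of measurement: wall-clock time until the network's top-1 accuracy on the validation set first crosses the 50 percent threshold.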
Last week, our deep learning performance team reached a new milestone. When I last wrote about this workload three weeks ago, we had successfully trained AlexNet, a common benchmark for deep-learning training, to 50 percent accuracy in 1 hour, 44 minutes. This week, we’re seeing results that have reduced the training time in the same test to 57 minutes, under an hour! This training data was collected with the latest system firmware along with system and software optimizations that tune overall S822LC system performance. As we continue to better understand the capabilities of the S822LC for HPC, we’ll continue to share progress on how we are fine-tuning the system for today’s deep-learning demands.
Figure 1: AlexNet training accuracy as a function of elapsed time
We’re tremendously excited by this result and by what it reflects for IBM’s investment in this quickly evolving segment. It gets even better when we compare results in the same test to the performance of the previous generation of deep-learning systems, built around PCIe-attached NVIDIA Tesla GPUs.
Figure 2: Time to 50 percent accuracy in minutes, Tesla M40/PCIe system vs. Tesla P100/NVLink system
A single S822LC for HPC with four NVIDIA Tesla P100 GPUs reaches 50 percent accuracy in AlexNet 2.2 times faster than a server with four NVIDIA Tesla M40 GPUs!¹
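Putting the numbers quoted in this post together, a quick back-of-the-envelope check (illustrative arithmetic only, not a benchmark script) shows what the tuning work and the generational comparison each amount to:

```python
# Back-of-the-envelope check of the training-time figures quoted in this post.
# All times are AlexNet time-to-50%-accuracy, in minutes.

earlier_p100_run = 1 * 60 + 44   # 1 hour, 44 minutes from the earlier post
tuned_p100_run = 57              # this week's tuned result
m40_speedup = 2.2                # P100/NVLink vs. M40/PCIe (Figure 2)

# Improvement from firmware and software tuning on the same P100 system
tuning_gain = earlier_p100_run / tuned_p100_run
print(f"Tuning improvement: {tuning_gain:.2f}x (104 min -> 57 min)")

# Training time implied for the M40 system by the stated 2.2x speedup
implied_m40_minutes = tuned_p100_run * m40_speedup
print(f"Implied M40/PCIe time: {implied_m40_minutes:.0f} minutes")
```

In other words, tuning alone bought roughly a 1.8x improvement on the same hardware, and the implied M40/PCIe training time is a little over two hours.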
Get started with a POWER8-Pascal Server today
If you work on GPU-accelerated deep learning and HPC applications, take this opportunity to try out these new systems and enjoy the benefits of both the Tesla P100 GPU accelerator and the high-speed NVIDIA NVLink connection to the IBM POWER8 CPU.
For organizations running popular deep learning frameworks in production, this could mean faster training time for your current large data sets in an offering that is cost effective, flexible and available today. Learn more about and download deep-learning distributions on POWER.
To learn more about this server and order one, visit HPC on Power or reach out to your IBM representative or one of IBM’s Business Partners.
IBM also invites GPU software developers to join the IBM-NVIDIA Acceleration Lab and try out these systems firsthand. This program, open by application, offers the chance to collaborate with IBM and NVIDIA GPU programmers to harness the power of NVIDIA Tesla GPUs with POWER8 for your GPU-enabled applications.
I look forward to hearing about the performance that you get from these systems. Share with us how you want to use this server and how you think NVIDIA NVLink can change application acceleration by posting in the comments section below.
¹ Based on AlexNet training to top-1 50 percent accuracy. IBM Power S822LC for HPC configuration: 16 cores (8 cores/socket) at 3.25 GHz with 4x NVIDIA Tesla P100 GPUs; 512 GB memory; Ubuntu 16.04.1 running NVCaffe 0.14.5. IBM Power S822L configuration: 20 cores (10 cores/socket) at 3.694 GHz with 4x NVIDIA Tesla M40 GPUs; 512 GB memory; Ubuntu 16.04 running BVLC-Caffe f28f5ae2f2453f42b5824723efc326a04dd16d85. Software stack details for both configurations: g++ 5.3.1, gfortran 5.3.1, OpenBLAS 0.2.18, Boost 1.58.0, CUDA 8.0 Toolkit, LAPACK 3.6.0, HDF5 1.8.16, OpenCV 2.4.9.