Deep-learning training in under an hour

What do you get when you combine NVIDIA’s most advanced data center accelerator with what we think is the best CPU for big data, in a server that was built to unleash their combined performance for deep learning and HPC applications?

Two weeks ago we announced the world’s first server built with NVIDIA’s new Tesla P100 GPU accelerators, connected to IBM’s POWER8 processor with the high-speed NVIDIA NVLink interface. Hundreds of these servers, called the IBM Power System S822LC for HPC, have already been shipped worldwide as part of our early shipment program for key high-performance computing (HPC) and high-performance data analytics (HPDA) applications and workloads.

Today I am excited to share with you a new addition to the fast-growing world of deep learning, one that highlights the capabilities of the POWER8 CPU, the NVIDIA Tesla P100 GPU and NVIDIA NVLink.

A four-Tesla P100 GPU server is 2.2 times faster than a four-Tesla M40 GPU server[1]

Deep-learning training speed measures how quickly and efficiently a deep neural network can be trained to identify and categorize information within a particular training data set. Deep-learning software frameworks scale well with GPU accelerators and system bandwidth. The combination of NVIDIA Tesla P100 GPUs, connected to each other and to the POWER8 CPU by NVIDIA NVLink, makes the new IBM S822LC for HPC a very powerful platform for deep-learning training.
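As a concrete illustration of how such a run is typically launched, Caffe (the framework used in the benchmark below) exposes multi-GPU data-parallel training directly from the command line. This is a minimal sketch under stated assumptions: the solver path and GPU IDs are illustrative, not the exact configuration used in our measurements.

```shell
# Launch data-parallel AlexNet training across the server's four GPUs.
# models/bvlc_alexnet/solver.prototxt is the stock AlexNet solver that
# ships with Caffe; adjust the path for your installation.
caffe train \
  --solver=models/bvlc_alexnet/solver.prototxt \
  --gpu=0,1,2,3    # or --gpu=all to use every visible GPU
```

With NVLink-attached GPUs, the gradient exchange between devices during this data-parallel training is exactly the traffic that benefits from the higher inter-GPU and CPU-GPU bandwidth.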

Last week, our deep-learning performance team reached a new milestone. When I last wrote about this workload three weeks ago, we had successfully trained AlexNet, a common benchmark for deep-learning training, to 50 percent accuracy in 1 hour, 44 minutes. This week, we’re seeing results that have reduced the training time in the same test to 57 minutes: under an hour! These results were collected with the latest system firmware, along with system and software optimizations that tune overall S822LC performance. As we continue to better understand the capabilities of the S822LC for HPC, we’ll continue to share progress on how we are fine-tuning the system for today’s deep-learning demands.

Figure 1: AlexNet training accuracy as a function of elapsed time

We’re tremendously excited by this result and by what it reflects about IBM’s investment in this quickly evolving segment. It gets even better when we compare results in the same test with the previous generation of deep-learning systems, built around PCIe-attached NVIDIA Tesla M40 GPUs.


Figure 2: Training time in minutes, Tesla M40/PCIe system vs. Tesla P100/NVLink system

A single S822LC for HPC with four NVIDIA Tesla P100 GPUs reaches 50 percent accuracy in AlexNet 2.2 times faster than a server with four NVIDIA Tesla M40 GPUs![1]
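To put the quoted figures side by side, here is a back-of-the-envelope check using only the numbers stated in this post (57 minutes on the P100 system, 1 hour 44 minutes three weeks earlier, and the 2.2x ratio vs. the M40 server); nothing below is newly measured.

```python
# All figures come from the post itself; this only does the arithmetic.
p100_minutes = 57            # S822LC for HPC, 4x Tesla P100 (this week)
earlier_p100_minutes = 104   # 1 hour 44 minutes, three weeks earlier
speedup_vs_m40 = 2.2         # quoted ratio vs. the 4x Tesla M40 server

# Implied training time on the M40/PCIe system for the same test
m40_minutes = p100_minutes * speedup_vs_m40
print(f"Implied M40 training time: ~{m40_minutes:.0f} minutes")  # ~125 minutes

# Gain from firmware and software tuning on the same P100 system
tuning_gain = earlier_p100_minutes / p100_minutes
print(f"Tuning improvement: {tuning_gain:.2f}x")                 # 1.82x
```

So roughly two hours of M40-era training collapses to under one hour, with almost half of the three-week improvement coming from system and software tuning alone.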

Get started with a POWER8-Pascal Server today

If you work on GPU-accelerated deep learning and HPC applications, take this opportunity to try out these new systems and enjoy the benefits of both the Tesla P100 GPU accelerator and the high-speed NVIDIA NVLink connection to the IBM POWER8 CPU.

For organizations running popular deep-learning frameworks in production, this could mean faster training times on your current large data sets in an offering that is cost effective, flexible and available today. You can learn more about, and download, deep-learning distributions for POWER.

To learn more about this server and order one, visit HPC on Power or reach out to your IBM representative or one of IBM’s Business Partners.

IBM also invites GPU software developers to join the IBM-NVIDIA Acceleration Lab to try out these systems and see the benefits of the Tesla P100 GPU accelerator and the high-speed NVLink connection to the IBM POWER8 CPU. This application-based program offers the chance to collaborate with IBM and NVIDIA GPU programmers to harness the power of NVIDIA Tesla GPUs with POWER8 for your GPU-enabled applications.

I look forward to hearing about the performance you get from these systems. Share with us how you plan to use this server and how you think NVIDIA NVLink can change application acceleration by posting in the comments section below.

[1] Based on AlexNet training to top-1 50% accuracy. IBM Power S822LC for HPC configuration: 16 cores (8 cores/socket) at 3.25 GHz with 4x NVIDIA Tesla P100 GPUs; 512 GB memory; Ubuntu 16.04.1 running NVCaffe 0.14.5. IBM Power S822L configuration: 20 cores (10 cores/socket) at 3.694 GHz with 4x NVIDIA Tesla M40 GPUs; 512 GB memory; Ubuntu 16.04 running BVLC Caffe f28f5ae2f2453f42b5824723efc326a04dd16d85. Software stack for both configurations: G++ 5.3.1, Gfortran 5.3.1, OpenBLAS 0.2.18, Boost 1.58.0, CUDA 8.0 Toolkit, LAPACK 3.6.0, HDF5 1.8.16, OpenCV 2.4.9.
