“Chainer” (http://chainer.org) is one of the major Deep Learning frameworks. It is a flexible framework, provided as a Python library, that supports CUDA and multiple GPUs.
IBM Power machines, such as the S822LC for HPC, are a good choice for running Chainer and other Deep Learning platforms on multiple GPUs, because they are designed for cognitive computing.
This blog post provides instructions for building Chainer on (little-endian) OpenPOWER Linux distributions, such as Ubuntu 16.04, Red Hat Enterprise Linux 7.1, and subsequent releases.
Install Python and pip
Chainer requires one of the following Python versions (see https://github.com/pfnet/chainer/blob/master/README.md):
- Python 2.7.6+, 3.4.3+, 3.5.1+
Check the Python version on your system with the “python -V” command. If the version is not supported, install or update Python as follows.
On Ubuntu:
$ sudo apt-get install python
$ sudo apt-get upgrade python
On RHEL:
$ sudo yum install python
$ sudo yum update python
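The version check above can also be scripted. The following is a minimal sketch in POSIX shell, assuming the version is passed as an “X.Y.Z” string. The function name python_version_ok is hypothetical, and treating every 3.x release newer than 3.5 as supported is an assumption beyond the list above.

```shell
# Sketch: return success if a Python version string ("X.Y.Z") meets the
# minimums listed above: 2.7.6+, 3.4.3+, or 3.5.1+.
# Assumption: any 3.x release from 3.6 onward is also accepted.
python_version_ok() {
  local v major minor patch
  v=$1
  # Split "X.Y.Z" into its components with parameter expansion.
  major=${v%%.*}; v=${v#"$major"}; v=${v#.}
  minor=${v%%.*}; v=${v#"$minor"}; v=${v#.}
  patch=${v%%.*}
  : "${minor:=0}" "${patch:=0}"   # missing components count as 0
  case "$major" in
    2) [ "$minor" -eq 7 ] && [ "$patch" -ge 6 ] ;;
    3) { [ "$minor" -eq 4 ] && [ "$patch" -ge 3 ]; } ||
       { [ "$minor" -eq 5 ] && [ "$patch" -ge 1 ]; } ||
       [ "$minor" -ge 6 ] ;;
    *) return 1 ;;
  esac
}
```

For example, `python_version_ok "$(python -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])')" && echo supported` checks the interpreter that “python” resolves to.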
Because Chainer is provided as a Python library, you need the “pip” command, the package management system for Python. If pip is not installed on your system, install it on Ubuntu as follows.
$ sudo apt-get install python-pip
$ sudo pip install --upgrade pip
On RHEL, you first need to enable the EPEL (Extra Packages for Enterprise Linux) repository for yum, and then install pip as follows.
$ sudo yum install epel-release
$ sudo yum install python-pip
$ sudo pip install --upgrade pip
Install prerequisite software
Install the prerequisite software as follows.
On Ubuntu:
$ sudo apt-get install git gcc make openssl libssl-dev libbz2-dev libreadline-dev libsqlite3-dev
On RHEL:
$ sudo yum install git gcc make openssl-devel bzip2-devel readline-devel sqlite-devel
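The prerequisite step differs between Ubuntu and RHEL only in the package manager and the package names. As a sketch, assuming the distribution is identified by the ID field of /etc/os-release, the matching command can be chosen automatically. The function name prereq_install_cmd is hypothetical, and treating CentOS like RHEL is an assumption.

```shell
# Sketch: print the prerequisite install command for a distribution ID as
# found in /etc/os-release (for example ID=ubuntu or ID="rhel").
# The package lists are copied from the commands above.
prereq_install_cmd() {
  case "$1" in
    ubuntu)
      echo "apt-get install git gcc make openssl libssl-dev libbz2-dev libreadline-dev libsqlite3-dev" ;;
    rhel|centos)   # assumption: CentOS uses the same package names as RHEL
      echo "yum install git gcc make openssl-devel bzip2-devel readline-devel sqlite-devel" ;;
    *)
      echo "unsupported distribution: $1" >&2
      return 1 ;;
  esac
}
```

For example, `cmd=$(. /etc/os-release && prereq_install_cmd "$ID") && sudo $cmd` runs the matching command; $cmd is deliberately left unquoted so that it splits into separate words.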
On Ubuntu, install the HDF5 library as follows.
$ sudo apt-get install libhdf5-serial-dev libhdf5-mpich-dev libhdf5-openmpi-dev
On RHEL, please download and build the HDF5 source code, as follows.
$ wget http://www.hdfgroup.org/ftp/HDF5/current/src/hdf5-1.8.17.tar.gz
$ tar xvfz hdf5-1.8.17.tar.gz
$ cd hdf5-1.8.17
$ ./configure [--prefix=<Install directory>] --enable-fortran --enable-cxx \
--build=powerpc64le-linux-gnu # specify the --prefix option if necessary
$ make install
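The RHEL build steps can be wrapped in a single function, sketched below with the version and URL from this post. The function name build_hdf5 is hypothetical. It assumes that Fortran and C++ compilers (for example gfortran and g++) are installed, since --enable-fortran and --enable-cxx need them and the RHEL prerequisite list above installs only gcc.

```shell
# Sketch of the HDF5 source build above. The optional first argument is
# the install directory passed to configure's --prefix.
# The body is a subshell, so the cd does not change the caller's directory.
build_hdf5() (
  version=1.8.17
  src="hdf5-${version}"
  wget "http://www.hdfgroup.org/ftp/HDF5/current/src/${src}.tar.gz" &&
    tar xvfz "${src}.tar.gz" &&
    cd "$src" || exit 1
  if [ -n "${1:-}" ]; then
    ./configure --prefix="$1" --enable-fortran --enable-cxx \
      --build=powerpc64le-linux-gnu || exit 1
  else
    ./configure --enable-fortran --enable-cxx \
      --build=powerpc64le-linux-gnu || exit 1
  fi
  # Build in parallel on all available cores, then install.
  make -j"$(nproc)" && make install
)
```

For example, `build_hdf5 "$HOME/hdf5"` builds HDF5 and installs it under $HOME/hdf5.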
Install CUDA 8
If CUDA 8 is already installed on your OpenPOWER system, you can skip to the next step, installing cuDNN 5.1. To install CUDA 8, download the CUDA distribution from https://developer.nvidia.com/cuda-downloads and follow the installation instructions.
For example, on Ubuntu 16.04, this is performed as follows:
Download and install NVIDIA CUDA 8 from https://developer.nvidia.com/cuda-downloads
• Select Operating System: Linux
• Select Architecture: ppc64le
• Select Distribution: Ubuntu
• Select Version: 16.04
• Select the Installer Type that best fits your needs
• Follow the Linux installation instructions in the CUDA Quick Start Guide linked from the download page, including the steps describing how to set up the CUDA development environment by updating PATH and LD_LIBRARY_PATH depending on your environment.
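The PATH and LD_LIBRARY_PATH update described in the last step can be sketched as the following shell function. It follows the pattern used in NVIDIA's installation guide, but the default location /usr/local/cuda-8.0 and the function name cuda_env_setup are assumptions; pass your own CUDA directory if it differs.

```shell
# Sketch: prepend a CUDA installation's bin and lib64 directories to PATH
# and LD_LIBRARY_PATH. /usr/local/cuda-8.0 is an assumed default location.
cuda_env_setup() {
  cuda_home=${1:-/usr/local/cuda-8.0}
  # ${VAR:+:$VAR} avoids a trailing ":" when the variable is empty.
  PATH="$cuda_home/bin${PATH:+:$PATH}"
  LD_LIBRARY_PATH="$cuda_home/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  export PATH LD_LIBRARY_PATH
}
```

After defining the function, for example in ~/.bashrc, call `cuda_env_setup`; `nvcc --version` should then report release 8.0.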
Install cuDNN v5.1
If cuDNN v5.1 is already installed on your OpenPOWER system, you can skip to the next step, installing Chainer. Download NVIDIA cuDNN 5.1 for CUDA 8 on POWER8 from https://developer.nvidia.com/cudnn and follow NVIDIA’s installation instructions. Registration in NVIDIA’s Accelerated Computing Developer Program is required to download cuDNN.
Then add the directory containing cudnn.h to CFLAGS (as an -I option), and add the directory containing libcudnn.so to LDFLAGS (as an -L option) and to LD_LIBRARY_PATH.
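The following sketch performs this step for a given pair of directories, after checking that cudnn.h and libcudnn.so are actually there. The function name cudnn_env_setup is hypothetical, and the directories are passed as arguments because their location depends on where you extracted or installed cuDNN.

```shell
# Sketch: export the cuDNN paths for the Chainer build and at run time.
# $1 = directory containing cudnn.h, $2 = directory containing libcudnn.so
cudnn_env_setup() {
  inc_dir=$1
  lib_dir=$2
  if [ ! -f "$inc_dir/cudnn.h" ] || [ ! -e "$lib_dir/libcudnn.so" ]; then
    echo "cudnn.h or libcudnn.so not found" >&2
    return 1
  fi
  CFLAGS="-I$inc_dir${CFLAGS:+ $CFLAGS}"      # compiler include path
  LDFLAGS="-L$lib_dir${LDFLAGS:+ $LDFLAGS}"   # linker search path
  LD_LIBRARY_PATH="$lib_dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"  # run-time loader path
  export CFLAGS LDFLAGS LD_LIBRARY_PATH
}
```

The -I and -L prefixes are needed because CFLAGS and LDFLAGS hold compiler and linker options rather than bare directory names.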
Install Chainer
The Chainer installer inspects your environment, such as the CUDA path, during installation, so finish the previous steps before installing Chainer.
You can install Chainer with the following command.
$ sudo pip install chainer --no-cache-dir
If it fails, please check https://github.com/pfnet/chainer#installation.
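By default, sudo resets most environment variables, and on many distributions it also replaces PATH with a fixed secure_path, so the CFLAGS, LDFLAGS, PATH, and LD_LIBRARY_PATH settings from the previous steps may not reach the pip command run by “sudo pip install”. One workaround, sketched below, passes them explicitly through env. It assumes your sudo policy allows you to run env as root; the function name install_chainer and the DRY_RUN switch are hypothetical.

```shell
# Sketch: run the pip install from this post under sudo while forwarding
# the build-related environment variables via env(1).
# Set DRY_RUN to a non-empty value to print the command instead of running it.
install_chainer() {
  set -- sudo env \
    "PATH=$PATH" \
    "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-}" \
    "CFLAGS=${CFLAGS:-}" \
    "LDFLAGS=${LDFLAGS:-}" \
    pip install chainer --no-cache-dir
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Running `DRY_RUN=1 install_chainer` first lets you confirm that the CUDA and cuDNN paths are present before starting the actual installation.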
Run a Chainer example
Run an example program to check that Chainer is correctly installed with CUDA and cuDNN.
Download the source code as follows.
$ mkdir -p ~/src
$ cd ~/src
$ git clone https://github.com/pfnet/chainer.git
Run the MNIST example.
$ cd chainer/examples/mnist
$ python train_mnist.py --gpu 0
# unit: 1000
# Minibatch-size: 100
# epoch: 20
epoch main/loss validation/main/loss main/accuracy validation/main/accuracy
1 0.194729 0.0925221 0.940083 0.972
20 0.00914637 0.101582 0.997449 0.984
If you do not see warning or error messages, Chainer is correctly installed with CUDA and cuDNN. Otherwise, please check https://github.com/pfnet/chainer#installation.
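As an additional check, you can ask Chainer directly whether it detected CUDA and cuDNN. This sketch assumes that the chainer.cuda module exposes the available and cudnn_enabled flags, as in Chainer releases of this period; the function name check_chainer_gpu is hypothetical.

```shell
# Sketch: print the Chainer version and whether CUDA and cuDNN are usable.
# Returns 0 only if both flags are true (assumes chainer.cuda.available and
# chainer.cuda.cudnn_enabled exist in the installed Chainer version).
check_chainer_gpu() {
  python - <<'EOF'
import sys
import chainer
from chainer import cuda
print("Chainer %s" % chainer.__version__)
print("CUDA available: %s" % cuda.available)
print("cuDNN enabled: %s" % cuda.cudnn_enabled)
sys.exit(0 if cuda.available and cuda.cudnn_enabled else 1)
EOF
}
```

The string formatting keeps the script compatible with both Python 2.7 and Python 3, matching the versions Chainer supports.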
See what you can do with Chainer on OpenPOWER
A variety of GPU numerical accelerator configurations can be used to accelerate Chainer on OpenPOWER systems, including the recently announced IBM Power Systems S822LC for High Performance Computing server. You can learn more about and order these systems by contacting your IBM Business Partner.
IBM invites GPU software developers to join the IBM-NVIDIA Acceleration Lab to be among the first to try these systems and see the benefits of the Tesla P100 GPU accelerator and the high-speed NVLink connection to the IBM POWER8 CPU.
I look forward to hearing about the performance you get from these systems. Share how you want to use Chainer on OpenPOWER and how Deep Learning on OpenPOWER will enable you to build the next generation of cognitive applications by posting in the comments section below.
Yasushi Negishi is a Research Staff Member at IBM Research - Tokyo. He belongs to the Deep Computing & Analytics group in Systems & Software. He joined IBM Research - Tokyo in 1989 after obtaining his M.S. degree in information science from the Tokyo Institute of Technology. He has more than 25 years of research experience. In 1989-1990, he researched system software, such as two-level threading systems, on the world’s second earliest symmetric multiprocessing machine called TOP-1. In 1990-1996, he improved the performance of the NFS server by about 15% by avoiding target data copy with functions of Ethernet adapters. In 1995-1996, he developed a communication protocol and system for video-on-demand software based on UDP/IP that achieved several times better communication stability than TCP/IP. In 1995-1999, he developed a communication protocol and system for PDAs (personal digital assistants) based on a synchronization mechanism. His protocol and system were used for a product providing the Lotus Notes database on PDAs. In 2000-2004, he worked on the design and development of a high performance processor for gaming, named CELL, in collaboration with Sony and Toshiba technicians. From 2008, he optimized HPC applications, such as FFT and CFD, on Blue Gene and other POWER processor machines. He also developed programming tools for optimization. Much of his work has been presented at major refereed conferences and in journals including INFOCOM, SC, Euro-Par, IPDPS, and TPDS. He is an ACM Senior Member, and a member of IPSJ and IEEE.