
Deep Learning on OpenPOWER: Building TensorFlow on OpenPOWER Linux Systems

Technical Blog Post




The Machine Learning and Deep Learning project in IBM Systems is a broad effort to build a co-optimized stack of hardware and software to make IBM Power Systems the best platform to develop and deploy cognitive applications.  As part of this project, IBM has developed new processors, systems, and a co-optimized software stack uniquely optimized for AI applications. 

The first offerings for this new era of cognitive computing are the S822LC, our first server designed from the ground up for cognitive computing, and the PowerAI distribution of AI tools and libraries for the Ubuntu and Red Hat Linux operating systems. Most data scientists and AI practitioners building cognitive solutions prefer to use the pre-built, pre-optimized deep learning frameworks of the PowerAI distribution.

In addition to creating the binary distribution of DL frameworks, we have also been working with the open source community to enable the open source frameworks to be built directly from their repositories, so that Deep Learning users can harness the power of the OpenPOWER ecosystem. With the introduction of little-endian OpenPOWER Linux, installation of open source applications on Power has never been easier.

If you need to build TensorFlow from source, this blog provides instructions on building TensorFlow on (little-endian) OpenPOWER Linux distributions, such as Red Hat Enterprise Linux 7.1, SUSE Linux Enterprise Server 12, Ubuntu 14.04, and subsequent releases.  While this blog describes building specific versions of Bazel and TensorFlow, the instructions may be adapted to build other versions of both tools.

To build TensorFlow, we start by building Bazel, the build tool used to build TensorFlow binaries.

 

Building Bazel

To build Bazel 0.2.0, you will need at a minimum the following Linux packages: 

  • g++
  • zip
  • zlib1g-dev
  • unzip
  • openjdk-8-jdk
  • protobuf

 

Most of these Linux packages can be installed with the system package manager directly from your Linux distribution. Bazel requires OpenJDK and is not interoperable with other Java environments today.
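
For example, on Ubuntu 14.04 the non-Java prerequisites might be installed as follows. This is only a sketch: the package names shown (protobuf-compiler in particular) are the Ubuntu names and will differ on RHEL and SLES.

$ sudo apt-get update
$ sudo apt-get install g++ zip unzip zlib1g-dev protobuf-compiler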

 

If you are using Ubuntu 14.04, OpenJDK 1.8 is not available in the default Ubuntu 14.04 repository and can be added with the following commands (administrator access is required):

$ sudo add-apt-repository -y ppa:openjdk-r/ppa
$ sudo apt-get update
$ sudo apt-get install openjdk-8-jdk
$ sudo update-alternatives --set java /usr/lib/jvm/java-8-openjdk-ppc64el/jre/bin/java
$ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-ppc64el

 

Make sure JDK 1.8 is configured properly and update the environment variables JAVA_HOME, JRE_HOME, CLASSPATH, and PATH. You can do this by setting them in your .bashrc:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-ppc64el
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
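
To confirm that the OpenJDK 8 environment is active, reload the shell configuration and check the reported Java version (an optional sanity check):

$ source ~/.bashrc
$ echo $JAVA_HOME
$ java -version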

 

Bazel depends on recently introduced features of protobuf, so a newer version of protobuf is installed together with Bazel. A copy of the new protobuf binary is included with the Bazel distribution. (The appendix describes how to build this binary from source, if you prefer that route.)

Start by downloading Bazel. This tutorial uses Bazel 0.2 because it is sufficient for building TensorFlow and has fewer dependencies, so it is quicker to install. Check out Bazel from the repository. (We currently use the IBM repository for Bazel on Power as we work with Google to integrate code enhancements to support Power into Google’s repository.)

$ git clone https://github.com/ibmsoe/bazel
$ cd bazel
$ git checkout v0.2.0-ppc

 

Build bazel with the following command:

$ ./compile.sh
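
When compile.sh finishes, the freshly built binary is left in the output subdirectory of the Bazel source tree. One way to make it available for the TensorFlow build is to copy it somewhere on your PATH (the destination below is just an example) and verify that it runs:

$ sudo cp output/bazel /usr/local/bin/bazel
$ bazel version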

Building TensorFlow with CUDA 7.5

If CUDA 7.5 is already installed on your OpenPOWER system, you can skip to the next step, installing cuDNN 5.1. To install CUDA 7.5, download the CUDA distribution from https://developer.nvidia.com/cuda-downloads and follow the installation instructions.

For example, on Ubuntu 14.04, this is performed as follows:
Download and install NVIDIA CUDA 7.5 from https://developer.nvidia.com/cuda-downloads
• Select Operating System: Linux
• Select Architecture: ppc64le
• Select Distribution: Ubuntu
• Select Version: 14.04
• Select the Installer Type that best fits your needs
• Follow the Linux installation instructions in the CUDA Quick Start Guide linked from the download page, including the steps describing how to set up the CUDA development environment by updating PATH and LD_LIBRARY_PATH (see the sketch after this list).
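
A minimal sketch of that environment setup, assuming the default installation prefix /usr/local/cuda-7.5 (the Quick Start Guide remains the authoritative reference):

$ export PATH=/usr/local/cuda-7.5/bin:$PATH
$ export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH
$ nvcc --version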

 

If cuDNN v5.1 is already installed on your OpenPOWER system, you can skip to the next step, building TensorFlow. Otherwise, download NVIDIA cuDNN 5.1 for CUDA 7.5 Power8 from https://developer.nvidia.com/cudnn and follow NVIDIA’s installation instructions. Registration in NVIDIA’s Accelerated Computing Developer Program is required to download cuDNN.

On Ubuntu 14.04, download the following:

  • cuDNN v5.1 Runtime Library for Power8 (Deb)
  • cuDNN v5.1 Developer Library for Power8 (Deb)
  • cuDNN v5.1 Code Samples and User Guide for Power8 (Deb)

and install as follows:

$ sudo dpkg -i libcudnn5*deb
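
To verify that the cuDNN packages registered correctly, you can list them with dpkg (an optional check):

$ dpkg -l | grep libcudnn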

 

Building TensorFlow with CUDA 8

 

If CUDA 8 is already installed on your OpenPOWER system, you can skip to the next step, installing cuDNN 5.1. To install CUDA 8, download the CUDA distribution from https://developer.nvidia.com/cuda-downloads and follow the installation instructions.

 

For example, on Ubuntu 16.04, this is performed as follows:
Download and install NVIDIA CUDA 8 from https://developer.nvidia.com/cuda-downloads
• Select Operating System: Linux
• Select Architecture: ppc64le
• Select Distribution: Ubuntu
• Select Version: 16.04
• Select the Installer Type that best fits your needs
• Follow the Linux installation instructions in the CUDA Quick Start Guide linked from the download page, including the steps describing how to set up the CUDA development environment by updating PATH and LD_LIBRARY_PATH (see the sketch after this list).
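
As with CUDA 7.5 above, a minimal sketch of the environment setup, assuming the default installation prefix /usr/local/cuda-8.0:

$ export PATH=/usr/local/cuda-8.0/bin:$PATH
$ export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH
$ nvcc --version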

 

If cuDNN v5.1 is already installed on your OpenPOWER system, you can skip to the next step, building TensorFlow. Otherwise, download NVIDIA cuDNN 5.1 for CUDA 8 for Ubuntu 14.04 Power8 or Ubuntu 16.04 Power8 from https://developer.nvidia.com/cudnn and follow NVIDIA’s installation instructions. Registration in NVIDIA’s Accelerated Computing Developer Program is required to download cuDNN.

On Ubuntu 14.04 or 16.04, download the following:

  • cuDNN v5.1 Runtime Library for Ubuntu 14.04/16.04 Power8 (Deb)
  • cuDNN v5.1 Developer Library for Ubuntu 14.04/16.04 Power8 (Deb)
  • cuDNN v5.1 Code Samples and User Guide for Ubuntu 14.04/16.04 Power8 (Deb)

and install as follows:

$ sudo dpkg -i libcudnn5*deb

 

Building and Installing TensorFlow

IMPORTANT! If you are building TensorFlow on Ubuntu 16.04, you need to install a fix for a Linux system bug that may cause TensorFlow to fail. For information about this fix and installation instructions, please refer to the following Ubuntu defect: https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1642390

 

To start the TensorFlow build process, check out the TensorFlow source code from the repository:

$ git clone --recurse-submodules https://github.com/tensorflow/tensorflow
$ cd tensorflow
$ git checkout v0.10.0rc0

 

The following code updates are used to build on OpenPOWER. They will be included in future releases, but must be cherry-picked (explicitly included) for v0.10:
$ git cherry-pick ce70f6cf842a46296119337247c24d307e279fa0  # Needed for compilation on PPC
$ git cherry-pick f1acb3bd828a73b15670fc8019f06a5cd51bd564  # Performance fix
$ git cherry-pick 9b6215a691a2eebaadb8253bd0cf706f2309a0b8  # Improves performance by detecting the number of cores

 

Run the autoconfiguration script. CUDA 7.5 is not compatible with GNU Compiler Collection (GCC) versions 5.0 or higher, so be sure to have a suitable version of GCC in your PATH:
$ ./configure
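
The configure script is interactive: it asks where Python and the CUDA toolkit live, whether to enable GPU support, and which compute capabilities to target. The transcript below is an illustrative sketch rather than an exact reproduction; the prompts vary slightly between TensorFlow versions, and the answers shown (CUDA 8.0, cuDNN 5, compute capability 6.0 for a Tesla P100) should be adjusted to your installation:

Please specify the location of python. [Default is /usr/bin/python]:
Do you wish to build TensorFlow with GPU support? [y/N] y
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 8.0
Please specify the Cudnn version you want to use. [Leave empty to use system default]: 5
Please specify a list of comma-separated Cuda compute capabilities you want to build with: 6.0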

 

Now you’re ready to start the build process. The sample command instructs Bazel to build TensorFlow with GPU support using the --config=cuda option:
$ bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package 

 

You may encounter an error building one of the dependencies named "farmhash" in a path which looks similar to the following (the hexadecimal directory names will vary for each full build): ~/.cache/bazel/_bazel_mkg/140d45dd76a7fbb1e5e221b6ccd2ae98/tensorflow/external/farmhash_archive/farmhash-34c13ddfab0e35422f4c3979f360635a8c050260

To resolve this, edit the farmhash config.guess configuration file ~/.cache/bazel/_bazel_mkg/*/tensorflow/external/farmhash_archive/farmhash-*/config.guess
to include a case for ppc64le as follows (around line 1508):  
 
    ppc64le:Linux:*:*)
      echo powerpc64le-unknown-linux-gnu
      exit ;;

 

Then, restart the build process by re-running the bazel build command above. Once the build completes, create the pip wheel package:
$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

 

Use the following command to install TensorFlow:
$ sudo pip install /tmp/tensorflow_pkg/tensorflow*.whl
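
A quick way to confirm the installation succeeded is to import TensorFlow and print its version (an optional check):

$ python -c "import tensorflow as tf; print(tf.__version__)"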

 

Appendix:  Building Protobuf

To build and install Protobuf v3.0.0-beta-2, use the following commands:

$ git clone https://github.com/google/protobuf
$ cd protobuf
$ git checkout v3.0.0-beta-2
$ git cherry-pick 1760feb621a913189b90fe8595fffb74bce84598
$ ./autogen.sh
$ LDFLAGS="-static" ./configure --prefix=/opt/DL/protobuf && \
 sed -i "s/^LDFLAGS = -static/LDFLAGS = -all-static/g" src/Makefile && \
 make && \
 make install
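
The commands above install a static protobuf build under /opt/DL/protobuf. To use this copy, add its bin directory to your PATH and confirm the compiler version (adjust the path if you chose a different --prefix):

$ export PATH=/opt/DL/protobuf/bin:$PATH
$ protoc --version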

 

See what you can do with TensorFlow on OpenPOWER

A variety of GPU numerical accelerator configurations can be used to accelerate TensorFlow on OpenPOWER systems, including the recently announced new IBM Power Systems S822LC for High Performance Computing server.  You can learn more about and order these systems by contacting your IBM Business Partner.  

 

IBM invites GPU software developers to join the IBM-NVIDIA Acceleration Lab to be among the first to try these systems and see the benefits of the Tesla P100 GPU accelerator and the high-speed NVLink connection to the IBM POWER8 CPU.  

 

I look forward to hearing about the performance you get from these systems. Share how you want to use TensorFlow on OpenPOWER and how Deep Learning on OpenPOWER will enable you to build the next generation of cognitive applications by posting in the comments section below.

 



Dr. Michael Gschwind is Chief Engineer for Machine Learning and Deep Learning for IBM Systems where he leads the development of hardware/software integrated products for cognitive computing. During his career, Dr. Gschwind has been a technical leader for IBM’s key transformational initiatives, leading the development of the OpenPOWER Hardware Architecture as well as the software interfaces of the OpenPOWER Software Ecosystem. In previous assignments, he was a chief architect for Blue Gene, POWER8, POWER7, and Cell BE. Dr. Gschwind is a Fellow of the IEEE, an IBM Master Inventor and a Member of the IBM Academy of Technology. 

[{"Business Unit":{"code":"BU054","label":"Systems w\/TPS"},"Product":{"code":"HW1W1","label":"Power ->PowerLinux"},"Component":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":"","Line of Business":{"code":"","label":""}}]
