If you’ve ever had the pleasure of training a neural network, you’ll know it’s not much fun sitting there watching things “training”. While waiting for your model to converge is an ideal time to go and get yourself [another] coffee, wouldn’t it be better if there was a faster way of doing things?
Choosing the right infrastructure for your ML/DL workloads is really important and can make a huge difference to your time to deployment. Ultimately, we’re talking about delivering a competitive advantage sooner, which is an important goal for all organisations.
It’s not just a case of picking the right place to run your workload though. What about the software stack? Everyone has an opinion on which framework to use, be it TensorFlow or Caffe, but installing them and, more importantly, all their dependencies can be a nightmare.
IBM’s PowerAI offering provides a single install for optimised, tuned and enhanced versions of the most popular Machine Learning frameworks - and in this edition of the blog I’m going to run through how to install and get started with PowerAI to reduce your training times today!
You will need:
1 x IBM S822LC for HPC (Minsky) with Ubuntu 16.04 installed
1 x Internet connection
1 x 1:30 mins to go through the installation
Downloading the required files
In order to install PowerAI you’ll need copies of all the following files:
- 7fa2af80.pub - The GPG key you get after accepting the Nvidia licence agreement. It is required to install the Nvidia CUDA libraries.
- cuda-repo-ubuntu1604-8-0-local-ga2v2_8.0.61-1_ppc64el.deb - The local install file for Nvidia CUDA, available directly from Nvidia. It provides the collection of libraries and functions for GPU development that the PowerAI frameworks need.
- libcudnn6* - Nvidia’s cuDNN is a library of GPU accelerated primitives specifically for Deep Neural Network development.
- nvidia-driver-local-repo-ubuntu1604-384.81_1.0-1_ppc64el.deb - The latest Nvidia GPU driver.
- mldl-repo-local_4.0.0_ppc64el.deb - The PowerAI install file, which contains prebuilt versions of all your favourite AI frameworks and their dependencies.
Note - While Nvidia provide a network install of the CUDA repo, I’d recommend using the local version. The network installer automatically updates the CUDA version, which can cause a mismatch with the driver and stop things working.
Installing the Nvidia components
Before we install the PowerAI bundle itself - let’s make sure our system is able to make the most of the available P100 GPUs by installing the required Nvidia software/drivers.
Add the GPG key for the CUDA repository with the following command:
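A minimal sketch of this step, assuming the key file sits in your current directory and your user has sudo rights:

```shell
# Register Nvidia's public key with apt so packages from the CUDA repo are trusted.
sudo apt-key add 7fa2af80.pub
```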
Next use the Debian package manager (dpkg) to install CUDA:
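Something along these lines should do it, assuming the .deb listed earlier has been downloaded to the current directory:

```shell
# Unpack the local CUDA repository onto the system.
# This registers the repo with apt; it does not install the toolkit by itself.
sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2v2_8.0.61-1_ppc64el.deb
```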
Run an apt-get update to pick up the changes:
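As a sketch: the update refreshes apt's package index so it sees the new local repo. The second command is an assumption on my part (it follows Nvidia's usual local-repo flow, where the cuda metapackage pulls in the toolkit itself), so check it against the instructions that came with your download.

```shell
# Refresh the package index so apt can see the local CUDA repo.
sudo apt-get update

# Assumption: the toolkit is then installed from that repo via the cuda metapackage.
sudo apt-get install cuda
```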
To complete the CUDA installation export the following paths to ensure the CUDA binaries are available to your system:
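The exports typically look like the following. The /usr/local/cuda-8.0 prefix is an assumption (it is the usual default location for CUDA 8.0), so adjust it if your install landed somewhere else.

```shell
# Assumes CUDA 8.0 was installed to its default prefix, /usr/local/cuda-8.0.
# Put the CUDA binaries (nvcc and friends) on the PATH.
export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}

# Let the dynamic linker find the CUDA shared libraries.
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```

The `${VAR:+:${VAR}}` form only appends a colon and the old value when the variable was already set, which avoids leaving a stray empty entry in the path.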
Optionally - add the above lines to /etc/bash.bashrc to globally set these variables on boot.
Next install the cuDNN library using dpkg:
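For example, assuming all of the libcudnn6 .deb files you downloaded are in the current directory:

```shell
# Install every cuDNN 6 package matched by the glob in one go.
sudo dpkg -i libcudnn6*
```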
When you install the CUDA toolchain you’ll get a version of the Nvidia driver too, but it’s not the latest. It’s a good idea to make sure you’re on the most up-to-date version of the Nvidia drivers for performance/support reasons.
First use dpkg to unpack the driver install file:
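A sketch of that step, again assuming the file is in the current directory:

```shell
# Unpack the local driver repository so apt can install from it.
sudo dpkg -i nvidia-driver-local-repo-ubuntu1604-384.81_1.0-1_ppc64el.deb
```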
Run another apt-get update to pick up the driver.
Finally use apt-get install to install the driver itself.
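The package name below is an assumption: cuda-drivers is the metapackage Nvidia's repos normally use for the driver, but confirm it against the contents of the repo you unpacked.

```shell
# Assumption: the driver is published under the cuda-drivers metapackage.
sudo apt-get install cuda-drivers
```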
That’s it - we’ve installed all the Nvidia components needed to run PowerAI. We just need to reboot the system to allow the changes to take effect.
Once the system comes back up we should be able to run nvidia-smi to see the status of the GPUs:
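Run with no arguments, the tool prints a summary table of each GPU it can see, along with the driver version:

```shell
# List the GPUs visible to the driver, with utilisation and memory usage.
nvidia-smi
```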
Now we’ve got the system up and running with all of the Nvidia tools required to make the most of our GPUs, we can install the PowerAI bundle.
To begin - use dpkg to extract the PowerAI installer:
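As before, this assumes the PowerAI .deb from the download list is in the current directory:

```shell
# Unpack the local PowerAI repository so apt can install from it.
sudo dpkg -i mldl-repo-local_4.0.0_ppc64el.deb
```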
Then run another apt-get update to detect the change.
Finally apt-get install power-mldl will install all of the frameworks bundled in PowerAI:
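Written out in full, with sudo added on the assumption that you are not logged in as root:

```shell
# Install the PowerAI metapackage, which pulls in all of the bundled frameworks.
sudo apt-get install power-mldl
```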
It’s that easy! That’s everything you need to do to install PowerAI and the Nvidia tools on a Minsky system. In order to check what’s been installed, list the contents of /opt/DL:
A full list of what’s included is available here.
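A quick way to do that from the shell:

```shell
# Each installed framework gets its own directory under /opt/DL.
ls /opt/DL
```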
If you’re really looking to get the most from the system and PowerAI there are a few performance tweaks you can do.
Firstly set the optimum SMT level for TensorFlow:
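A hedged sketch: the value of 2 is an assumption based on my recollection of IBM's tuning guidance for TensorFlow on this system, and ppc64_cpu comes from the powerpc-utils package. Check the PowerAI documentation for the value recommended for your release.

```shell
# Assumption: SMT level 2 is the recommended setting for TensorFlow.
sudo ppc64_cpu --smt=2

# Running it with a bare --smt prints the current setting, so you can confirm the change.
ppc64_cpu --smt
```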
Ensure Nvidia driver persistence mode is enabled. This keeps the driver loaded even when no process is using the GPUs, which removes the latency of waiting for driver initialisation each time a job starts:
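With nvidia-smi this is a single flag. Note that the setting does not survive a reboot, so it needs reapplying after each restart.

```shell
# Enable persistence mode on all GPUs (1 = enabled, 0 = disabled).
sudo nvidia-smi -pm 1
```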
Also ensure the GPU clocks are set to the optimum value:
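The clock values below are an assumption (715 MHz memory and 1480 MHz graphics are the figures I believe are commonly quoted for the P100), so list what your cards actually support before applying them.

```shell
# Show the memory/graphics clock pairs the installed GPUs support.
nvidia-smi -q -d SUPPORTED_CLOCKS

# Assumption: 715,1480 (memory,graphics in MHz) is the optimum pair for the P100.
sudo nvidia-smi -ac 715,1480
```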
As with other Machine Learning environments, before we can use a framework we must first source it so that our runtime knows the library is available. This can be done on a case-by-case basis through the shell, persisted globally in /etc/bash.bashrc, or on a per-user basis in ~/.bashrc.
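Taking TensorFlow as the example, a sketch might look like this. The activation script path is an assumption based on the /opt/DL layout, so check the framework's bin directory for the exact name on your install.

```shell
# Assumption: each framework ships an activation script at
# /opt/DL/<framework>/bin/<framework>-activate.
source /opt/DL/tensorflow/bin/tensorflow-activate

# Optional: persist it for the current user so every new shell picks it up.
echo "source /opt/DL/tensorflow/bin/tensorflow-activate" >> ~/.bashrc
```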
All of the frameworks in PowerAI come with a simple test you can execute to ensure that everything is in working order. In order to run the test, first ensure you’ve sourced the framework (as above) then execute:
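Sticking with TensorFlow, and assuming the test script follows a <framework>-test naming pattern and is put on the PATH by the activation step:

```shell
# Assumption: the bundled self-test is called tensorflow-test.
tensorflow-test
```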