PowerAI system setup
Find information to set up your operating system, repository, and NVIDIA components.
- Latest Linux kernel for RHEL 7.5 ALT
- Recent AC922 system firmware:
- 8335-GTG: OP910.24
- 8335-GTH: OP920.02
- NVIDIA GPU driver 396.44 or higher
Operating system
The Deep Learning packages require Red Hat Enterprise Linux (RHEL) 7.5 little endian for IBM® POWER8® and IBM POWER9™. The RHEL installation image and license must be acquired from Red Hat: https://www.redhat.com/en/technologies/linux-platforms/enterprise-linux
For more information about installing operating systems on IBM Power Systems servers, see Quick start guides for Linux on IBM® Power System servers.
Operating system and repository setup
- Enable
optional
andextra
repo channels.IBM POWER8:
sudo subscription-manager repos --enable=rhel-7-for-power-le-optional-rpms
sudo subscription-manager repos --enable=rhel-7-for-power-le-extras-rpms
IBM POWER9:
sudo subscription-manager repos --enable=rhel-7-for-power-9-optional-rpms
sudo subscription-manager repos --enable=rhel-7-for-power-9-extras-rpms
- Install packages needed for the
installation.
sudo yum -y install wget nano bzip2
- Enable EPEL
repo.
wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
sudo rpm -ihv epel-release-latest-7.noarch.rpm
- Load the latest
kernel,
sudo yum update kernel kernel-tools kernel-tools-libs kernel-bootwrapper
reboot
Or do a full update.
sudo yum update
sudo reboot
sudo subscription-manager release --set=7.5
Customers should consult Red Hat
if they’re unsure how to avoid unintended upgrade.System firmware
If you are running on an AC922 system, you need to update the firmware. Ensure that the system firmware is updated to at least the following levels before you install the current NVIDIA GPU driver.
The firmware series / fix levels that are required for AC922 for the current NVIDIA GPU driver are:
- 8335-GTG: OP910.24
- 8335-GTH: OP920.02
System firmware updates are available at https://www.ibm.com/support/fixcentral/. To find your updates, follow these steps:
- Enter "8335-GTG" or "8335-GTH" as the Product Selector.
- Select the appropriate firmware series from the drop-down list.
- Click Continue to go to the Select fixes page.
- Select the appropriate fix level.
- Click Continue to go to the Download options page.
NVIDIA Components: IBM POWER9 specific udev rules
Before you install the NVIDIA components, the udev Memory Auto-Onlining Rule must be disabled for the CUDA driver to function properly. To disable it, follow these steps:
- Copy the
/lib/udev/rules.d/40-redhat.rules
file to the directory for user overridden rules.sudo cp /lib/udev/rules.d/40-redhat.rules /etc/udev/rules.d/
- Edit the
/etc/udev/rules.d/40-redhat.rules
file.sudo nano /etc/udev/rules.d/40-redhat.rules
- Comment out the following line and save the change:
SUBSYSTEM=="memory", ACTION=="add", PROGRAM="/bin/uname -p", RESULT!="s390*", ATTR{state}=="offline", ATTR{state}="online"
- Optionally, delete the first line of the file, since the file was copied to a directory where it
cannot be overwritten.
# do not edit this file, it will be overwritten on update
- Restart the system for the changes to take effect.
sudo reboot
CUDA, GPU driver, cuDNN, and NCCL
The Deep Learning packages require CUDA, cuDNN, and GPU driver packages from NVIDIA. See the table above for the required and recommended versions of these components.
Install the components by following these steps:
- Download and install NVIDIA CUDA 9.2.148 from https://developer.nvidia.com/cuda-92-download-archive
- Select Operating System:Linux.
- Select Architecture:ppc64le.
- Select Distribution:RHEL.
- Select Version:7.
- Select Installer Type: rpm (local).
- Follow the Linux on POWER installation instructions in the CUDA Quick Start Guide, including the steps that describe how to set up the CUDA
development environment by updating
PATH
andLD_LIBRARY_PATH
.
Note: The local rpm is preferred over the network rpm as it ensures the version that is installed is the version that is downloaded. With the network rpm,yum install cuda
always installs the latest version of the CUDA Toolkit. - Download Patch 1 for cuBLAS and CUPTI. The patch is available at the same location as the base CUDA package. On CUDA download page, scroll down to find the download link for ".Patch 1 (Released Aug 6, 2018)"
- Download NVIDIA driver 396.44 from http://www.nvidia.com/Download/index.aspx
- Select Product Type:Tesla
- Select Product Series:P-Series
- Select Product:Tesla P100
- Select Operating System:Linux POWER LE RHEL 7
- Select CUDA Toolkit:9.2
- Click Search to go do the download link.
- Install CUDA and the GPU driver.Note: For AC922 systems: OS and system firmware updates are required before you install the latest GPU driver.At a high level, the installation process is:
- Install the CUDA Base repository rpm.
- Install the CUDA Patch 1 repository rpm.
- Install the GPU driver repository rpm.
- Run
$ sudo yum install cuda
to install CUDA, patch, and GPU driver. - Restart to activate the driver.
PATH
andLD_LIBRARY_PATH
. - Download NVIDIA cuDNN v7.2.1 for CUDA 9.2 from https://developer.nvidia.com/cudnn (Registration in NVIDIA’s Accelerated Computing Developer
Program is required).
- cuDNN v7.2.1 Library for Linux (Power8/Power9)
- Download NVIDIA NCCL v2.2.13 for CUDA 9.2 from https://developer.nvidia.com/nccl (Registration in NVIDIA’s Accelerated Computing Developer
Program is required).
- NCCL 2.2.13 O/S agnostic and CUDA 9.2 and IBM Power®
- Install the cuDNN v7.2.1 and NCCL v2.2.13 packages. Refresh shared library
cache.
sudo tar -C /usr/local --no-same-owner -xzvf cudnn-9.2-linux-ppc64le-v7.2.1.38.tgz
sudo tar -C /usr/local --no-same-owner -xzvf nccl_2.2.13-1+cuda9.2_ppc64le.tgz
sudo ldconfig
Anaconda
A number of the Deep Learning frameworks require Anaconda. Anaconda is a platform-agnostic data science distribution with a collection of 1,000+ open source packages with free community support.
Anaconda2 with Python 2 should be used to run the Python 2 versions of the Deep Learning frameworks. Anaconda3 with Python 3 is required to run the Python 3 versions of the Deep Learning frameworks.
Anaconda | Version | Download Location. | Size | md5sum |
---|---|---|---|---|
Anaconda2 | 5.2.0 | https://repo.continuum.io/archive/Anaconda2-5.2.0-Linux-ppc64le.sh | 270 M | 479633a95906ea6d41056ebe84a4c47b |
Anaconda3 | 5.2.0 | https://repo.continuum.io/archive/Anaconda3-5.2.0-Linux-ppc64le.sh | 288 M | cbd1d5435ead2b0b97dba5b3cf45d694 |
Download and Install Anaconda. Installation requires input for license agreement, install
location (default is $HOME/anaconda2
) and permission to modify the
PATH
environment variable (by using .bashrc
).
wget https://repo.continuum.io/archive/Anaconda2-5.2.0-Linux-ppc64le.sh
bash Anaconda2-5.2.0-Linux-ppc64le.sh
source ~/.bashrc
If multiple users are using the same system, each user should install Anaconda individually.