PowerAI system setup

Find information to set up your operating system, repository, and NVIDIA components.

Note: On AC922 systems that will use the latest NVIDIA GPU driver, the driver depends on other updates, and all of them must be installed in this order:
  1. Latest Linux kernel for RHEL 7.5 ALT
  2. Recent AC922 system firmware:
    • 8335-GTG: OP910.24
    • 8335-GTH: OP920.02
  3. NVIDIA GPU driver 396.44 or higher

Operating system

The Deep Learning packages require Red Hat Enterprise Linux (RHEL) 7.5 little endian for IBM® POWER8® and IBM POWER9™. The RHEL installation image and license must be acquired from Red Hat: https://www.redhat.com/en/technologies/linux-platforms/enterprise-linux

For more information about installing operating systems on IBM Power Systems servers, see Quick start guides for Linux on IBM® Power System servers.

Operating system and repository setup

  1. Enable optional and extra repo channels.

    IBM POWER8:

    sudo subscription-manager repos --enable=rhel-7-for-power-le-optional-rpms
    sudo subscription-manager repos --enable=rhel-7-for-power-le-extras-rpms

    IBM POWER9:

    sudo subscription-manager repos --enable=rhel-7-for-power-9-optional-rpms
    sudo subscription-manager repos --enable=rhel-7-for-power-9-extras-rpms
  2. Install packages needed for the installation.
    sudo yum -y install wget nano bzip2
  3. Enable EPEL repo.
    wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
    sudo rpm -ihv epel-release-latest-7.noarch.rpm
  4. Load the latest kernel.
    sudo yum update kernel kernel-tools kernel-tools-libs kernel-bootwrapper
    sudo reboot

    Or do a full update.

    sudo yum update
    sudo reboot
Important: RHEL 7.6 was released at the end of October, but is not yet supported by PowerAI. Running yum update without restrictions might upgrade a 7.5 system to 7.6. To avoid this, customers with a standard RHEL subscription can pin the release:
sudo subscription-manager release --set=7.5
Customers should consult Red Hat if they are unsure how to avoid an unintended upgrade.
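
Before a full update, it can help to confirm that the release pin is in place. The following is a minimal sketch, not part of the official procedure. It assumes that sudo subscription-manager release --show prints a line of the form "Release: 7.5" when a release is pinned; the function names pinned_release_from and update_is_safe are hypothetical helpers introduced here for illustration.

```shell
# Print the pinned release from text shaped like "Release: 7.5".
# Prints nothing when the text does not report a pinned release.
# (The "Release: X.Y" output format is an assumption.)
pinned_release_from() {
    printf '%s\n' "$1" |
        sed -n 's/^Release:[[:space:]]*\([0-9][0-9.]*\)[[:space:]]*$/\1/p'
}

# Succeed only if the reported pin is exactly 7.5.
update_is_safe() {
    [ "$(pinned_release_from "$1")" = "7.5" ]
}

# Example use (not run automatically):
#   if update_is_safe "$(sudo subscription-manager release --show)"; then
#       sudo yum update
#   else
#       echo "Release is not pinned to 7.5; yum update might move to 7.6." >&2
#   fi
```

Keeping the parsing separate from the subscription-manager call makes it possible to try the check against sample text on any machine.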

System firmware

If you are running on an AC922 system, you need to update the firmware. Ensure that the system firmware is updated to at least the following levels before you install the current NVIDIA GPU driver.

The firmware series / fix levels that are required for AC922 for the current NVIDIA GPU driver are:

  • 8335-GTG: OP910.24
  • 8335-GTH: OP920.02
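
For scripted checks, the levels above can be captured in a small lookup. This is only a sketch: the function name required_firmware_for is hypothetical, and the usage comment assumes that /proc/device-tree/model contains the machine type and model (for example 8335-GTG), which should be verified on the target system.

```shell
# Print the minimum firmware level that this document lists for an
# AC922 model. Returns non-zero for any model not listed here.
required_firmware_for() {
    case "$1" in
        8335-GTG) echo "OP910.24" ;;
        8335-GTH) echo "OP920.02" ;;
        *) return 1 ;;
    esac
}

# Example use (not run automatically; the model file is an assumption):
#   model=$(tr -d '\0' < /proc/device-tree/model)
#   required_firmware_for "$model" || echo "No level listed for $model" >&2
```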

System firmware updates are available at https://www.ibm.com/support/fixcentral/. To find your updates, follow these steps:

  1. Enter "8335-GTG" or "8335-GTH" as the Product Selector.
  2. Select the appropriate firmware series from the drop-down list.
  3. Click Continue to go to the Select fixes page.
  4. Select the appropriate fix level.
  5. Click Continue to go to the Download options page.

NVIDIA Components: IBM POWER9 specific udev rules

Before you install the NVIDIA components, the udev Memory Auto-Onlining Rule must be disabled for the CUDA driver to function properly. To disable it, follow these steps:

  1. Copy the /lib/udev/rules.d/40-redhat.rules file to the directory for user overridden rules.
    sudo cp /lib/udev/rules.d/40-redhat.rules /etc/udev/rules.d/
  2. Edit the /etc/udev/rules.d/40-redhat.rules file.
    sudo nano /etc/udev/rules.d/40-redhat.rules
  3. Comment out the following line and save the change:
    SUBSYSTEM=="memory", ACTION=="add", PROGRAM="/bin/uname -p", 
    RESULT!="s390*", ATTR{state}=="offline", ATTR{state}="online"
  4. Optionally, delete the first line of the file, since the file was copied to a directory where it cannot be overwritten.
      # do not edit this file, it will be overwritten on update
  5. Restart the system for the changes to take effect.
    sudo reboot
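
Steps 2 and 3 can also be done without an interactive editor. The sketch below uses GNU sed and assumes that the rule occupies a single line in the file that begins with SUBSYSTEM=="memory", ACTION=="add"; check the file first, because the layout might differ between releases. The function name disable_memory_auto_online is a hypothetical helper.

```shell
# Comment out the memory auto-onlining rule in the given rules file.
# Assumes the rule is one line starting with
#   SUBSYSTEM=="memory", ACTION=="add"
# A backup copy is kept next to the file with a .bak suffix.
disable_memory_auto_online() {
    sed -i.bak '/^SUBSYSTEM=="memory", ACTION=="add"/ s/^/# /' "$1"
}

# Example use (not run automatically), after copying the file in step 1:
#   sudo sh -c "$(declare -f disable_memory_auto_online 2>/dev/null); \
#       disable_memory_auto_online /etc/udev/rules.d/40-redhat.rules"
# or simply run the sed command above with sudo on that file.
```

Running the function a second time changes nothing, because the commented line no longer matches the pattern.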

CUDA, GPU driver, cuDNN, and NCCL

The Deep Learning packages require CUDA, cuDNN, and GPU driver packages from NVIDIA. See the table above for the required and recommended versions of these components.

Install the components by following these steps:

  1. Download and install NVIDIA CUDA 9.2.148 from https://developer.nvidia.com/cuda-92-download-archive
    • Select Operating System: Linux.
    • Select Architecture: ppc64le.
    • Select Distribution: RHEL.
    • Select Version: 7.
    • Select Installer Type: rpm (local).
    • Follow the Linux on POWER installation instructions in the CUDA Quick Start Guide, including the steps that describe how to set up the CUDA development environment by updating PATH and LD_LIBRARY_PATH.
    Note: The local rpm is preferred over the network rpm as it ensures the version that is installed is the version that is downloaded. With the network rpm, yum install cuda always installs the latest version of the CUDA Toolkit.
  2. Download Patch 1 for cuBLAS and CUPTI. The patch is available at the same location as the base CUDA package. On the CUDA download page, scroll down to find the download link for "Patch 1 (Released Aug 6, 2018)".
  3. Download NVIDIA driver 396.44 from http://www.nvidia.com/Download/index.aspx
    • Select Product Type: Tesla
    • Select Product Series: P-Series
    • Select Product: Tesla P100
    • Select Operating System: Linux POWER LE RHEL 7
    • Select CUDA Toolkit: 9.2
    • Click Search to go to the download link.
  4. Install CUDA and the GPU driver.
    Note: For AC922 systems: OS and system firmware updates are required before you install the latest GPU driver.
    At a high level, the installation process is:
    • Install the CUDA Base repository rpm.
    • Install the CUDA Patch 1 repository rpm.
    • Install the GPU driver repository rpm.
    • Run sudo yum install cuda to install CUDA, the patch, and the GPU driver.
    • Restart to activate the driver.
    For more information, see the Linux POWER® installation instructions in the CUDA Quick Start Guide. It includes steps for setting up the CUDA development environment by updating PATH and LD_LIBRARY_PATH.
  5. Download NVIDIA cuDNN v7.2.1 for CUDA 9.2 from https://developer.nvidia.com/cudnn (Registration in NVIDIA’s Accelerated Computing Developer Program is required).
    • cuDNN v7.2.1 Library for Linux (Power8/Power9)
  6. Download NVIDIA NCCL v2.2.13 for CUDA 9.2 from https://developer.nvidia.com/nccl (Registration in NVIDIA’s Accelerated Computing Developer Program is required).
    • NCCL 2.2.13 O/S agnostic and CUDA 9.2 and IBM Power®
  7. Install the cuDNN v7.2.1 and NCCL v2.2.13 packages, then refresh the shared library cache.
    sudo tar -C /usr/local --no-same-owner -xzvf cudnn-9.2-linux-ppc64le-v7.2.1.38.tgz
    sudo tar -C /usr/local --no-same-owner -xzvf nccl_2.2.13-1+cuda9.2_ppc64le.tgz
    sudo ldconfig
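
After the restart, the installed driver level can be compared against the 396.44 minimum named above. This is a minimal sketch that relies on the version sort (-V) option of GNU sort; the function name version_ge is a hypothetical helper, and the usage comment assumes that nvidia-smi is on the PATH.

```shell
# Succeed if version $1 is greater than or equal to version $2.
# Uses GNU sort -V, which orders dotted version strings numerically.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example use (not run automatically):
#   driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   version_ge "$driver" 396.44 ||
#       echo "GPU driver $driver is older than 396.44" >&2
```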

Anaconda

A number of the Deep Learning frameworks require Anaconda. Anaconda is a platform-agnostic data science distribution with a collection of 1,000+ open source packages with free community support.

Anaconda2 with Python 2 should be used to run the Python 2 versions of the Deep Learning frameworks. Anaconda3 with Python 3 is required to run the Python 3 versions of the Deep Learning frameworks.

Anaconda version  Download location                                                   Size   md5sum
Anaconda2 5.2.0   https://repo.continuum.io/archive/Anaconda2-5.2.0-Linux-ppc64le.sh  270 M  479633a95906ea6d41056ebe84a4c47b
Anaconda3 5.2.0   https://repo.continuum.io/archive/Anaconda3-5.2.0-Linux-ppc64le.sh  288 M  cbd1d5435ead2b0b97dba5b3cf45d694

Download and install Anaconda. The installer prompts for acceptance of the license agreement, the install location (default is $HOME/anaconda2), and permission to modify the PATH environment variable (by using .bashrc).

wget https://repo.continuum.io/archive/Anaconda2-5.2.0-Linux-ppc64le.sh
bash Anaconda2-5.2.0-Linux-ppc64le.sh
source ~/.bashrc

If multiple users are using the same system, each user should install Anaconda individually.