Setting up Ubuntu

Follow these steps to set up your system with Ubuntu.

Ubuntu operating system and repository setup

  1. Install packages needed for the installation
    sudo apt-get install -y wget nano apt-transport-https ca-certificates curl software-properties-common
  2. Load the latest kernel
    sudo apt-get install linux-headers-$(uname -r)
    sudo reboot
  3. Or do a full update
    sudo apt-get update
    sudo apt-get dist-upgrade
    sudo reboot

System firmware

If you are running on an AC922 system, you need to update the firmware. Ensure that the system firmware is updated to at least the following levels before you install the current NVIDIA GPU driver.

The firmware series and fix levels that are required for AC922 for the current NVIDIA GPU driver are:

  • 8335-GTG: OP910.30 or higher
  • 8335-GTH: OP920.10 or higher
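
A firmware level can be checked against these minimums with a version-aware sort. The sketch below is illustrative only: `fw_level_ok` is a hypothetical helper, not an IBM tool, and the installed level has to be supplied by hand from whatever your system management interface reports. It assumes GNU `sort -V`, which Ubuntu provides.

```shell
# Hypothetical helper: succeed if the installed firmware level ($1)
# is at or above the required minimum ($2), for example OP910.30.
fw_level_ok() {
  fw_installed=${1#OP}   # strip the "OP" prefix, leaving e.g. 910.30
  fw_required=${2#OP}
  # sort -V orders version strings; the required level must sort first (or tie).
  fw_lowest=$(printf '%s\n%s\n' "$fw_required" "$fw_installed" | sort -V | head -n 1)
  [ "$fw_lowest" = "$fw_required" ]
}

# Example for an 8335-GTG system; OP910.31 is a placeholder for your actual level.
if fw_level_ok "OP910.31" "OP910.30"; then
  echo "firmware level is sufficient"
else
  echo "firmware update required"
fi
```

Compare only against the minimum listed for your own model, because the two models use different firmware series.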

System firmware updates are available at Fix Central. To find your updates in Fix Central, follow these steps:

  1. Enter 8335-GTG or 8335-GTH as the Product Selector.
  2. Select the appropriate firmware series from the drop-down list.
  3. Click Continue to go to the Select fixes page.
  4. Select the appropriate fix level.
  5. Click Continue to go to the Download options page.

Installing the GPU driver

Many of the deep learning packages require the GPU driver packages from NVIDIA. See the WML CE prerequisites for the required and recommended versions of the GPU driver.

Install the GPU driver by following these steps:

  1. Download the NVIDIA GPU driver.
    • Go to NVIDIA Driver Download.
    • Select Product Type: Tesla
    • Select Product Series: P-Series (for Tesla P100) or V-Series (for Tesla V100).
    • Select Product: Tesla P100 or Tesla V100.
    • Select Operating System: Linux POWER LE Ubuntu 18.04. Click Show all Operating Systems if your version is not available.
    • Select CUDA Toolkit: 10.1
    • Click SEARCH to go to the download link.
    • Click Download to download the driver.
  2. The driver file name is NVIDIA-Linux-ppc64le-418.87.01.run. Give this file execute permission and execute it on the Linux image where the GPU driver is to be installed.

    When the file is executed, you are asked two questions. It is recommended that you answer Yes to both questions. If the driver fails to install, check the /var/log/nvidia-installer.log file for relevant error messages.

  3. Ensure the kernel headers are installed and match the running kernel. Compare the outputs of:
    dpkg -l | grep linux-headers
    and
    uname -r
    Ensure that the linux-headers package version exactly matches the version of the running kernel. If they are not identical, bring them in sync as appropriate:
    • Install missing packages.
    • Update downlevel packages.
    • Reboot the system if the packages are newer than the active kernel.
  4. Install the GPU driver repository and cuda-drivers:
    sudo dpkg -i nvidia-driver-local-repo-ubuntu1804-418.*.deb
    sudo apt-get update
    sudo apt-get install cuda-drivers
  5. Set nvidia-persistenced to start at boot
    sudo systemctl enable nvidia-persistenced
  6. Reboot the system
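
The header comparison in step 3 and a check after the final reboot can be scripted. This is a minimal sketch, not part of the official procedure: the function names `headers_pkg_for`, `check_headers`, and `verify_gpu_driver` are hypothetical, and it assumes the headers package follows the usual Ubuntu naming of `linux-headers-` followed by the kernel release.

```shell
# Name of the headers package that matches a given kernel release.
headers_pkg_for() {
  printf 'linux-headers-%s\n' "$1"
}

# Before installing the driver: report whether the headers package for
# the running kernel is installed.
check_headers() {
  pkg=$(headers_pkg_for "$(uname -r)")
  if dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null | grep -q 'install ok installed'; then
    echo "OK: $pkg is installed"
  else
    echo "MISSING: $pkg (try: sudo apt-get install $pkg)"
    return 1
  fi
}

# After the final reboot: confirm that the driver responds and that
# nvidia-persistenced is enabled at boot.
verify_gpu_driver() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found; the driver does not appear to be installed"
    return 1
  fi
  nvidia-smi || return 1
  systemctl is-enabled nvidia-persistenced
}
```

Run `check_headers` before step 4 and `verify_gpu_driver` after step 6. If `nvidia-smi` reports an error, the /var/log/nvidia-installer.log file mentioned above is the first place to look.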

Installing docker and nvidia-docker2 on Ubuntu (optional)

If you plan to run containers on an Ubuntu host, use these steps to install docker and nvidia-docker 2.

  1. Install docker.
    IBM Power
    sudo apt-get update
    sudo apt-get install apt-transport-https ca-certificates curl gnupg-agent software-properties-common
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
    sudo add-apt-repository "deb [arch=ppc64el] https://download.docker.com/linux/ubuntu bionic stable"
    sudo apt-get update
    sudo apt-get install docker-ce=18.06.1~ce~3-0~ubuntu
    x86_64
    sudo apt-get update
    sudo apt-get install apt-transport-https ca-certificates curl gnupg-agent software-properties-common
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
    sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable"
    sudo apt-get update
    sudo apt-get install docker-ce
  2. Install nvidia-docker 2.
    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
    distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
    sudo apt-get update
    sudo apt-get install nvidia-docker2
    sudo systemctl restart docker.service
  3. For each userid that will run docker, add the userid to the docker group:
    sudo usermod -a -G docker <userid>

    Users must log out and log back in to pick up this group change.

  4. Verify the setup.
    IBM Power
    nvidia-docker run --rm nvidia/cuda-ppc64le nvidia-smi
    x86_64
    nvidia-docker run --rm nvidia/cuda nvidia-smi
Note:

The nvidia-docker run command must be used with docker-ce (in other words, an Ubuntu host) to leverage the GPUs from within a container.
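
The IBM Power and x86_64 variants of step 1 differ only in the `arch=` value of the repository line, and step 2 derives its `distribution` string from /etc/os-release. The sketch below shows how both values are formed. The helper names `docker_repo_line` and `distribution_from` are hypothetical and exist only for illustration.

```shell
# Hypothetical helper: print the Docker apt repository line for a given
# architecture ($1: ppc64el on IBM Power, amd64 on x86_64) and Ubuntu
# release codename ($2).
docker_repo_line() {
  printf 'deb [arch=%s] https://download.docker.com/linux/ubuntu %s stable\n' "$1" "$2"
}

# Hypothetical helper: build the nvidia-docker distribution string
# (ID followed by VERSION_ID) from an os-release style file ($1).
distribution_from() {
  # Source the file in a subshell so its variables do not leak out.
  ( . "$1" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# On an Ubuntu 18.04 host these would be called as:
#   docker_repo_line "$(dpkg --print-architecture)" bionic
#   distribution_from /etc/os-release
```

On Ubuntu 18.04 the second call yields `ubuntu18.04`, which is the path component used in the nvidia-docker.list URL in step 2.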

Installing Mellanox drivers

To use InfiniBand with IBM Distributed Deep Learning and SnapML, install the latest Mellanox driver from the Mellanox IBM Systems and Storage page.

Installing Perl

In order to use Spectrum MPI with IBM Distributed Deep Learning and SnapML, Perl must be installed on the system. Install Perl using the following command:
sudo apt-get install perl