Setting up Ubuntu

Follow these steps to set up your system with Ubuntu.

Ubuntu operating system and repository setup
System firmware
Installing the GPU driver
Installing docker, nvidia-docker2 on Ubuntu (optional)
Installing Mellanox drivers
Installing Perl

Ubuntu operating system and repository setup

Install packages needed for the installation

sudo apt-get install -y wget nano apt-transport-https ca-certificates curl software-properties-common

Install the kernel headers for the running kernel

sudo apt-get install linux-headers-$(uname -r)
sudo reboot

Or do a full update

sudo apt-get update
sudo apt-get dist-upgrade
sudo reboot
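
After a full update it can be useful to confirm whether a reboot is still pending. The following is a minimal sketch, assuming the system writes the usual Ubuntu marker file /var/run/reboot-required (this depends on the update-notifier-common package, which may be absent on minimal installs). The function name reboot_pending is hypothetical.

```shell
#!/usr/bin/env bash
# Report whether a reboot is pending after package updates.
# Assumption: the marker file is /var/run/reboot-required unless a
# different path is passed as the first argument.
reboot_pending() {
    local marker="${1:-/var/run/reboot-required}"
    [ -e "$marker" ]
}

if reboot_pending; then
    echo "Reboot required: run 'sudo reboot'"
else
    echo "No reboot marker found"
fi
```

A missing marker file does not prove that the running kernel is current, so compare uname -r with the installed kernel packages if in doubt.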

System firmware

If you are running on an AC922 system, you need to update the firmware. Ensure that the system firmware is updated to at least the following levels before you install the current NVIDIA GPU driver.

The firmware series and fix levels that are required for AC922 for the current NVIDIA GPU driver are:

8335-GTG: OP910.30 or higher
8335-GTH: OP920.10 or higher
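
The minimum levels above can be checked with a simple version comparison. This is a sketch only: it assumes you already know your current firmware level and pass it in the same form as shown in the list (for example OP910.30). How to read the level from the system is not covered here, and the function name fw_at_least is hypothetical.

```shell
#!/usr/bin/env bash
# Succeed when firmware level $1 is at or above the minimum level $2.
# The two strings are ordered with version sort (sort -V); if the
# minimum sorts first (or both are equal), the current level is new enough.
fw_at_least() {
    local current="$1" minimum="$2"
    [ "$(printf '%s\n%s\n' "$minimum" "$current" | sort -V | head -n 1)" = "$minimum" ]
}

# Example with a hand-entered level for an 8335-GTG system.
if fw_at_least "OP910.30" "OP910.30"; then
    echo "Firmware level is sufficient"
else
    echo "Firmware update needed"
fi
```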

System firmware updates are available at Fix Central. To find your updates in Fix Central, follow these steps:

Enter 8335-GTG or 8335-GTH as the Product Selector.
Select the appropriate firmware series from the drop-down list.
Click Continue to go to the Select fixes page.
Select the appropriate fix level.
Click Continue to go to the Download options page.

Installing the GPU driver

Many of the deep learning packages require the GPU driver packages from NVIDIA. See the WML CE prerequisites for the required and recommended versions of the GPU driver.

Install the GPU driver by following these steps:

Download the NVIDIA GPU driver.
- Go to NVIDIA Driver Download.
- Select Product Type: Tesla
- Select Product Series: P-Series (for Tesla P100) or V-Series (for Tesla V100).
- Select Product: Tesla P100 or Tesla V100.
- Select Operating System: Linux POWER LE Ubuntu 18.04. Click Show all Operating Systems if your version is not available.
- Select CUDA Toolkit: 10.1
- Click SEARCH to go to the download link.
- Click Download to download the driver.
The driver file name is NVIDIA-Linux-ppc64le-418.87.01.run. Give this file execute permission and execute it on the Linux image where the GPU driver is to be installed.
When the file is executed, you are asked two questions. It is recommended that you answer Yes to both questions. If the driver fails to install, check the /var/log/nvidia-installer.log file for relevant error messages.
Ensure the kernel headers are installed and match the running kernel. Compare the outputs of:
```
dpkg -l | grep linux-headers
```
and
```
uname -r
```
Ensure that the linux-headers package version exactly matches the version of the running kernel. If they are not identical, bring them in sync as appropriate:
- Install missing packages.
- Update downlevel packages.
- Reboot the system if the packages are newer than the active kernel.
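
The comparison can also be scripted. The sketch below assumes the standard dpkg -l column layout (status in the first column, package name in the second) and that the headers package is named linux-headers-<kernel release>. The function name headers_installed_for is hypothetical.

```shell
#!/usr/bin/env bash
# Succeed when the package listing in $2 (dpkg -l format) contains an
# installed ("ii") linux-headers package for the kernel release in $1.
headers_installed_for() {
    local release="$1" listing="$2"
    printf '%s\n' "$listing" | awk -v pkg="linux-headers-$release" '
        {
            name = $2
            sub(/:.*/, "", name)   # drop an architecture suffix such as :ppc64el
            if ($1 == "ii" && name == pkg) found = 1
        }
        END { exit !found }'
}
```

On the target system it could be called as headers_installed_for "$(uname -r)" "$(dpkg -l)". A non-zero exit status means the headers for the running kernel are missing or not fully installed.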

Install the GPU driver repository and cuda-drivers:

sudo dpkg -i nvidia-driver-local-repo-ubuntu1804-418.*.deb

sudo apt-get update

sudo apt-get install cuda-drivers

Set nvidia-persistenced to start at boot

sudo systemctl enable nvidia-persistenced

Reboot the system

Installing docker, nvidia-docker2 on Ubuntu (optional)

If you plan to run containers on an Ubuntu host, use these steps to install docker and nvidia-docker 2.

Install docker.

IBM Power

sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates curl gnupg-agent software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=ppc64el] https://download.docker.com/linux/ubuntu bionic stable"
sudo apt-get update
sudo apt-get install docker-ce=18.06.1~ce~3-0~ubuntu

x86_64

sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates curl gnupg-agent software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable"
sudo apt-get update
sudo apt-get install docker-ce

Install nvidia-docker 2.

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install nvidia-docker2
sudo systemctl restart docker.service
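
The distribution variable in the commands above is built from the ID and VERSION_ID fields of /etc/os-release, so on Ubuntu 18.04 it expands to ubuntu18.04. The sketch below shows the same construction as a function that reads a given file in a subshell, which keeps the sourced variables out of the calling shell. The function name os_release_tag is hypothetical.

```shell
#!/usr/bin/env bash
# Print ID followed by VERSION_ID from an os-release style file.
# The file is sourced inside a subshell so the caller's variables
# are left untouched. Defaults to /etc/os-release.
os_release_tag() {
    local file="${1:-/etc/os-release}"
    ( . "$file" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}
```

Running os_release_tag with no arguments before adding the repository is a quick way to confirm which nvidia-docker.list URL the commands will request.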

For each userid that will run docker, add the userid to the docker group:
```
sudo usermod -a -G docker <userid>
```
Users must log out and log back in to pick up this group change.
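
To confirm the group change, the group list of a user can be inspected. This is a minimal sketch, assuming the space-separated output format of id -nG. The function name group_listed is hypothetical.

```shell
#!/usr/bin/env bash
# Succeed when group $2 appears as a whole word in the space-separated
# group list $1 (the format printed by `id -nG`).
group_listed() {
    local list="$1" wanted="$2"
    printf '%s\n' "$list" | tr ' ' '\n' | grep -qxF -- "$wanted"
}
```

For example, group_listed "$(id -nG <userid>)" docker checks the group database and succeeds as soon as usermod has run, while group_listed "$(id -nG)" docker checks the current login session and succeeds only after the user has logged out and back in.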

Verify the setup.

IBM Power

nvidia-docker run --rm nvidia/cuda-ppc64le nvidia-smi

x86_64

nvidia-docker run --rm nvidia/cuda nvidia-smi

Note:

The nvidia-docker run command must be used with docker-ce (in other words, an Ubuntu host) to leverage the GPUs from within a container.

Installing Mellanox drivers

In order to use InfiniBand with IBM Distributed Deep Learning and SnapML, install the latest Mellanox driver from the Mellanox IBM Systems and Storage page.

Installing Perl

In order to use Spectrum MPI with IBM Distributed Deep Learning and SnapML, Perl must be installed on the system. Install Perl using the following command:

sudo apt-get install perl