Setting up Ubuntu
Follow these steps to set up your system with Ubuntu.
Ubuntu operating system and repository setup
- Install packages needed for the installation:
sudo apt-get install -y wget nano apt-transport-https ca-certificates curl software-properties-common
- Load the latest kernel:
sudo apt-get install linux-headers-$(uname -r)
sudo reboot
- Or do a full update:
sudo apt-get update
sudo apt-get dist-upgrade
sudo reboot
System firmware
If you are running on an AC922 system, you need to update the firmware. Ensure that the system firmware is updated to at least the following levels before you install the current NVIDIA GPU driver.
The firmware series and fix levels that are required for AC922 for the current NVIDIA GPU driver are:
- 8335-GTG: OP910.30 or higher
- 8335-GTH: OP920.10 or higher
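The "or higher" comparison against the levels listed above can be sketched as a small shell helper. This is only an illustration: the function name fw_at_least is hypothetical, it assumes both levels follow the OPxxx.yy pattern shown above, and it relies on the version ordering of GNU sort -V. It does not read the installed level from the system; you supply that value yourself.

```shell
# fw_at_least CURRENT REQUIRED
# Hypothetical helper: succeeds (exit status 0) when CURRENT is at or above
# REQUIRED in version order. Assumes GNU sort with the -V option.
fw_at_least() {
    # The lower of the two levels sorts first; CURRENT is acceptable
    # only when that lower level is REQUIRED itself.
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}
```

For example, fw_at_least OP910.31 OP910.30 succeeds, while fw_at_least OP910.20 OP910.30 fails.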
System firmware updates are available at Fix Central. To find your updates in Fix Central, follow these steps:
- Enter 8335-GTG or 8335-GTH as the Product Selector.
- Select the appropriate firmware series from the drop-down list.
- Click Continue to go to the Select fixes page.
- Select the appropriate fix level.
- Click Continue to go to the Download options page.
Installing the GPU driver
Many of the deep learning packages require the GPU driver packages from NVIDIA. See the WML CE prerequisites for the required and recommended versions of the GPU driver.
Install the GPU driver by following these steps:
- Download the NVIDIA GPU driver.
- Go to NVIDIA Driver Download.
- Select Product Type: Tesla
- Select Product Series: P-Series (for Tesla P100) or V-Series (for Tesla V100).
- Select Product: Tesla P100 or Tesla V100.
- Select Operating System: Linux POWER LE Ubuntu 18.04 . Click Show all Operating Systems if your version is not available.
- Select CUDA Toolkit: 10.1
- Click SEARCH to go to the download link.
- Click Download to download the driver.
- The driver file name is NVIDIA-Linux-ppc64le-418.87.01.run. Give this file execute permission and execute it on the Linux image where the GPU driver is to be installed. When the file is executed, you are asked two questions. It is recommended that you answer Yes to both questions. If the driver fails to install, check the /var/log/nvidia-installer.log file for relevant error messages.
- Ensure the kernel headers are installed and match the running kernel. Compare the outputs of:
dpkg -l | grep linux-headers
and
uname -r
Ensure that the linux-headers package version exactly matches the version of the running kernel. If they are not identical, bring them in sync as appropriate:
- Install missing packages.
- Update downlevel packages.
- Reboot the system if the packages are newer than the active kernel.
- Install the GPU driver repository and cuda-drivers:
sudo dpkg -i nvidia-driver-local-repo-ubuntu1804-418.*.deb
sudo apt-get update
sudo apt-get install cuda-drivers
- Set nvidia-persistenced to start at boot:
sudo systemctl enable nvidia-persistenced
- Reboot the system
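The kernel header check in the steps above can also be scripted. The following is a minimal sketch, not part of the official procedure: the function names headers_pkg_for and headers_match_running_kernel are hypothetical, and the sketch assumes that the header package is named linux-headers- followed by the output of uname -r, as in the apt-get command used earlier in this document.

```shell
# Hypothetical helper: print the name of the header package that
# corresponds to a given kernel release string.
headers_pkg_for() {
    printf 'linux-headers-%s\n' "$1"
}

# Hypothetical helper: succeeds when the header package for the running
# kernel is fully installed according to dpkg.
headers_match_running_kernel() {
    pkg=$(headers_pkg_for "$(uname -r)")
    # dpkg-query exits non-zero if the package is unknown to dpkg.
    status=$(dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null) || return 1
    [ "$status" = "install ok installed" ]
}
```

A possible use before installing the driver: headers_match_running_kernel || echo "Install or update the kernel headers, then reboot if needed"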
Installing docker and nvidia-docker2 on Ubuntu (optional)
If you plan to run containers on an Ubuntu host, use these steps to install docker and nvidia-docker 2.
- Install docker.
- IBM Power
sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates curl gnupg-agent software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=ppc64el] https://download.docker.com/linux/ubuntu bionic stable"
sudo apt-get update
sudo apt-get install docker-ce=18.06.1~ce~3-0~ubuntu
- x86_64
sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates curl gnupg-agent software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable"
sudo apt-get update
sudo apt-get install docker-ce
- Install nvidia-docker 2.
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install nvidia-docker2
sudo systemctl restart docker.service
- For each userid that will run docker, add the userid to the docker group:
sudo usermod -a -G docker <userid>
Users must log out and log back in to pick up this group change.
- Verify the setup.
- IBM Power
nvidia-docker run --rm nvidia/cuda-ppc64le nvidia-smi
- x86_64
nvidia-docker run --rm nvidia/cuda nvidia-smi
The nvidia-docker run command must be used with docker-ce (in other words, an Ubuntu host) to leverage the GPUs from within a container.
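Two of the steps above lend themselves to a quick scripted check: the distribution string that selects the nvidia-docker repository list, and the docker group membership of a userid. The sketch below is illustrative only. The function names distribution_id and user_in_group are hypothetical; distribution_id takes the path of an os-release style file as an argument (it should contain a slash, for example /etc/os-release), and user_in_group assumes group names contain no whitespace.

```shell
# Hypothetical helper: print ID and VERSION_ID from an os-release style
# file, concatenated, in the same way as the "distribution=" line above.
# The file is sourced in a subshell so its variables do not leak out.
distribution_id() {
    ( . "$1" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Hypothetical helper: succeeds when user $1 is a member of group $2.
user_in_group() {
    for g in $(id -nG "$1" 2>/dev/null); do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}
```

For example, distribution_id /etc/os-release shows the value that is substituted into the repository URL, and user_in_group <userid> docker reports whether the group change for that userid is in place. A user who is already logged in still needs to log out and back in before the change takes effect in that session.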
Installing Mellanox drivers
To use InfiniBand with IBM Distributed Deep Learning and SnapML, install the latest Mellanox driver from the Mellanox IBM Systems and Storage page.
Installing Perl
sudo apt-get install perl