Prerequisites for installing Watson Studio Local with NVIDIA GPU support

Ensure that your servers meet the following requirements before you install Watson Studio Local on systems with NVIDIA GPUs.

Graphics Processing Unit system preparation
Azure specific pre-install instructions
General pre-install instructions
More requirements for POWER with NVIDIA GPUs

Graphics Processing Unit system preparation

Watson Studio Local supports NVIDIA GPUs on IBM POWER systems and on Azure, AWS, and SoftLayer. If your servers have NVIDIA GPUs, complete the following steps before you install Watson Studio Local.

Azure specific pre-install instructions

Azure is the only environment with specific kernel version requirements. The default kernel level of a non-HPC Azure node, shown in the following example, is not compatible with Watson Studio Local:

```
uname -r
3.10.0-514.28.1.el7.x86_64
```

To install a supported kernel level, run the following commands:
```
yum install kernel-3.10.0-514.21.1.el7.x86_64
reboot
```
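After the reboot, you can confirm that the node is running the supported kernel. The following is a minimal sketch; the helper name kernel_is_supported is hypothetical, and the script assumes that 3.10.0-514.21.1.el7.x86_64, the level installed above, is the only supported one.

```shell
#!/usr/bin/env bash
# Kernel level that this guide installs on Azure non-HPC nodes (assumed to be
# the only supported level).
SUPPORTED_KERNEL="3.10.0-514.21.1.el7.x86_64"

# kernel_is_supported VERSION: returns 0 if VERSION is the supported level.
kernel_is_supported() {
  [ "$1" = "$SUPPORTED_KERNEL" ]
}

# Report on the running kernel without changing anything.
running="$(uname -r)"
if kernel_is_supported "$running"; then
  echo "Running kernel ${running} is supported"
else
  echo "Running kernel ${running} is not ${SUPPORTED_KERNEL}; install it and reboot" >&2
fi
```

If the old kernel is still running after the reboot, check which kernel GRUB boots by default.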

General pre-install instructions

Follow these steps, along with the example code, to install the required packages and kernel modules and to replace the default video driver before you install Watson Studio Local. For instructions on downloading NVIDIA drivers, see the section Download and install NVIDIA GPU driver.

Install pciutils.
```
yum install pciutils
```
Check whether the default nouveau video driver is loaded:
```
lsmod | grep -i nouveau
```
If the nouveau driver is loaded, disable it by completing the following procedure: update the GRUB configuration, blacklist the module, and reboot. Depending on the image that you use, nouveau might already be disabled (check /etc/modprobe.d/blacklist.conf in step 2).
1. Update GRUB to blacklist the nouveau driver by appending rd.driver.blacklist=nouveau nouveau.modeset=0 to the GRUB_CMDLINE_LINUX line, as shown in the following example:
```
vi /etc/default/grub

# Change:
GRUB_CMDLINE_LINUX="console=ttyS0,115200n8 console=tty0 net.ifnames=0 crashkernel=auto"
# to:
GRUB_CMDLINE_LINUX="console=ttyS0,115200n8 console=tty0 net.ifnames=0 crashkernel=auto rd.driver.blacklist=nouveau nouveau.modeset=0"
```
  Complete the update by running the following command:
```
grub2-mkconfig -o /boot/grub2/grub.cfg
```
2. Create or edit the file /etc/modprobe.d/blacklist.conf and append the following line:
```
blacklist nouveau
```
3. Reboot the system to activate the changes.
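The edits in steps 1 and 2 can also be scripted so that running them twice does no harm. This is a sketch that assumes GNU sed (standard on RHEL 7) and a GRUB_CMDLINE_LINUX value on a single line; the helper names add_nouveau_params and blacklist_nouveau are hypothetical. Back up /etc/default/grub before you use it.

```shell
#!/usr/bin/env bash
# Kernel parameters that keep nouveau from loading (from step 1).
NOUVEAU_PARAMS="rd.driver.blacklist=nouveau nouveau.modeset=0"

# add_nouveau_params FILE
# Appends NOUVEAU_PARAMS inside the quoted GRUB_CMDLINE_LINUX value in FILE.
# Does nothing if the parameters are already there; returns 1 if FILE has no
# GRUB_CMDLINE_LINUX line.
add_nouveau_params() {
  local file="$1"
  grep -q '^GRUB_CMDLINE_LINUX=' "$file" || return 1
  if grep -q '^GRUB_CMDLINE_LINUX=.*rd\.driver\.blacklist=nouveau' "$file"; then
    return 0
  fi
  # Insert the parameters just before the closing double quote.
  sed -i "s/^\(GRUB_CMDLINE_LINUX=\".*\)\"\$/\1 ${NOUVEAU_PARAMS}\"/" "$file"
}

# blacklist_nouveau FILE
# Appends "blacklist nouveau" to FILE (for example
# /etc/modprobe.d/blacklist.conf) unless that exact line is already present.
blacklist_nouveau() {
  local file="$1"
  touch "$file"
  grep -qx 'blacklist nouveau' "$file" || echo 'blacklist nouveau' >> "$file"
}
```

For example, as root run `add_nouveau_params /etc/default/grub && blacklist_nouveau /etc/modprobe.d/blacklist.conf`, then run grub2-mkconfig and reboot.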
Install the kernel-devel and kernel-headers packages whose version matches the running kernel:
```
yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
```
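The NVIDIA kernel module is built against the running kernel, so a kernel-devel package for a different version can make the build fail. The following sketch assumes the RHEL 7 layout, in which kernel-devel installs its build tree under /usr/src/kernels/<version>; kernel_devel_present is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# kernel_devel_present [VERSION] [SRC_DIR]
# Returns 0 if the kernel build tree for VERSION (default: the running kernel)
# exists under SRC_DIR (default: /usr/src/kernels).
kernel_devel_present() {
  local version="${1:-$(uname -r)}"
  local src_dir="${2:-/usr/src/kernels}"
  [ -d "${src_dir}/${version}" ]
}

# Report on the running kernel without changing anything.
if kernel_devel_present; then
  echo "kernel-devel matches the running kernel $(uname -r)"
else
  echo "No build tree for $(uname -r); rerun the kernel-devel install step" >&2
fi
```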
Install gcc.
```
yum install gcc
```

Install the dkms package from the EPEL (Extra Packages for Enterprise Linux) repository:
```
rpm -ivh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum install dkms
```

Download and install NVIDIA GPU driver

Follow these steps, along with the examples provided, to download and install the NVIDIA driver:

Download the NVIDIA 8.0 drivers from the NVIDIA Driver Downloads page.
Figure 1. Example of selected driver

Install the local NVIDIA repository package:
```
rpm -i nvidia-diag-driver-local-repo-rhel7-384.66-1.0-1.x86_64.rpm
```

Install the drivers and then reboot. The CUDA-enabled NVIDIA 8.0 driver must be installed on the host operating system of every compute node that has a GPU.
```
yum install cuda-drivers
reboot
```
Verify the installation:
```
nvidia-smi
```
Tip: If the command is slow, persistence mode might be disabled, which causes the driver to be reinitialized each time nvidia-smi runs. Enable persistence mode with the following command:
```
nvidia-smi -pm 1
```
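To check every GPU from a script instead of reading the nvidia-smi table, you can query individual fields with nvidia-smi --query-gpu and --format=csv,noheader. This is a sketch: the helper name all_gpus_on_driver is hypothetical, and the expected version 384.66 is an assumption taken from the repository package name above; change it to match the driver that you installed.

```shell
#!/usr/bin/env bash
# all_gpus_on_driver EXPECTED_VERSION
# Reads "index, name, driver_version" CSV lines on stdin and returns 0 only if
# at least one GPU is listed and every GPU reports EXPECTED_VERSION.
all_gpus_on_driver() {
  local expected="$1" count=0 idx name driver
  while IFS=',' read -r idx name driver; do
    [ -n "$idx" ] || continue
    driver="${driver// /}"   # drop the space that follows each comma
    if [ "$driver" != "$expected" ]; then
      echo "GPU ${idx}:${name} reports driver ${driver}, expected ${expected}" >&2
      return 1
    fi
    count=$((count + 1))
  done
  [ "$count" -gt 0 ]
}

# Query the live system only if nvidia-smi is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=index,name,driver_version --format=csv,noheader \
    | all_gpus_on_driver "384.66" && echo "All GPUs report driver 384.66"
else
  echo "nvidia-smi not found; the NVIDIA driver is not installed" >&2
fi
```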

Figure 2. Example of verification results


You are now ready to install Watson Studio Local and take advantage of GPU-accelerated computing.

More requirements for POWER with NVIDIA GPUs

Check the following items described in IBM PowerAI documentation:

Operating system and repository setup (RHEL version should be 7.5; do not run the yum update command because it might change the RHEL version)
System firmware
NVIDIA Components: IBM POWER9 specific udev rules (for POWER9 only)

Download NVIDIA driver 396.44 from NVIDIA Driver Downloads.
1. Select Product Type: Tesla
2. Select Product Series: P-Series or V-Series
3. Select Product: Tesla P100 or Tesla V100
4. Select Operating System: 'Show All Operating Systems' - 'Linux POWER LE RHEL 7'
5. Select CUDA Toolkit: 9.2
6. Click Search to go to the download link, and download nvidia-driver-local-repo-rhel7-*.ppc64le.rpm.

Install the GPU driver:

```
rpm -i nvidia-driver-local-repo-rhel7-*.ppc64le.rpm
yum install cuda-drivers
```

Verify the GPU:

```
# Verify that the GPU can be seen
nvidia-smi

# Verify that the device file has been created
ls /dev/nvidia-uvm

# If the device file is not found, download the utility from
# https://www.ibm.com/support/knowledgecenter/en/SSBS6K_2.1.0.2/manage_cluster/verify_gpu.html
# and run it:
./cudaInit_ppc64le

# Verify that the device file is now created
ls /dev/nvidia-uvm

# Verify that the driver volume directory exists
ls /var/lib/docker/volumes/

# If the directory is missing, create nvidia_driver_xxx.xx, for example:
cd /var/lib/docker/volumes
mkdir nvidia_driver_396.44
```
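The device-file and directory checks above can be wrapped in small functions if you verify several nodes. This is a sketch: the helper names are hypothetical, and the default paths and the 396.44 version come from the steps in this section.

```shell
#!/usr/bin/env bash
# uvm_device_present [DEV_DIR]
# Returns 0 if DEV_DIR/nvidia-uvm exists (default DEV_DIR: /dev).
uvm_device_present() {
  [ -e "${1:-/dev}/nvidia-uvm" ]
}

# ensure_driver_volume_dir [VOLUMES_DIR] [VERSION]
# Creates VOLUMES_DIR/nvidia_driver_VERSION if it does not exist
# (defaults: /var/lib/docker/volumes and 396.44).
ensure_driver_volume_dir() {
  local volumes_dir="${1:-/var/lib/docker/volumes}"
  local version="${2:-396.44}"
  mkdir -p "${volumes_dir}/nvidia_driver_${version}"
}

# Report-only check of the live system; nothing is created here.
if uvm_device_present; then
  echo "/dev/nvidia-uvm is present"
else
  echo "/dev/nvidia-uvm is missing; run ./cudaInit_ppc64le" >&2
fi
```

Run ensure_driver_volume_dir as root so that it can write under /var/lib/docker/volumes.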

If the verification fails, you might need to restart your server.