Setting up Red Hat Enterprise Linux
Follow these steps to set up your IBM POWER8 or POWER9 system with Red Hat Enterprise Linux.
Upgrade to 7.6
Red Hat Enterprise Linux 7.5 is no longer supported. If you have RHEL 7.5 installed, upgrade to 7.6:
subscription-manager release --unset
yum clean all
yum update -y
reboot
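Before running the upgrade, it can help to confirm which release is installed. The following is a minimal sketch, assuming the release string has the usual `... release 7.x (...)` form found in /etc/redhat-release. The helper names `rhel_release_from` and `needs_upgrade_to_7_6` are hypothetical and are not part of any Red Hat tool.

```shell
#!/bin/sh
# Hypothetical helper: extract "major.minor" from a redhat-release style string.
rhel_release_from() {
    printf '%s\n' "$1" | sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: succeed (exit 0) when the given 7.x version is older than 7.6.
needs_upgrade_to_7_6() {
    major=${1%%.*}
    minor=${1#*.}
    [ "$major" -eq 7 ] && [ "$minor" -lt 6 ]
}

# Example on a live system (commented out so the sketch has no side effects):
# ver=$(rhel_release_from "$(cat /etc/redhat-release)")
# if needs_upgrade_to_7_6 "$ver"; then
#     echo "RHEL $ver detected: upgrade to 7.6 before continuing"
# fi
```

The check only reports the installed release; the upgrade itself still uses the commands listed above.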
Red Hat Enterprise Linux operating system and repository setup
- Enable common, optional, and extra repo channels.
IBM® POWER8:
sudo subscription-manager repos --enable=rhel-7-for-power-le-optional-rpms
sudo subscription-manager repos --enable=rhel-7-for-power-le-extras-rpms
sudo subscription-manager repos --enable=rhel-7-for-power-le-rpms
IBM POWER9:
sudo subscription-manager repos --enable=rhel-7-for-power-9-optional-rpms
sudo subscription-manager repos --enable=rhel-7-for-power-9-extras-rpms
sudo subscription-manager repos --enable=rhel-7-for-power-9-rpms
- Install packages needed for the installation.
sudo yum -y install wget nano bzip2
- Enable the Fedora Project EPEL (Extra Packages for Enterprise Linux) repo:
wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
sudo rpm -ihv epel-release-latest-7.noarch.rpm
- Load the latest kernel or do a full update:
- Load the latest kernel:
sudo yum update kernel kernel-devel kernel-tools kernel-tools-libs kernel-bootwrapper
sudo reboot
- Do a full update:
sudo yum update
sudo reboot
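The repository step at the start of this list differs between POWER8 and POWER9 only in the repository ID prefix. The following sketch makes that choice explicit. The function name `rhel7_repo_ids` and its `power8`/`power9` argument are hypothetical; the repository IDs are the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper: print the three repository IDs for a processor generation.
rhel7_repo_ids() {
    case "$1" in
        power8) prefix="rhel-7-for-power-le" ;;
        power9) prefix="rhel-7-for-power-9" ;;
        *) echo "usage: rhel7_repo_ids power8|power9" >&2; return 1 ;;
    esac
    for suffix in rpms optional-rpms extras-rpms; do
        printf '%s-%s\n' "$prefix" "$suffix"
    done
}

# Example (commented out because it changes the subscription configuration):
# for id in $(rhel7_repo_ids power9); do
#     sudo subscription-manager repos --enable="$id"
# done
```

Printing the IDs first makes it possible to review them before any repository is enabled.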
System firmware
If you are running on an AC922 system, you need to update the firmware. Ensure that the system firmware is updated to at least the following levels before you install the current NVIDIA GPU driver.
The firmware series and fix levels that are required for AC922 for the current NVIDIA GPU driver are:
- 8335-GTG: OP910.30 or higher
- 8335-GTH: OP920.10 or higher
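If the installed firmware level is already known, the comparison against the minimum levels above can be sketched as follows. This assumes that firmware level strings order correctly under GNU `sort -V` (version sort), which is an assumption and not something the firmware documentation states. The helper names `min_fw_for_model` and `fw_at_least` are hypothetical, and how the installed level is read from the system is left out.

```shell
#!/bin/sh
# Hypothetical helper: minimum firmware level for an AC922 model, from the list above.
min_fw_for_model() {
    case "$1" in
        8335-GTG) echo "OP910.30" ;;
        8335-GTH) echo "OP920.10" ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: succeed when level $2 is at or above the minimum level $1.
# Relies on GNU sort -V; assumes level strings compare like version numbers.
fw_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$1" ]
}

# Example (installed_level must be supplied by the administrator):
# installed_level="OP910.31"
# if fw_at_least "$(min_fw_for_model 8335-GTG)" "$installed_level"; then
#     echo "firmware level is sufficient"
# fi
```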
System firmware updates are available at Fix Central. To find your updates in Fix Central, follow these steps:
- Enter 8335-GTG or 8335-GTH as the Product Selector.
- Select the appropriate firmware series from the drop-down list.
- Click Continue to go to the Select fixes page.
- Select the appropriate fix level.
- Click Continue to go to the Download options page.
IBM POWER9™ specific udev rules
To disable the memory hot-add udev rule, follow these steps:
- Copy the /lib/udev/rules.d/40-redhat.rules file to the directory for user overridden rules.
sudo cp /lib/udev/rules.d/40-redhat.rules /etc/udev/rules.d/
- Edit the /etc/udev/rules.d/40-redhat.rules file.
sudo nano /etc/udev/rules.d/40-redhat.rules
- Comment out the entire "Memory hotadd request" section and save the change:
# Memory hotadd request
#SUBSYSTEM!="memory", ACTION!="add", GOTO="memory_hotplug_end"
#PROGRAM="/bin/uname -p", RESULT=="s390*", GOTO="memory_hotplug_end"
#ENV{.state}="online"
#PROGRAM="/bin/systemd-detect-virt", RESULT=="none", ENV{.state}="online_movable"
#ATTR{state}=="offline", ATTR{state}="$env{.state}"
#LABEL="memory_hotplug_end"
- Optionally, delete the first line of the file, since the file was copied to a directory where it cannot be overwritten.
# do not edit this file, it will be overwritten on update
- Restart the system for the changes to take effect.
sudo reboot
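As an alternative to editing the file by hand, the commenting step can be sketched with `sed`. This assumes the section starts at the `# Memory hotadd request` comment and ends at the `LABEL="memory_hotplug_end"` line, as in the listing above; a rules file laid out differently would need a different range. The function name `comment_out_memory_hotadd` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print FILE with every active line of the
# "Memory hotadd request" section prefixed with '#'.
# Lines that are already comments, blank lines, and lines outside the
# section are printed unchanged.
comment_out_memory_hotadd() {
    sed '/^# Memory hotadd request/,/LABEL="memory_hotplug_end"/ s/^\([^#]\)/#\1/' "$1"
}

# Example (commented out because it writes to /etc; review the output first):
# comment_out_memory_hotadd /lib/udev/rules.d/40-redhat.rules \
#     | sudo tee /etc/udev/rules.d/40-redhat.rules >/dev/null
```

The function writes to standard output instead of editing in place, so the result can be inspected or compared with the original before it is installed.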
Remove previously installed CUDA and NVIDIA drivers
The CUDA Toolkit, cuDNN and NCCL are provided as Conda packages and no longer require separate installations. The GPU driver must still be installed separately.
Before installing the updated GPU driver, uninstall any previously installed CUDA and NVIDIA drivers. Follow these steps:
- Remove all CUDA Toolkit and GPU driver packages.
You can display installed CUDA and driver packages by running these commands:
rpm -qa | egrep 'cuda.*(9-2|10-0)'
rpm -qa | egrep '(cuda|nvidia).*(396|410)\.'
Verify the list and remove with yum remove.
- Remove any CUDA Toolkit and GPU driver repository packages.
These should have been removed in the first step, but you can confirm with this command:
rpm -qa | egrep '(cuda|nvidia).*repo'
Use yum remove to remove any that remain.
- Clean the yum repository:
sudo yum clean all
- Remove cuDNN and NCCL:
sudo rm -rf /usr/local/cuda /usr/local/cuda-9.2 /usr/local/cuda-10.0
- Reboot the system to unload the GPU driver:
sudo shutdown -r now
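The three package queries in the steps above can be combined into one filter so that the full candidate list is reviewed in a single pass. This is a sketch that reuses the same regular expressions; the function name `filter_old_gpu_packages` is hypothetical, and the output is only a list to verify, not something to pass to yum remove unreviewed.

```shell
#!/bin/sh
# Hypothetical helper: read package names (one per line) on stdin and print
# those matching any of the CUDA, driver, or repository patterns used above.
filter_old_gpu_packages() {
    grep -E 'cuda.*(9-2|10-0)|(cuda|nvidia).*(396|410)\.|(cuda|nvidia).*repo' | sort -u
}

# Example (read-only; prints candidates for removal):
# rpm -qa | filter_old_gpu_packages
```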
Install the GPU driver
The Deep Learning packages require the GPU driver packages to be downloaded from NVIDIA. See the PowerAI prerequisites for the required and recommended versions of these components.
Install the GPU driver by following these steps:
- Download the NVIDIA GPU driver:
- Go to NVIDIA Driver Download.
- Select Product Type: Tesla
- Select Product Series: P-Series (for Tesla P100) or V-Series (for Tesla V100).
- Select Product: Tesla P100 or Tesla V100
- Select Operating System: Linux POWER LE RHEL 7. Click Show all Operating Systems if your version is not available.
- Select CUDA Toolkit: 10.1
- Click SEARCH to go to the download link.
- Click Download to download the driver.
- Install the GPU driver repository and cuda-drivers:
sudo rpm -ivh nvidia*driver-local-repo-rhel7-418.*.rpm
sudo yum install cuda-drivers
- Set nvidia-persistenced to start at boot:
sudo systemctl enable nvidia-persistenced
- Reboot the system.
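After the reboot, the loaded driver version can be compared with the branch that was installed. The sketch below assumes the 418 branch, taken from the repository package name in the install step above. The function name `driver_in_branch` is hypothetical; the `nvidia-smi` query shown in the comment uses its `--query-gpu=driver_version` option.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a driver version string such as "418.67"
# belongs to the given branch (second argument, default 418).
driver_in_branch() {
    case "$1" in
        "${2:-418}".*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example after the reboot (commented out because it needs a GPU and driver):
# ver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
# if driver_in_branch "$ver"; then
#     echo "driver $ver is loaded"
# else
#     echo "unexpected driver version: $ver" >&2
# fi
# systemctl is-enabled nvidia-persistenced
```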
Installing Mellanox drivers
To use InfiniBand with IBM Distributed Deep Learning and SnapML, install the latest Mellanox driver from http://www.mellanox.com/page/firmware_table_IBM_SystemP.