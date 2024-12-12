Base OS images do not include the CUDA software stack. Below, we describe a step-by-step process for manually configuring the GPU drivers on a GPU VSI, assuming the default CentOS 8 is chosen as the base image. Alternatively, this link provides an Ansible script that configures the GPU drivers on different operating systems; it is designed to set up CUDA on any base image provided by IBM Cloud without further user modification.

First, we should bring the base system up to date:

# dnf update

Reboot for the updates to take effect. Next, we should freeze Linux kernel upgrades (unless you want to break CUDA). Edit /etc/yum.conf and append the exclude directive under the [main] section:

[main]
...
exclude=kernel*

We can re-enable the kernel upgrade when necessary, but remember that whenever the kernel is updated, CUDA driver kernel modules must be recompiled (or you must re-install CUDA).
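The manual edit can also be scripted. The following is a minimal sketch, not part of the original procedure: add_kernel_exclude is a hypothetical helper name, and it assumes the configuration file already contains a [main] header. It inserts the directive directly under that header and does nothing if a matching exclude line is already present.

```shell
# Sketch: add "exclude=kernel*" under the [main] section of a dnf/yum config
# file. add_kernel_exclude is a hypothetical helper, not a dnf command.
# Assumption: the file already has a [main] header; otherwise nothing is added.
add_kernel_exclude() {
    conf="$1"
    # Nothing to do if an exclude line covering kernel* is already there.
    if grep -q '^exclude=.*kernel\*' "$conf"; then
        return 0
    fi
    tmp="$(mktemp)" || return 1
    # Copy every line; right after the [main] header, emit the directive.
    if awk '{ print } /^\[main\]/ { print "exclude=kernel*" }' "$conf" > "$tmp"; then
        cat "$tmp" > "$conf"
    fi
    rm -f "$tmp"
}
```

Run as root, this would be called as add_kernel_exclude /etc/yum.conf. To re-enable kernel upgrades later, remove the inserted line again.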

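Install some essential dependencies. Note that the exclude directive above also matches kernel-devel and kernel-headers, so it must be bypassed for this one command:

# dnf --disableexcludes=main install kernel-devel kernel-headers tar gcc gcc-c++ make elfutils-libelf-devel bzip2 pciutils wget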

We also need to disable the open-source nouveau driver that comes with CentOS. Edit /etc/default/grub and append the following to GRUB_CMDLINE_LINUX:

rd.driver.blacklist=nouveau nouveau.modeset=0
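For repeatable setups, the GRUB_CMDLINE_LINUX edit can be scripted. The sketch below rests on assumptions: add_grub_args is a hypothetical helper name, and the variable is expected to sit on a single line with its value in double quotes, as in a stock /etc/default/grub. It appends the two parameters before the closing quote and skips the edit if they are already present.

```shell
# Sketch: append the nouveau-related parameters to GRUB_CMDLINE_LINUX.
# add_grub_args is a hypothetical helper. Assumption: the variable is defined
# on one line and its value is enclosed in double quotes.
add_grub_args() {
    file="$1"
    args="rd.driver.blacklist=nouveau nouveau.modeset=0"
    # Skip the edit if the blacklist parameter is already on the line.
    if grep -q '^GRUB_CMDLINE_LINUX=.*rd\.driver\.blacklist=nouveau' "$file"; then
        return 0
    fi
    tmp="$(mktemp)" || return 1
    # Insert the parameters just before the closing double quote.
    if sed "s/^\(GRUB_CMDLINE_LINUX=\".*\)\"[[:space:]]*\$/\1 ${args}\"/" "$file" > "$tmp"; then
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
}
```

Run as root, this would be called as add_grub_args /etc/default/grub. It is worth inspecting the file afterwards, since a differently formatted line is left unchanged.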

Generate a new grub configuration to include the above changes:

# grub2-mkconfig -o /boot/grub2/grub.cfg

Edit/create /etc/modprobe.d/blacklist.conf and append:

blacklist nouveau

Back up your old initramfs and then build a new one:

# mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r)-nouveau.img
# dracut /boot/initramfs-$(uname -r).img $(uname -r)

Reboot for the changes to take effect.
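After the reboot, it may be worth confirming that nouveau is no longer loaded before installing CUDA. The following is a minimal sketch: nouveau_loaded is a hypothetical helper name, and it scans a module list in /proc/modules format, where the module name is the first column. The optional file argument exists only so the check can be tried against a sample file.

```shell
# Sketch: succeed (exit status 0) if the nouveau module appears in a module
# list in /proc/modules format. nouveau_loaded is a hypothetical helper.
nouveau_loaded() {
    modules_file="${1:-/proc/modules}"
    # The module name is the first whitespace-separated field of each line.
    awk '$1 == "nouveau" { found = 1 } END { exit !found }' "$modules_file"
}
```

A typical use would be: if nouveau_loaded; then echo "nouveau is still loaded"; fi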

Finally, follow the instructions from Nvidia to install CUDA (both the drivers and the toolkit).

Nvidia’s Persistence Mode is essential inside VMs because it resolves certain performance issues related to CUDA initialization. It must be enabled once after every system reboot (the loop below assumes two GPUs, with indices 0 and 1):

# for i in 0 1; do nvidia-smi -i $i -pm ENABLED; done
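When the number of GPUs varies between VSI profiles, the indices can be queried instead of hard-coded. This is a sketch under stated assumptions: enable_persistence_all is a hypothetical helper name, and NVIDIA_SMI is an assumed override variable for pointing at a different nvidia-smi binary. It relies on the --query-gpu=index and --format=csv,noheader options of nvidia-smi to list one GPU index per line.

```shell
# Sketch: enable Persistence Mode on every GPU that nvidia-smi reports.
# enable_persistence_all is a hypothetical helper; NVIDIA_SMI is an assumed
# override variable, defaulting to the nvidia-smi found on PATH.
enable_persistence_all() {
    smi="${NVIDIA_SMI:-nvidia-smi}"
    # Query one GPU index per line, without the CSV header row.
    "$smi" --query-gpu=index --format=csv,noheader | while read -r idx; do
        # Keep the inner call from consuming the list of indices on stdin.
        "$smi" -i "$idx" -pm ENABLED < /dev/null
    done
}
```

Run as root after each reboot, for example by calling enable_persistence_all from a startup script.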

Note that this solution will eventually be deprecated in favor of the Persistence Daemon. Please follow the latest official Nvidia instructions to enable Persistence Mode.