WML CE tuning recommendations

Find recommended settings for optimal deep learning performance on the S822LC and AC922 for High-Performance Computing.

Enable the performance governor

sudo yum install kernel-tools
sudo cpupower -c all frequency-set -g performance
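
To confirm that the governor change took effect on every CPU, the cpufreq entries in sysfs can be inspected. The following is a minimal sketch, not part of the WML CE tooling: the function names list_governors and all_performance are hypothetical, and the sketch assumes that the kernel exposes cpufreq/scaling_governor under /sys/devices/system/cpu for each CPU.

```shell
# Print every distinct cpufreq governor found under a sysfs CPU directory.
# The optional directory argument is an assumption added so the function
# can be pointed at a test tree; the default is the usual sysfs location.
list_governors() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    cat "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor 2>/dev/null | sort -u
}

# Succeed only when the one governor reported across all CPUs is "performance".
all_performance() {
    [ "$(list_governors "${1:-}")" = "performance" ]
}
```

For example, running all_performance && echo "governor OK" prints the message only if every CPU reports the performance governor.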

Enable GPU persistence mode

sudo systemctl enable nvidia-persistenced
sudo systemctl start nvidia-persistenced

Set GPU memory and graphics clocks

On the S822LC with NVIDIA Tesla P100, set the clocks to maximum (memory 715 MHz, graphics 1480 MHz):

sudo nvidia-smi -ac 715,1480

On the AC922 with NVIDIA Tesla V100, reset the clocks to the NVIDIA defaults:

sudo nvidia-smi -rac
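
After setting application clocks, for example on the S822LC, the current values can be read back and compared with the values that were requested. The following is a sketch under stated assumptions: clocks_match is a hypothetical helper, and it expects one "memory, graphics" pair per GPU on standard input, which is the shape produced by the nvidia-smi query shown in the comment.

```shell
# Succeed when every non-empty line on stdin matches the expected clock pair.
#   $1: expected memory clock in MHz
#   $2: expected graphics clock in MHz
# Fails if no GPU lines are read at all.
clocks_match() {
    want="$1,$2"
    awk -v want="$want" '
        NF { n++; gsub(/[ \t]/, ""); if ($0 != want) bad++ }
        END { exit ((n > 0 && bad == 0) ? 0 : 1) }'
}

# Usage on a system with the NVIDIA driver installed:
#   nvidia-smi --query-gpu=clocks.applications.memory,clocks.applications.graphics \
#              --format=csv,noheader,nounits | clocks_match 715 1480
```

The comparison strips whitespace first because nvidia-smi separates CSV fields with a comma followed by a space.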

For TensorFlow, set the SMT mode

On the S822LC with NVIDIA Tesla P100, set SMT to 2:

sudo ppc64_cpu --smt=2

On the AC922 with NVIDIA Tesla V100, set the SMT mode according to whether DDL is used:

sudo ppc64_cpu --smt=4    # for TensorFlow WITHOUT DDL
sudo ppc64_cpu --smt=2    # for TensorFlow WITH DDL
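
When these settings are applied from a provisioning script, the SMT recommendations above can be captured in one small lookup. The following is a minimal sketch: recommended_smt and its argument conventions are hypothetical, and the returned values are exactly the ones listed in this section.

```shell
# Print the SMT mode that this section recommends for TensorFlow.
#   $1: system model, "S822LC" or "AC922"
#   $2: "ddl" when TensorFlow runs with DDL; omit it or pass anything else otherwise
recommended_smt() {
    case "$1" in
        S822LC)
            echo 2 ;;
        AC922)
            if [ "${2:-}" = "ddl" ]; then echo 2; else echo 4; fi ;;
        *)
            echo "unknown model: $1" >&2
            return 1 ;;
    esac
}

# Example: sudo ppc64_cpu --smt="$(recommended_smt AC922 ddl)"
```

Returning a non-zero status for an unrecognized model keeps a script from silently applying an SMT mode to a system that this section does not cover.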

When you run multiple TensorFlow jobs on an AC922 or S822LC, consider both the operating system process limit and the number of threads in the TensorFlow thread pools. In some cases, a high number of threads in these pools can degrade performance. For instructions on configuring the number of threads in the TensorFlow thread pools, see the TensorFlow Manual tuning topic.
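
As a starting point for sizing the thread pools, the logical CPUs can be divided evenly among the concurrent jobs. The following sketch is an illustrative heuristic, not an IBM recommendation: threads_per_job is a hypothetical helper and the even split is an assumption. On Linux, threads count toward the per-user process limit, which bash reports with ulimit -u.

```shell
# Print an even share of logical CPUs per job, never less than 1.
#   $1: number of logical CPUs (for example, the output of nproc)
#   $2: number of concurrent TensorFlow jobs (must be at least 1)
threads_per_job() {
    cpus="$1"
    jobs="$2"
    if [ "$jobs" -lt 1 ]; then
        echo "job count must be at least 1" >&2
        return 1
    fi
    share=$(( cpus / jobs ))
    if [ "$share" -lt 1 ]; then
        share=1
    fi
    echo "$share"
}

# Example: threads_per_job "$(nproc)" 4
```

With 160 logical CPUs and 4 jobs, the function prints 40. The result is only a candidate value to pass to the TensorFlow thread pool settings; measure with the actual workload before adopting it.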