You can enable GPUs so that instance groups can use GPU resources for applications and so that GPU monitoring is available.
About this task
To enable GPUs for instance groups, as well as the GPU monitoring charts and table columns in the cluster management console, you configure the EGO_GPU_AUTOCONFIG parameter in the ego.conf file. To run GPU workloads, you must also configure GPU resource groups.
Procedure
- If you installed IBM® Spectrum Conductor 2.5.0 as a fresh installation, skip to the next step. Complete this step only if you performed a rolling upgrade from a previous version of IBM Spectrum Conductor and had set EGO_GPU_ENABLED=Y by using the gpuconfig.sh enable command:
  - Upgrade to the latest version of IBM Spectrum Conductor. For more details, see the Upgrade by using rolling upgrade topic.
  - Disable the GPUs:
    - To run with user interaction:
      # $EGO_TOP/conductorspark/2.5.0/etc/gpuconfig.sh disable
    - To run without user interaction:
      # $EGO_TOP/conductorspark/2.5.0/etc/gpuconfig.sh disable --quiet -u username -x password
- Set EGO_GPU_AUTOCONFIG=Y in the $EGO_CONFDIR/ego.conf file on all primary hosts.
- Restart EGO on all hosts in the cluster, and then restart all services:
    egosh ego restart all
    egosh service stop all
    egosh service start all
- Verify the GPU resource information, either from the host properties in the cluster management console or from the CLI by running the following command:
    egosh resource list -o ngpus
  Sample output:
    # egosh resource list -o ngpus
    NAME ngpus
    hostA 2
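On a cluster with many hosts, scanning the ngpus column by eye is tedious. The following minimal shell sketch, which is not part of IBM Spectrum Conductor, reads output in the same shape as the sample above (a NAME ngpus header followed by one host per line) and prints each host that reports no GPUs or a non-numeric count. The function name hosts_without_gpus is hypothetical, and the column layout is an assumption based only on the sample output; check it against your own egosh output before relying on it.

```shell
#!/bin/sh
# Hypothetical helper (not an IBM Spectrum Conductor command).
# Reads "egosh resource list -o ngpus"-style output on stdin, assumed to be a
# "NAME ngpus" header followed by one "<host> <count>" row per host, and prints
# each host whose count is missing, non-numeric, or zero.
hosts_without_gpus() {
  awk '
    NF == 0 { next }                 # skip blank lines
    $1 == "NAME" { next }            # skip the header row
    !(NF >= 2 && $2 ~ /^[0-9]+$/ && $2 + 0 > 0) { print $1 }
  '
}

# Usage (on a host where the egosh CLI is available):
#   egosh resource list -o ngpus | hosts_without_gpus
```

An empty result means that every listed host reported at least one GPU. Hosts that legitimately have no GPUs are also listed, so treat the output as a list of hosts to review rather than as a list of errors.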
Results
EGO is restarted on all hosts in the cluster with the change applied. If you did not restart the cluster, you must manually restart EGO on all hosts in the cluster and restart all services for the change to take effect.
What to do next
- To run GPU workloads, you must configure GPU resource groups; see Using resource groups with GPU hosts.
- To disable GPUs:
  - Set EGO_GPU_AUTOCONFIG=N in the $EGO_CONFDIR/ego.conf file on all primary hosts.
  - Restart EGO on all hosts in the cluster, and then restart all services:
      egosh ego restart all
      egosh service stop all
      egosh service start all
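Enabling and disabling GPUs both come down to setting EGO_GPU_AUTOCONFIG to Y or N in $EGO_CONFDIR/ego.conf on every primary host. The following is a minimal shell sketch of that edit, assuming that ego.conf holds one KEY=VALUE assignment per line. The function name set_ego_param is hypothetical and is not an IBM Spectrum Conductor command. It replaces an existing assignment or appends a new one, keeps a .bak copy, and writes back into the original file so that the file's ownership and permissions are preserved. Running it on each primary host, and the restart that follows, are left to you.

```shell
#!/bin/sh
# Hypothetical helper (not an IBM Spectrum Conductor command).
# Sets KEY=VALUE in an ego.conf-style file that holds one assignment per line.
# - Replaces every uncommented "KEY=..." line, or appends "KEY=VALUE" if none exists.
# - Commented-out lines such as "#KEY=..." are left as they are.
# - Saves the previous contents to FILE.bak first.
# Assumes KEY and VALUE contain no sed special characters (true for
# EGO_GPU_AUTOCONFIG with Y or N).
set_ego_param() (
  conf_file=$1
  key=$2
  value=$3

  cp "$conf_file" "${conf_file}.bak" || exit 1

  if grep -q "^${key}=" "$conf_file"; then
    # Rewrite from the backup into the original file, which keeps the
    # original file's ownership and permissions.
    sed "s/^${key}=.*/${key}=${value}/" "${conf_file}.bak" > "$conf_file"
  else
    # Add a newline first if the file does not already end with one.
    if [ -n "$(tail -c 1 "$conf_file")" ]; then
      echo >> "$conf_file"
    fi
    printf '%s=%s\n' "$key" "$value" >> "$conf_file"
  fi
)
```

For example, run set_ego_param "$EGO_CONFDIR/ego.conf" EGO_GPU_AUTOCONFIG Y on each primary host to enable GPUs, or pass N to disable them, and then restart EGO and all services as described in the steps.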