Enabling GPUs

Enable GPUs when you want to create instance groups that use GPU resources for applications and want GPU monitoring to be available.

About this task

To enable GPUs for instance groups, and to display GPU monitoring charts and table columns in the cluster management console, configure a parameter in the ego.conf file. You must also configure GPU resource groups to run the GPU workload.

Procedure

If you installed IBM® Spectrum Conductor 2.5.0 as a fresh installation, skip to the next step. Complete this step only if you performed a rolling upgrade from a previous version of IBM Spectrum Conductor and had set EGO_GPU_ENABLED=Y by using the gpuconfig.sh enable command:
1. Upgrade to the latest version of IBM Spectrum Conductor. For more details, see the Upgrade by using rolling upgrade topic.
2. Disable the GPUs:
  - To run with user interaction: # $EGO_TOP/conductorspark/2.5.0/etc/gpuconfig.sh disable
  - To run without user interaction: # $EGO_TOP/conductorspark/2.5.0/etc/gpuconfig.sh disable --quiet -u username -x password
Set EGO_GPU_AUTOCONFIG=Y in the $EGO_CONFDIR/ego.conf file on all primary hosts.
Restart EGO on all the hosts in the cluster and restart all the services:
```
egosh ego restart all
egosh service stop all
egosh service start all
```
Verify the GPU resource information in the host properties in the cluster management console, or from the CLI by running the following command:
```
egosh resource list -o ngpus
```
Sample output:
```
# egosh resource list -o ngpus
NAME    ngpus
hostA     2
```
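On a cluster with many hosts, the verification output can be filtered rather than read by eye. The following is a minimal sketch, assuming the output keeps the two-column NAME and ngpus layout shown in the sample; the function name hosts_without_gpus is hypothetical and is not part of the product.

```shell
#!/bin/sh
# Hypothetical helper (not part of IBM Spectrum Conductor).
# Reads "egosh resource list -o ngpus" style output on stdin and prints
# the name of every host whose ngpus value is 0 or not numeric.
# Assumption: two whitespace-separated columns, NAME then ngpus.
hosts_without_gpus() {
    awk 'NF >= 2 && $1 != "NAME" && $1 !~ /^#/ && ($2 + 0) == 0 { print $1 }'
}

# Example usage on a management host:
#   egosh resource list -o ngpus | hosts_without_gpus
```

An empty result suggests that every listed host reports at least one GPU. The header line and any echoed command line that starts with # are skipped.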

Results

EGO is restarted on all of the hosts in the cluster with the change applied. If you chose not to restart the cluster, you must manually restart EGO on all hosts in the cluster and restart all services for the change to take effect.

What to do next

You must configure GPU resource groups to run the GPU workload; see Using resource groups with GPU hosts.
To disable GPUs:
1. Set EGO_GPU_AUTOCONFIG=N in the $EGO_CONFDIR/ego.conf file on all primary hosts.
2. Restart EGO on all the hosts in the cluster and restart all the services:
```
egosh ego restart all
egosh service stop all
egosh service start all
```
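The ego.conf change used to enable or disable GPUs can be scripted when several primary hosts need the same edit. The following is a minimal sketch, not a product tool: the function name set_gpu_autoconfig is hypothetical, and it assumes that the parameter, when present, sits on its own line starting with EGO_GPU_AUTOCONFIG= and that the file ends with a newline. It writes a .bak copy before changing anything.

```shell
#!/bin/sh
# Hypothetical helper (not part of IBM Spectrum Conductor).
# Usage: set_gpu_autoconfig Y|N /path/to/ego.conf
# Assumptions: the parameter, if present, starts at column 1 as
# EGO_GPU_AUTOCONFIG=..., and the file ends with a newline.
set_gpu_autoconfig() {
    value=$1
    conf=$2
    case "$value" in
        Y|N) ;;
        *) echo "value must be Y or N" >&2; return 1 ;;
    esac
    [ -f "$conf" ] || { echo "file not found: $conf" >&2; return 1; }

    # Keep a backup copy and rewrite the original from it.
    cp "$conf" "$conf.bak" || return 1
    if grep -q '^EGO_GPU_AUTOCONFIG=' "$conf.bak"; then
        # Parameter already present: replace its value in place.
        sed "s/^EGO_GPU_AUTOCONFIG=.*/EGO_GPU_AUTOCONFIG=$value/" "$conf.bak" > "$conf"
    else
        # Parameter absent: append it as a new line.
        printf 'EGO_GPU_AUTOCONFIG=%s\n' "$value" >> "$conf"
    fi
}

# Example usage on each primary host:
#   set_gpu_autoconfig Y "$EGO_CONFDIR/ego.conf"
```

The script only edits the file. EGO and the services still need to be restarted with the egosh commands shown above for the change to take effect.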