GPU application development
To develop GPU applications, you should also understand GPU clients, GPU services, GPU scheduling, and the GPU API.
Developing GPU clients
Developing a client for a GPU IBM® Spectrum Symphony application is no different from developing a client for a standard IBM Spectrum Symphony application.
Developing GPU services
You create your service by extending the GpuServiceContainer class and implementing the required handler methods. The main difference between a regular IBM Spectrum Symphony service and a GPU IBM Spectrum Symphony service is that the only required method is onGpuInvoke() rather than onInvoke(); onInvoke() is not part of the GPU service container API. All other methods, such as onCreateService() and onSessionEnter(), remain unchanged. All principles regarding the service lifecycle, the service instance lifecycle, timeouts, and the optional handler API are the same, provided that you replace onInvoke() with onGpuInvoke(). In addition, the extended GPU service container API provides GPU-related information and control options.
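The following is a minimal sketch of the shape such a service might take. It is not taken from the IBM Spectrum Symphony SDK: the `soam` types and the `GpuServiceContainer` base class below are simplified stand-ins declared locally so that the example compiles on its own, and the `input` and `output` fields, the `MyGpuService` class, and its `ready()` helper are hypothetical. In a real service, the SDK headers supply the context types and the base class.

```cpp
#include <cassert>
#include <string>

// Stand-ins for SDK types (assumptions, not the SDK's real definitions).
namespace soam {
struct TaskContext {
    std::string input;   // hypothetical field for this sketch
    std::string output;  // hypothetical field for this sketch
};
struct ServiceContext {};
using TaskContextPtr = TaskContext*;        // the SDK defines its own pointer type
using ServiceContextPtr = ServiceContext*;  // the SDK defines its own pointer type
}  // namespace soam

// Stand-in for the GPU service container base class (assumed shape).
class GpuServiceContainer {
public:
    virtual ~GpuServiceContainer() = default;
    // Optional handler; unchanged from a regular service.
    virtual void onCreateService(soam::ServiceContextPtr& context) { (void)context; }
    // The only required handler; takes the place of onInvoke().
    virtual void onGpuInvoke(soam::TaskContextPtr& context) = 0;
};

class MyGpuService : public GpuServiceContainer {
public:
    void onCreateService(soam::ServiceContextPtr& context) override {
        (void)context;
        ready_ = true;  // one-time setup would go here
    }

    void onGpuInvoke(soam::TaskContextPtr& context) override {
        assert(context != nullptr);
        // A real service would run its GPU work here; this sketch only
        // echoes the input so the control flow is visible.
        context->output = "processed:" + context->input;
    }

    bool ready() const { return ready_; }

private:
    bool ready_ = false;
};
```

Note that the sketch defines no onInvoke() method at all; per-task work lives entirely in onGpuInvoke().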
About GPU scheduling
IBM Spectrum Symphony performs GPU scheduling at the service level, before onCreateService() is called. It automatically selects one of the available devices and sets CUDA_VISIBLE_DEVICES accordingly. Once the service starts running, all tasks associated with it run on the device that was scheduled during its initialization. If the scheduling method is exclusive, the GPU device is not available to other services until that service shuts down.
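Because the device selection is communicated through the CUDA_VISIBLE_DEVICES environment variable, a service can inspect that variable for logging or diagnostics. The sketch below uses only the C++ standard library; the function names `parseVisibleDevices` and `visibleCudaDevices` are hypothetical helpers and not part of any IBM Spectrum Symphony or CUDA API. Entries are kept as strings because CUDA accepts both device indices and device UUIDs in this variable.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Splits a comma-separated CUDA_VISIBLE_DEVICES value into its entries.
// Precondition: raw is not null (the caller handles the unset case).
std::vector<std::string> parseVisibleDevices(const char* raw) {
    assert(raw != nullptr);
    std::vector<std::string> devices;
    std::istringstream stream(raw);
    std::string entry;
    while (std::getline(stream, entry, ',')) {
        if (!entry.empty()) {
            devices.push_back(entry);  // skip empty entries such as in "0,,1"
        }
    }
    return devices;
}

// Reads CUDA_VISIBLE_DEVICES from the process environment.
// Returns an empty list when the variable is not set.
std::vector<std::string> visibleCudaDevices() {
    const char* raw = std::getenv("CUDA_VISIBLE_DEVICES");
    if (raw == nullptr) {
        return {};
    }
    return parseVisibleDevices(raw);
}
```

A service could call visibleCudaDevices() from onCreateService() and log the result, since scheduling has already taken place by then.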
GPU API
Method | Description |
---|---|
onGpuInvoke() | Has the same meaning and signature as onInvoke(), as explained above. |
onGpuEccError(soam::TaskContextPtr& context, int ecc) | IBM Spectrum Symphony runs this handler after every task and passes the number of double-bit volatile ECC errors found for the GPU device scheduled for that task. The number is 0 if no ECC errors were found, or a positive number indicating how many errors were found. This handler is optional. In its default implementation, IBM Spectrum Symphony fails the task and blocks the host. |
getAssignedGpuDeviceId() | Returns the ID of the GPU device that IBM Spectrum Symphony scheduled for the service, or -1 if scheduling failed. This helper method can be called at any stage of the GPU service lifecycle. |
getLastGpuError() | Returns the error number when scheduling fails, that is, when getAssignedGpuDeviceId() returns -1. The possible values are GPU_SYSTEM_ERROR and GPU_INSUFFICIENT_DEVICES. Call this helper method only if getAssignedGpuDeviceId() returns -1. |
getLastGpuErrorDetails() | Returns a string with additional information about the possible reasons for the GPU scheduling failure. Call this helper method only if getAssignedGpuDeviceId() returns -1. |
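The helper methods in the table are meant to be used together: check getAssignedGpuDeviceId() first, and consult getLastGpuError() and getLastGpuErrorDetails() only when it returns -1. The sketch below illustrates that pattern along with an onGpuEccError() override. It rests on several assumptions: the base class is a local stand-in rather than the SDK class, the return types (int and std::string) and the numeric values of GPU_SYSTEM_ERROR and GPU_INSUFFICIENT_DEVICES are guesses, and `setSchedulingResultForSketch`, `describeScheduling`, and `eccErrorsSeen` are hypothetical names that exist only to make the example self-contained.

```cpp
#include <cassert>
#include <string>

namespace soam {
struct TaskContext {};                // stand-in for the SDK type
using TaskContextPtr = TaskContext*;  // the SDK defines its own pointer type
}  // namespace soam

// Numeric values are assumptions; the source names only the constants.
enum GpuError { GPU_SYSTEM_ERROR = 1, GPU_INSUFFICIENT_DEVICES = 2 };

// Stand-in for the GPU service container (assumed shape and return types).
class GpuServiceContainer {
public:
    virtual ~GpuServiceContainer() = default;
    // Stand-in default does nothing; the real default is described in the table.
    virtual void onGpuEccError(soam::TaskContextPtr& context, int ecc) {
        (void)context;
        (void)ecc;
    }
    int getAssignedGpuDeviceId() const { return deviceId_; }
    int getLastGpuError() const { return lastError_; }
    std::string getLastGpuErrorDetails() const { return details_; }

    // Hypothetical hook: simulates the scheduler's result in this sketch.
    void setSchedulingResultForSketch(int deviceId, int error,
                                      const std::string& details) {
        deviceId_ = deviceId;
        lastError_ = error;
        details_ = details;
    }

private:
    int deviceId_ = -1;
    int lastError_ = GPU_SYSTEM_ERROR;
    std::string details_;
};

class CheckedGpuService : public GpuServiceContainer {
public:
    // Summarizes the scheduling outcome, e.g. for a log message.
    std::string describeScheduling() const {
        const int id = getAssignedGpuDeviceId();
        if (id != -1) {
            return "GPU " + std::to_string(id);
        }
        // Error helpers are consulted only after a -1 device ID.
        const std::string reason =
            getLastGpuError() == GPU_INSUFFICIENT_DEVICES ? "insufficient devices"
                                                          : "system error";
        return "scheduling failed (" + reason + "): " + getLastGpuErrorDetails();
    }

    // Counts reported errors instead of relying on the default behavior.
    void onGpuEccError(soam::TaskContextPtr& context, int ecc) override {
        (void)context;
        if (ecc > 0) {
            eccErrorsSeen_ += ecc;
        }
    }

    int eccErrorsSeen() const { return eccErrorsSeen_; }

private:
    int eccErrorsSeen_ = 0;
};
```

Overriding onGpuEccError() replaces the default behavior of failing the task and blocking the host, so a real handler should decide explicitly how to react to a nonzero count.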