GPU trace options (Linux, Windows only)

You can use the standard SDK trace options to trace operations that run on the graphics processing unit (GPU) and to send the trace output to a file.

Tracing is configured in the usual way, by specifying the -Xtrace option on the command line. For example, you can use method trace to investigate problems with the CUDA4J API (package com.ibm.cuda) or with the com.ibm.gpu classes.

To trace CUDA operations at the native level, use the ibm_gpu trace component.

For more information about tracing, component tracing, and tracepoints, see the following topics:

-Xtrace, in the OpenJ9 user documentation
Determining the tracepoint ID of a tracepoint, in the J9 VM reference

CUDA4J example

To trace all methods of all CUDA4J classes, specify the following option on the command line:

-Xtrace:print="mt,methods={com/ibm/cuda/*.*}"

com.ibm.gpu example

To trace all methods of all com.ibm.gpu classes, specify the following option on the command line:

-Xtrace:print="mt,methods={com/ibm/gpu/*.*}"
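To check which methods a specification such as com/ibm/cuda/*.* or com/ibm/gpu/*.* selects, the following Java sketch models the matching in a simplified way. It assumes that the part before the last dot is a class pattern in slash-separated form, that the part after it is a method pattern, and that * is a wildcard allowed only at the start or end of each part. It ignores other features of the -Xtrace method syntax (for example, exclusions), so treat it as an illustration rather than as the OpenJ9 implementation. The class name MethodSpecMatcher is hypothetical.

```java
/** Simplified model of -Xtrace method specifications such as "com/ibm/cuda/*.*" (sketch). */
final class MethodSpecMatcher {

    private MethodSpecMatcher() {}

    // Matches text against a pattern whose '*' wildcard may appear at the start, the end, or both.
    static boolean wildcardMatch(String pattern, String text) {
        if (pattern.equals("*")) {
            return true;
        }
        boolean leading = pattern.startsWith("*");
        boolean trailing = pattern.endsWith("*");
        String core = pattern.substring(leading ? 1 : 0, pattern.length() - (trailing ? 1 : 0));
        if (leading && trailing) {
            return text.contains(core);
        }
        if (leading) {
            return text.endsWith(core);
        }
        if (trailing) {
            return text.startsWith(core);
        }
        return text.equals(core);
    }

    // Splits the spec at its last '.' into a class pattern and a method pattern.
    // Assumption: a spec without a '.' selects all methods of the matching classes.
    static boolean matches(String spec, String className, String methodName) {
        int dot = spec.lastIndexOf('.');
        String classPattern = dot < 0 ? spec : spec.substring(0, dot);
        String methodPattern = dot < 0 ? "*" : spec.substring(dot + 1);
        return wildcardMatch(classPattern, className) && wildcardMatch(methodPattern, methodName);
    }
}
```

For example, matches("com/ibm/cuda/*.*", "com/ibm/cuda/CudaDevice", "getCount") returns true, whereas the same specification does not match the class com/ibm/gpu/Maths. The class and method names are only sample strings.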

To write all trace data for native CUDA operations to the file trace.out, use the following command-line option:

java -Xtrace:maximal=ibm_gpu,output=trace.out
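If the traced application is started from another Java program (for example, a test harness), the same option can be passed through java.lang.ProcessBuilder. The following sketch assumes that the child JVM is the IBM SDK that runs the current program (it uses the java launcher under java.home) and that MyGpuApp is a hypothetical main class. Because ProcessBuilder does not go through a shell, the option is passed as one argument without quotes.

```java
import java.io.IOException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Starts a child JVM with native ibm_gpu tracing written to a file (sketch). */
final class TracedLauncher {

    private TracedLauncher() {}

    // Builds: javaExe -Xtrace:maximal=ibm_gpu,output=traceFile -cp classpath mainClass
    static List<String> command(String javaExe, String traceFile, String classpath, String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaExe);
        // A single argument; no shell is involved, so no quoting is needed.
        cmd.add("-Xtrace:maximal=ibm_gpu,output=" + traceFile);
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // java launcher of the SDK that runs this program (assumed to be the IBM SDK).
        String javaExe = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        // "MyGpuApp" is a hypothetical main class; pass your own class name as the first argument.
        String mainClass = args.length > 0 ? args[0] : "MyGpuApp";
        List<String> cmd = command(javaExe, "trace.out", System.getProperty("java.class.path"), mainClass);
        Process child = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(child.waitFor()); // propagate the child's exit code
    }
}
```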

The following example shows generated trace output after it has been formatted:

10:22:41.289702969  0x0000000000010500 ibm_gpu.0           Entry      
    >Called IBM_GPU_sortIntArray with params: numElements=5, device number: 0
10:22:41.289708961  0x0000000000010500 ibm_gpu.1           Event       
    IBM_GPU_sortIntArray starting CUDA malloc, numBytes=20
10:22:41.359505039  0x0000000000010500 ibm_gpu.2           Event       
    IBM_GPU_sortIntArray completed CUDA malloc, starting CUDA memcpy (host to 
    device), deviceData=0x8900400000, hostData=0x3fffa4698428, numBytes=20
10:22:41.359529008  0x0000000000010500 ibm_gpu.3           Event       
    IBM_GPU_sortIntArray completed CUDA memcpy (host to device), starting sort
10:22:41.363024404  0x0000000000010500 ibm_gpu.4           Event       
    IBM_GPU_sortIntArray completed sort, transferring from device to host: 
    deviceData=0x8900400000, hostData=0x3fffa4698428, numBytes=20
10:22:41.363119279  0x0000000000010500 ibm_gpu.5           Event       
    IBM_GPU_sortIntArray completed device to host memcpy
10:22:41.363637597  0x0000000000010500 ibm_gpu.6           Exit       
    <IBM_GPU_sortIntArray - return code=0
10:22:41.363686532  0x0000000000010500 ibm_gpu.7           Entry      
    >Called IBM_GPU_sortFloatArray with params: numElements=5, device number: 0
10:22:41.363687531  0x0000000000010500 ibm_gpu.8           Event       
    IBM_GPU_sortFloatArray starting CUDA malloc, numBytes=20
10:22:41.363793392  0x0000000000010500 ibm_gpu.9           Event       
    IBM_GPU_sortFloatArray completed CUDA malloc, starting CUDA memcpy (host to 
    device), deviceData=0x8900400000, hostData=0x3fffa4b98b08, numBytes=20
10:22:41.363805376  0x0000000000010500 ibm_gpu.10          Event       
    IBM_GPU_sortFloatArray completed CUDA memcpy (host to device), starting sort
10:22:41.365438225  0x0000000000010500 ibm_gpu.11          Event       
    IBM_GPU_sortFloatArray completed sort, transferring from device to host: 
    deviceData=0x8900400000, hostData=0x3fffa4b98b08, numBytes=20
10:22:41.365531103  0x0000000000010500 ibm_gpu.12          Event       
    IBM_GPU_sortFloatArray completed device to host memcpy
10:22:41.366046424  0x0000000000010500 ibm_gpu.13          Exit       
    <IBM_GPU_sortFloatArray - return code=0
10:22:41.366062403  0x0000000000010500 ibm_gpu.21          Entry      
    >Called IBM_GPU_sortDoubleArray with params: numElements=5, device number: 0
10:22:41.366063402  0x0000000000010500 ibm_gpu.22          Event       
    IBM_GPU_sortDoubleArray starting CUDA malloc, numBytes=40
10:22:41.366170261  0x0000000000010500 ibm_gpu.23          Event       
    IBM_GPU_sortDoubleArray completed CUDA malloc, starting CUDA memcpy (host to device), deviceData=0x8900400000, hostData=0x3fffa4b79a18, numBytes=40
10:22:41.366182245  0x0000000000010500 ibm_gpu.24          Event       
    IBM_GPU_sortDoubleArray completed CUDA memcpy (host to device), starting sort
10:22:41.369051467  0x0000000000010500 ibm_gpu.25          Event       
    IBM_GPU_sortDoubleArray completed sort, transferring from device to host: deviceData=0x8900400000, hostData=0x3fffa4b79a18, numBytes=40
10:22:41.369144344  0x0000000000010500 ibm_gpu.26          Event       
    IBM_GPU_sortDoubleArray completed device to host memcpy
10:22:41.369662662  0x0000000000010500 ibm_gpu.27          Exit       
    <IBM_GPU_sortDoubleArray - return code=0
10:22:41.369677642  0x0000000000010500 ibm_gpu.14          Entry      
    >Called IBM_GPU_sortLongArray with params: numElements=5, device number: 0
10:22:41.369678641  0x0000000000010500 ibm_gpu.15          Event       
    IBM_GPU_sortLongArray starting CUDA malloc, numBytes=40
10:22:41.369784501  0x0000000000010500 ibm_gpu.16          Event       
    IBM_GPU_sortLongArray completed CUDA malloc, starting CUDA memcpy (host to device), deviceData=0x8900400000, hostData=0x3fffa4b98b98, numBytes=40
10:22:41.369796486  0x0000000000010500 ibm_gpu.17          Event       
    IBM_GPU_sortLongArray completed CUDA memcpy (host to device), starting sort
10:22:41.371647048  0x0000000000010500 ibm_gpu.18          Event       
    IBM_GPU_sortLongArray completed sort, transferring from device to host: deviceData=0x8900400000, hostData=0x3fffa4b98b98, numBytes=40
10:22:41.371739926  0x0000000000010500 ibm_gpu.19          Event       
    IBM_GPU_sortLongArray completed device to host memcpy
10:22:41.372258243  0x0000000000010500 ibm_gpu.20          Exit       
    <IBM_GPU_sortLongArray - return code=0
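To summarize formatted output like the example above, the following Java sketch pairs each Entry tracepoint with the matching Exit tracepoint on the same thread and reports the elapsed time of each native call. It assumes the text layout shown above (a header line with timestamp, thread ID, tracepoint ID, and tracepoint type, followed by indented detail lines) and that the binary trace file has already been formatted into text; it does not handle timestamps that cross midnight. The class name GpuTraceDurations is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Pairs Entry/Exit tracepoints in formatted ibm_gpu trace output and reports call durations (sketch). */
final class GpuTraceDurations {

    /** One completed native call, for example IBM_GPU_sortIntArray. */
    static final class Call {
        final String function;
        final long durationNanos;
        final int returnCode;

        Call(String function, long durationNanos, int returnCode) {
            this.function = function;
            this.durationNanos = durationNanos;
            this.returnCode = returnCode;
        }
    }

    // Header line: timestamp, thread ID, tracepoint ID, tracepoint type.
    private static final Pattern HEADER = Pattern.compile(
            "^(\\d{2}):(\\d{2}):(\\d{2})\\.(\\d{9})\\s+(0x[0-9a-fA-F]+)\\s+(\\S+)\\s+(Entry|Exit|Event)\\s*$");
    // Entry detail line, e.g. ">Called IBM_GPU_sortIntArray with params: ..."
    private static final Pattern ENTRY = Pattern.compile("^\\s*>Called (\\S+)");
    // Exit detail line, e.g. "<IBM_GPU_sortIntArray - return code=0"
    private static final Pattern EXIT = Pattern.compile("^\\s*<(\\S+) - return code=(-?\\d+)");

    private GpuTraceDurations() {}

    // Converts HH:MM:SS.nnnnnnnnn to nanoseconds since midnight.
    private static long toNanos(Matcher m) {
        long hours = Long.parseLong(m.group(1));
        long minutes = Long.parseLong(m.group(2));
        long seconds = Long.parseLong(m.group(3));
        long nanos = Long.parseLong(m.group(4));
        return ((hours * 60 + minutes) * 60 + seconds) * 1_000_000_000L + nanos;
    }

    static List<Call> parse(List<String> lines) {
        List<Call> calls = new ArrayList<>();
        Map<String, Long> open = new HashMap<>(); // key: thread ID + "/" + function name
        long time = 0;
        String thread = null;
        for (String line : lines) {
            Matcher header = HEADER.matcher(line);
            if (header.matches()) { // remember time and thread for the detail line that follows
                time = toNanos(header);
                thread = header.group(5);
                continue;
            }
            if (thread == null) {
                continue; // detail line without a preceding header
            }
            Matcher entry = ENTRY.matcher(line);
            if (entry.find()) {
                open.put(thread + "/" + entry.group(1), time);
                continue;
            }
            Matcher exit = EXIT.matcher(line);
            if (exit.find()) {
                Long start = open.remove(thread + "/" + exit.group(1));
                if (start != null) {
                    calls.add(new Call(exit.group(1), time - start, Integer.parseInt(exit.group(2))));
                }
            }
        }
        return calls;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java GpuTraceDurations formatted-trace.txt
        for (Call c : parse(Files.readAllLines(Paths.get(args[0])))) {
            System.out.printf("%s: %.3f ms (return code %d)%n", c.function, c.durationNanos / 1e6, c.returnCode);
        }
    }
}
```

Applied to the sample above, the first call (IBM_GPU_sortIntArray, from 10:22:41.289702969 to 10:22:41.363637597) is reported with a duration of about 73.9 ms and return code 0.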