GPU isolation procedure

Learn how to identify the service action that is needed to resolve a graphics processing unit (GPU) problem.

  1. Is the system an 8335-GTB?
    If yes: Continue with the next step.
    If no: Go to Contacting IBM service and support. This ends the procedure.
  2. Use the ipmitool command to examine system event logs (SELs).
    • To list SELs by using an in-band network, use the following command:

      ipmitool sel elist

    • To list SELs remotely over the LAN, use the following command:

      ipmitool -I lanplus -U <username> -P <password> -H <BMC IP address or BMC hostname> sel elist

  3. Identify all SELs with CPU Func or CPU Core Func in the description. Did you find at least one such SEL?
    If yes: Continue with the next step.
    If no: Go to Contacting IBM service and support. This ends the procedure.
  4. For each of the SELs that you identified in step 3, is the sensor name CPU Func 1 or CPU Core Func x, where x is 1 - 12?
    If yes: Continue with the next step.
    If no: Continue with step 6.
  5. Replace the following items one at a time until the problem is resolved:
    Note: Go to 8335-GTB locations to identify the physical location and the removal and replacement procedure.
    1. CPU 1
    2. GPU 2
    3. GPU 1
    4. System backplane
    This ends the procedure.
  6. Is the sensor name CPU Func 2 or CPU Core Func x, where x is 13 - 24?
    If yes: Continue with the next step.
    If no: Go to Contacting IBM service and support. This ends the procedure.
  7. Replace the following items one at a time until the problem is resolved:
    Note: Go to 8335-GTB locations to identify the physical location and the removal and replacement procedure.
    1. CPU 2
    2. GPU 4
    3. GPU 3
    4. System backplane
    This ends the procedure.
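
The decision logic in steps 3 through 7 can be sketched as a pair of small shell helpers. This is a minimal sketch, not an IBM-supplied tool: the function names extract_cpu_sensors and classify_sensor are hypothetical, and the sketch assumes that each sensor name appears in the ipmitool sel elist output exactly as it is written in this procedure (for example, CPU Core Func 3). It also relies on grep -o, which GNU and BSD grep provide.

```shell
# Hypothetical helpers that mirror steps 3 through 7; not part of ipmitool.

# Step 3: read SEL lines on stdin and print each distinct
# "CPU Func N" or "CPU Core Func N" sensor name once.
extract_cpu_sensors() {
    grep -oE 'CPU (Core )?Func [0-9]+' | sort -u
}

# Steps 4 through 7: map one sensor name to the order in which to
# replace parts, or to the fallback of contacting IBM.
classify_sensor() {
    socket1="CPU 1, GPU 2, GPU 1, System backplane"
    socket2="CPU 2, GPU 4, GPU 3, System backplane"
    fallback="Contact IBM service and support"
    case "$1" in
        "CPU Func 1") echo "$socket1" ;;
        "CPU Func 2") echo "$socket2" ;;
        "CPU Core Func "*)
            # Strip the fixed prefix to leave the core number.
            n=${1#"CPU Core Func "}
            case "$n" in
                ''|*[!0-9]*) echo "$fallback" ;;   # suffix is not a number
                *)
                    if [ "$n" -ge 1 ] && [ "$n" -le 12 ]; then
                        echo "$socket1"            # step 5
                    elif [ "$n" -ge 13 ] && [ "$n" -le 24 ]; then
                        echo "$socket2"            # step 7
                    else
                        echo "$fallback"
                    fi ;;
            esac ;;
        *) echo "$fallback" ;;
    esac
}
```

With these helpers defined, the output of the step 2 command could be piped through them to print the replacement order for each matching sensor:

      ipmitool sel elist | extract_cpu_sensors | while IFS= read -r s; do printf '%s: %s\n' "$s" "$(classify_sensor "$s")"; done

The helpers only suggest the order of items to replace. Use the 8335-GTB locations information to identify each physical location and its removal and replacement procedure.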



Last updated: Thu, December 02, 2021