IBM PowerKVM problem analysis

Use the following procedure to find information about a problem with server hardware that is running IBM® PowerKVM.

  1. Is the host operational?
    • Yes: Continue with step 10.
    • No: Continue with the next step.
  2. Attempt to reboot the host. Is the host operational now?
    • Yes: Continue with step 10.
    • No: Continue with the next step.
  3. Details about errors can be found in the control panel or in the Advanced System Management Interface (ASMI).
    Do you want to look for error details by using the ASMI?
    • Yes: Continue with step 7.
    • No: Continue with the next step.
  4. At the control panel, complete the following steps.
    1. Press the increment or decrement button until the number 11 is displayed in the upper-left corner of the display.
    2. Press Enter to display the contents of function 11.
    3. Look in the upper-right corner for a reference code.
    Is there a reference code displayed on the control panel in function 11?
    • Yes: Continue with the next step.
    • No: Contact your hardware service provider.
  5. The reference code description might provide information or an action that you can take to correct the failure.
    Use the search function of IBM Knowledge Center to find the reference code details. The search function is located in the upper-left corner of IBM Knowledge Center. Read the reference code description and return here. Do not take any other action at this time.

    For more information about reference codes, see Reference codes.

    Was there a reference code description that enabled you to resolve the problem?
    • Yes: This ends the procedure.
    • No: Continue with the next step.
  6. Service is required to resolve the error. Collect as much error data as possible and record it. You and your service provider will develop a corrective action to resolve the problem based on the following guidelines:
    • If a field-replaceable unit (FRU) location code is specified in the control panel, that location must be used to determine which FRU to replace.
    • If an isolation procedure is listed for the reference code in the reference code lookup information, include it as a corrective action even if it is not listed in the control panel.
    • If any FRUs are marked for block replacement, replace all FRUs in the block replacement group at the same time.
    To find error details, complete the following steps:
    1. Press the increment or decrement button until the number 14 is displayed, and then press Enter to display the contents of function 14. If data is available in function 14, the reference code has a FRU list.
    2. Record the information in functions 11 - 20 on the control panel.
    3. Contact your service provider and report the reference code and other information.

    This ends the procedure.

  7. On the console connected to the ASMI, complete the following steps:
    Note: If you are unable to locate the reported problem, and there is more than one open problem near the time of the reported failure, use the earliest problem in the log.
    1. Log in with a user ID that has an authority level of general, administrator, or authorized service provider.
    2. In the navigation area, expand System Service Aids and click Error/Event Logs. If log entries exist, a list of error and event log entries is displayed in a summary view.
    3. Scroll through the log in the Serviceable Customer Attention Events area and verify that a problem exists that corresponds to the failure.

    For more information about the ASMI, see Managing the Advanced System Management Interface.
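
    The note in this step (use the earliest open problem near the time of the failure) can be expressed as a small selection rule. The following Python sketch is illustrative only: the LogEntry type is a hypothetical, simplified stand-in for a log entry and is not an ASMI export format, and the one-hour window is an assumed meaning of "near the time of the reported failure".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical, simplified view of one log entry (not an ASMI format)."""
    logged_at: datetime
    reference_code: str
    is_open: bool


def pick_entry(
    entries: Sequence[LogEntry],
    failure_time: datetime,
    window: timedelta = timedelta(hours=1),  # assumed meaning of "near"
) -> Optional[LogEntry]:
    """Return the earliest open entry within `window` of the failure, or None."""
    nearby = [
        entry for entry in entries
        if entry.is_open and abs(entry.logged_at - failure_time) <= window
    ]
    # min() with default=None covers the case where nothing is nearby.
    return min(nearby, key=lambda entry: entry.logged_at, default=None)
```

    If the function returns None, no open problem was found near the failure, which corresponds to the "No" branch of the question that follows.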

    Did you find a serviceable event or an open problem near the time of the failure?
    • Yes: Continue with the next step.
    • No: Contact your hardware service provider. This ends the procedure.
  8. The reference code description might provide information or an action that you can take to correct the failure.
    Use the search function of IBM Knowledge Center to find the reference code details. The search function is located in the upper-left corner of IBM Knowledge Center. Read the reference code description and return here. Do not take any other action at this time.

    For more information about reference codes, see Reference codes.

    Was there a reference code description that enabled you to resolve the problem?
    • Yes: This ends the procedure.
    • No: Continue with the next step.
  9. Service is required to resolve the error. Collect as much error data as possible and record it. You and your service provider will develop a corrective action to resolve the problem based on the following guidelines:
    • If a FRU location code is specified in the serviceable event view or control panel, that location must be used to determine which FRU to replace.
    • If an isolation procedure is listed for the reference code in the reference code lookup information, include it as a corrective action even if it is not listed in the serviceable event view or control panel.
    • If any FRUs are marked for block replacement, replace all FRUs in the block replacement group at the same time.
    From the Error/Event Logs view, complete the following steps:
    1. Record the reference code.
    2. Select the corresponding check box on the log and click Show details.
    3. Record the error details.
    4. Contact your service provider.

    This ends the procedure.

  10. Did you receive any messages related to this problem (for example, a message that a device is not available or is reporting errors), on the system or by email, that provide a reference code?
    Note: A reference code is an 8-character system reference code (SRC).
    • Yes: Continue with the next step.
    • No: Go to step 12.
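
    Before you search for a reference code that was copied from a message or email, it can help to confirm that it has the expected 8-character form. The following Python sketch is a minimal check under one assumption that this procedure does not state: that the 8 characters are digits and uppercase letters. The function name is hypothetical.

```python
import re

# Assumption: an SRC is exactly 8 characters, each a digit or an uppercase letter.
_SRC_PATTERN = re.compile(r"[0-9A-Z]{8}")


def normalize_reference_code(raw: str) -> str:
    """Trim surrounding whitespace, uppercase the text, and check its form."""
    code = raw.strip().upper()
    if not _SRC_PATTERN.fullmatch(code):
        raise ValueError(f"not an 8-character reference code: {raw!r}")
    return code
```

    A value that fails the check was probably truncated or miscopied, so reread it from the original message before you continue.
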
  11. The reference code description might provide information or an action that you can take to correct the failure.
    Use the search function of IBM Knowledge Center to find the reference code details. The search function is located in the upper-left corner of IBM Knowledge Center. Read the reference code description and return here. Do not take any other action at this time.

    For more information about reference codes, see Reference codes.

    If the reference code description provides steps that resolve the problem without replacing FRUs in the failing item list, perform those steps.

    Were you able to resolve the problem?

    • Yes: This ends the procedure.
    • No: Continue with the next step.
  12. Do you suspect a problem with a 3D graphics adapter?
  13. Do you suspect a problem with a PCIe3 1.6 TB NVMe Flash adapter (FC EC54 and EC55; CCIN 58CB) or a PCIe3 3.2 TB NVMe Flash adapter (FC EC56 and EC57; CCIN 58CC)?
  14. To locate the error information in a system running IBM PowerKVM, complete the following steps:
    1. Log in as root user.
    2. At the command line, type opal-elog-parse -s and press Enter.
    3. Look for the most recent entry that contains a reference code.
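
    The three substeps above can be sketched in Python as follows. Only the command opal-elog-parse -s comes from this procedure. The function names are hypothetical, and the filter is a heuristic built on assumptions: it treats any 8-character token of digits and uppercase letters that contains at least one digit as a candidate reference code, and it makes no assumption about the layout or ordering of the output.

```python
import re
import shutil
import subprocess
from typing import List

# Heuristic (an assumption): 8 characters, digits or uppercase letters,
# with at least one digit so that ordinary uppercase words are skipped.
_CANDIDATE = re.compile(r"\b(?=[0-9A-Z]*[0-9])[0-9A-Z]{8}\b")


def lines_with_candidate_codes(summary_text: str) -> List[str]:
    """Return the lines, in their original order, that hold a candidate code."""
    return [line for line in summary_text.splitlines() if _CANDIDATE.search(line)]


def read_elog_summary() -> str:
    """Run `opal-elog-parse -s` (typically as root) and return its output."""
    if shutil.which("opal-elog-parse") is None:
        raise FileNotFoundError("opal-elog-parse not found on PATH")
    result = subprocess.run(
        ["opal-elog-parse", "-s"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

    On the host, lines_with_candidate_codes(read_elog_summary()) returns the lines to review. Because the filter is only a heuristic, check the matching lines yourself to identify the most recent entry.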
  15. Did you find a serviceable event or an open problem near the time of the failure?
    • Yes: Continue with the next step.
    • No: Continue with step 17.
  16. The reference code description might provide information or an action that you can take to correct the failure.
    Use the search function of IBM Knowledge Center to find the reference code details. The search function is located in the upper-left corner of IBM Knowledge Center. Read the reference code description and return here. Do not take any other action at this time.

    For more information about reference codes, see Reference codes.

    Was there a reference code description that enabled you to resolve the problem?
    • Yes: This ends the procedure.
    • No: Continue with the next step.
  17. Service is required to resolve the error. Collect as much error data as possible and record it. You and your service provider will develop a corrective action to resolve the problem based on the following guidelines:
    • If a field-replaceable unit (FRU) location code is specified, use that location to determine which FRU to replace.
    • If an isolation procedure is listed for the reference code in the reference code lookup information, include it as a corrective action even if it is not listed in the serviceable event view or control panel.
    • If any FRUs are marked for block replacement, replace all FRUs in the block replacement group at the same time.
    Complete the following steps:
    1. Record the reference code, if available.
    2. Record the error details.
    3. Run the sosreport command to collect debug data.
    4. Contact your service provider.

    This ends the procedure.
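
    The data collection in step 17 can be sketched in Python as follows. The sketch is illustrative only: the function names and the record layout are hypothetical, and the --batch option (run without prompts) is assumed to be supported by the installed version of sosreport.

```python
import shutil
import subprocess
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def write_service_record(
    path: Path, error_details: str, reference_code: Optional[str] = None
) -> Path:
    """Write the reference code (if any) and error details to a text file."""
    recorded = datetime.now(timezone.utc).isoformat(timespec="seconds")
    lines = [
        f"Recorded (UTC): {recorded}",
        f"Reference code: {reference_code or 'not available'}",
        "Error details:",
        error_details.strip(),
    ]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def run_sosreport() -> None:
    """Collect debug data; assumes sosreport is installed and accepts --batch."""
    if shutil.which("sosreport") is None:
        raise FileNotFoundError("sosreport not found on PATH")
    subprocess.run(["sosreport", "--batch"], check=True)
```

    Keeping the record in a file next to the sosreport archive gives your service provider the reference code, the error details, and the debug data together.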




Last updated: Tue, October 17, 2017