Beginning problem analysis

Problem analysis helps you gather information about the nature of a problem encountered on your system. Use this information either to resolve the problem yourself or to give your service provider enough detail to quickly determine the service action that needs to be taken.

If you are using this information because of a problem with your Hardware Management Console (HMC), see Managing the HMC.

To begin analyzing the problem, complete the following steps:

  1. Do you have a direct indication of a hardware error (such as an automated email that notified you of a hardware error or a fault indicator on a system unit or expansion unit)?
  2. How do you manage the system that is failing? If you do not know how the failing system is managed, ask the system administrator.
    System management                             Problem analysis
    Hardware Management Console (HMC)             Go to the section Hardware Management Console (HMC) problem analysis.
    Operating system (AIX®, Linux®, or IBM® i)    Go to the problem analysis topic for your operating system.

Hardware Management Console (HMC) problem analysis

To begin problem analysis on a system that is managed by a Hardware Management Console (HMC), complete the following steps:

  1. Is the management console functional and connected to the hardware?
    • Yes: Continue with the next step.
    • No: Start the management console and attach it to the system unit. Then return here and continue with the next step.
  2. On the management console that is used to manage the system unit, complete the following steps:
    Note: If you are unable to locate the reported problem, and there is more than one open problem near the time of the reported failure, use the earliest problem in the log.
    1. In the navigation area, click the Serviceability icon, and then click Serviceable Events Manager. The Manage Serviceable Events window is displayed.
    2. In the Event Criteria area, for Serviceable Event Status, select Open. For all other criteria, select ALL, then click OK.

    Scroll through the log and verify that a problem with a status of Open corresponds to the failure.

    Did you find a serviceable event or an open problem near the time of the failure?

    • Yes: Continue with the next step.
    • No: Contact your hardware service provider. This ends the procedure.
  3. The reference code description might provide information or an action that you can take to correct the failure.
    Use the search function of IBM Knowledge Center to find the reference code details. The search function is located in the upper-left corner of IBM Knowledge Center. Read the reference code description and return here. Do not take any other action at this time.

    For more information about reference codes, see Reference codes.

    Was there a reference code description that enabled you to resolve the problem?

    • Yes: This ends the procedure.
    • No: Continue with the next step.
  4. Service is required to resolve the error. Collect as much error data as possible and record it. You and your service provider will develop a corrective action to resolve the problem based on the following guidelines:
    • If a FRU location code is provided in the serviceable event view or control panel, use that location to determine which FRU to replace.
    • If an isolation procedure is listed for the reference code in the reference code lookup information, include the isolation procedure as a corrective action even if it is not listed in the serviceable event view or control panel.
    • If any FRUs are marked for block replacement, replace all FRUs in the block replacement group at the same time.
    From the Repair Serviceable Event window, complete the following steps:
    1. Record the problem management record (PMR) number for the problem if one is listed.
    2. Select the serviceable event from the list.
    3. Click Selected, and then click View Details.
    4. In the Serviceable Event Details page, locate details such as the reference code and FRU list and record this information.
    5. If a Problem Management Hardware (PMH) number was found for the problem on the Serviceable Event Overview panel, the problem has already been reported. If there was no PMH number for the problem, contact your service provider.

    This ends the procedure.
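
The event selection rule in step 2 and the corrective-action guidelines in step 4 can be sketched in code. The following Python example is only an illustration and does not call any HMC interface. The ServiceableEvent record, the pick_event and plan_actions functions, the one-hour window that stands in for "near the time of the failure", and all sample values (REFCODE-1, LOC-A, ISOPROC-1, and so on) are assumptions made for this sketch. The values are assumed to have been recorded by hand from the serviceable event view.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class ServiceableEvent:
    # Hypothetical record of values copied from the serviceable event view.
    reported: datetime
    status: str                      # for example "Open" or "Closed"
    reference_code: str
    fru_locations: List[str] = field(default_factory=list)
    block_group: List[str] = field(default_factory=list)  # FRUs marked for block replacement


def pick_event(events: List[ServiceableEvent],
               failure_time: datetime,
               window: timedelta = timedelta(hours=1)) -> Optional[ServiceableEvent]:
    """Return the earliest open event within `window` of the failure, or None.

    The window size is an assumption; the procedure only says "near the time".
    """
    candidates = [
        e for e in events
        if e.status == "Open" and abs(e.reported - failure_time) <= window
    ]
    return min(candidates, key=lambda e: e.reported, default=None)


def plan_actions(event: ServiceableEvent,
                 isolation_procedure: Optional[str] = None) -> List[str]:
    """List candidate corrective actions; the order of the list carries no meaning."""
    actions = []
    # Include the isolation procedure even if the event view does not list it.
    if isolation_procedure:
        actions.append(f"Run isolation procedure {isolation_procedure}")
    # FRUs in a block replacement group are replaced at the same time.
    if event.block_group:
        actions.append("Replace together: " + ", ".join(event.block_group))
    # Remaining FRUs are identified by their location codes.
    for location in event.fru_locations:
        if location not in event.block_group:
            actions.append(f"Replace FRU at {location}")
    return actions


# Sample data with placeholder values only.
failure = datetime(2024, 1, 1, 10, 0)
events = [
    ServiceableEvent(datetime(2024, 1, 1, 10, 30), "Open", "REFCODE-2"),
    ServiceableEvent(datetime(2024, 1, 1, 10, 5), "Open", "REFCODE-1",
                     fru_locations=["LOC-A", "LOC-B", "LOC-C"],
                     block_group=["LOC-B", "LOC-C"]),
    ServiceableEvent(datetime(2024, 1, 1, 9, 55), "Closed", "REFCODE-0"),
    ServiceableEvent(datetime(2024, 1, 1, 6, 0), "Open", "REFCODE-3"),
]

chosen = pick_event(events, failure)
if chosen is None:
    print("No open event found: contact your hardware service provider.")
else:
    print(chosen.reference_code)
    for action in plan_actions(chosen, isolation_procedure="ISOPROC-1"):
        print(action)
```

If pick_event returns no event, the sketch follows the No branch of step 2. The resulting action list is a starting point for the discussion with your service provider, not a replacement for it.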