POWER7 information

Beginning problem analysis

Use problem analysis to gather information that helps you determine the nature of a problem on your system. With this information, you can decide whether you can resolve the problem yourself, or you can give a service provider enough detail to quickly determine the service action that is needed.

The method of finding and collecting error information depends on the state of the hardware at the time of the failure. This procedure directs you to the appropriate place to find error information.

If you are using this information because of a problem with your Hardware Management Console (HMC), see Managing the HMC.

If you are using this information because of a problem with your IBM® Systems Director Management Console (SDMC), see Managing the SDMC.

To begin analyzing the problem, complete the following steps:

  1. Did you observe an activated LED on your system unit or expansion unit? To view an example of the control panel LEDs, see Control panel LEDs.
    • Yes: Continue with the next step.
    • No: Go to step 7.
  2. Was the activated LED on the system unit?
    • Yes: Continue with the next step.
    • No: The activated LED is on an expansion unit that is connected to the system unit. Go to step 4.
  3. Is the activated LED the system information light (designated by an i)?
    • Yes: Go to step 7.
    • No: Continue with the next step.
  4. Is the activated LED the enclosure fault indicator (designated by an !)?
    • Yes: Use Light Path diagnostics to identify and service the failing part. Continue with the next step.
    • No: Go to step 7.
  5. The reference code description might provide information or an action that you can take to correct the failure.
    Use the information center search function to find the reference code details. The information center search function is located in the upper left corner of this information center. Read the reference code description and return here. Do not take any other action at this time.

    Was there a reference code description that enabled you to resolve the problem?

    • Yes: This ends the procedure.
    • No: Continue with the next step.
  6. In the serviceable event view of the error, record the part number and location code of the first field-replaceable unit (FRU). Other FRUs might be listed, but replacing the first FRU has a high probability of resolving the problem. When you have identified the first FRU in the list, contact your service provider to obtain a replacement part. Do not remove power from the unit until you are ready to exchange the FRU with a replacement FRU.
    When you have the replacement part and are ready to exchange it, go to Replacing FRUs by using enclosure fault indicators. This ends the procedure.
  7. Are all system units and expansion units powered on or are you able to power them on?
    Note: An enclosure is powered on when its green power indicator is on and not flashing.
    • Yes: Go to step 9.
    • No: Continue with the next step.
  8. Ensure that the power supplied to the system is adequate:
    • If your processor enclosures and I/O enclosures are protected by an emergency power off (EPO) circuit, check that the EPO switch is not activated.
    • Verify that all power cables are correctly connected to the electrical outlet. When power is available, the Function/Data display on the control panel is lit.
    • If you have an uninterruptible power supply, verify that its cables are correctly connected to the system and that it is functioning.

    Then power on all processor and I/O enclosures.

    Did all enclosures power on?

    Note: An enclosure is powered on when its green power indicator is on and not blinking.

    In a single-enclosure server with a redundant service processor, a progress code displays on the control (operator) panel several seconds after ac power is first applied. This progress code remains on the control panel for 1-2 minutes, then the progress code is updated every 20-30 seconds as the system powers on.

    In a multiple-enclosure server with a redundant service processor, a progress code does not display on the control (operator) panel until 1-2 minutes after ac power is first applied. After the first progress code displays, the progress code is updated every 20-30 seconds as the system unit powers on.

    • Yes: This ends the procedure.
    • No: Continue with the next step.
  9. Is the failing hardware managed by a management console?
    • Yes: Go to step 18.
    • No: Continue with the next step.
  10. Is your system being managed by the Integrated Virtualization Manager?
  11. If an operating system was running at the time of the failure, information about the failure is found in the operating system's serviceable event view, unless the failure prevented the operating system from logging it. If that operating system is no longer running, attempt to reboot it before answering the following question.

    Was an operating system running at the time of the failure and is the operating system running now?

    • Yes: Go to step 17.
    • No: Continue with the next step.
  12. Details about errors that occur when an operating system is not running or is no longer accessible can be found on the control panel or in the Advanced System Management Interface (ASMI).

    Do you want to look for error details by using the ASMI?

    • Yes: Go to step 25.
    • No: Continue with the next step.
  13. At the control panel, complete the following steps.
    1. Press the increment or decrement button until the number 11 is displayed in the upper-left corner of the display.
    2. Press Enter to display the contents of function 11.
    3. Look in the upper-right corner for a reference code.

    Is there a reference code displayed on the control panel in function 11?

    • Yes: Continue with the next step.
    • No: Contact your hardware service provider.
  14. The reference code description might provide information or an action that you can take to correct the failure.
    Go to the Reference code finder and type the reference code in the field provided. Read the reference code description and return here. Do not take any other action at this time.

    Was there a reference code description that enabled you to resolve the problem?

    • Yes: This ends the procedure.
    • No: Continue with the next step.
  15. Service is required to resolve the error. Collect as much error data as possible and record it. You and your service provider will develop a corrective action to resolve the problem based on the following guidelines:
    • If a field-replaceable unit (FRU) location code is provided in the serviceable event view or control panel, that location should be used to determine which FRU to replace.
    • If an isolation procedure is listed for the reference code in the reference code lookup information, include it as a corrective action even if it is not listed in the serviceable event view or control panel.
    • If any FRUs are marked for block replacement, replace all FRUs in the block replacement group at the same time.

    To find error details:

    1. Press Enter to display the contents of function 14. If data is available in function 14, the reference code has a FRU list.
    2. Record the information in functions 11 through 20 on the control panel.
    3. Contact your service provider and report the reference code and other information.

    This ends the procedure.
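
A minimal sketch of how the control panel readings from steps 13 and 15 might be written down before you call your service provider. The `PanelRecord` class and its method names are hypothetical, and the values shown are placeholders, not real reference codes. It only encodes what the steps state: the reference code is read from function 11, data in function 14 indicates a FRU list, and functions 11 through 20 are recorded.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PanelRecord:
    """Hypothetical note-taking record for control panel functions 11-20."""

    # Maps a function number (11..20) to the text shown on the display.
    functions: Dict[int, str] = field(default_factory=dict)

    def record(self, function: int, text: str) -> None:
        # Step 15 asks for functions 11 through 20 only.
        if not 11 <= function <= 20:
            raise ValueError("only functions 11 through 20 are recorded")
        self.functions[function] = text.strip()

    @property
    def reference_code(self) -> Optional[str]:
        # Step 13: the reference code is read from function 11.
        return self.functions.get(11) or None

    @property
    def has_fru_list(self) -> bool:
        # Step 15: data in function 14 means the reference code has a FRU list.
        return bool(self.functions.get(14))

    def missing(self) -> List[int]:
        # Functions that still have to be read from the panel.
        return [n for n in range(11, 21) if n not in self.functions]


notes = PanelRecord()
notes.record(11, " XXXXXXXX ")   # placeholder, not a real reference code
notes.record(14, "placeholder")  # placeholder FRU data
print(notes.reference_code, notes.has_fru_list, notes.missing())
```

Checking `missing()` before the call is one way to confirm that every function from 11 through 20 was read.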

  16. Is your system being managed by the Integrated Virtualization Manager?
    Note: If you install the Virtual I/O Server on a system unit that is not managed by a management console, then the Integrated Virtualization Manager is enabled.
    • Yes: Go to step 22.
    • No: Continue with the next step.
  17. If you are having a problem with an AIX, Linux, or IBM i system unit, or InfiniBand switch, go to the appropriate procedure.

    This ends the procedure.

  18. Is the management console functional and connected to the hardware?
    • Yes: Continue with the next step.
    • No: Start the management console and attach it to the system unit. Then return here and continue with the next step.
  19. On the management console that is used to manage the system unit, complete the following steps:
    Note: If you are unable to locate the reported problem, and there is more than one open problem near the time of the reported failure, use the earliest problem in the log.
    For HMC:
    1. In the navigation area, click Service Management > Manage Events. The Manage Serviceable Events - Select Serviceable Events window is shown.
    2. In the Event Criteria area, for Serviceable Event Status, select Open. For all other criteria, select ALL, then click OK.
    For SDMC:
    1. On the Service and Support Manager page, select Serviceable Problems from the Electronic Services Links list box.
      Tip: The Serviceable Problems pane displays a filtered list of only those problems associated with systems that are monitored by the Service and Support Manager.
    2. Click the problem listed in the Name column that you want to work with. This step displays the properties of the selected problem.

    Scroll through the log and verify that a problem with a status of Open corresponds with the failure.

    Do you find a serviceable event, or an open problem near the time of the failure?

  20. The reference code description might provide information or an action that you can take to correct the failure.
    Go to the Reference code finder and type the reference code in the field provided. Read the reference code description and return here. Do not take any other action at this time.

    Was there a reference code description that enabled you to resolve the problem?

    • Yes: This ends the procedure.
    • No: Continue with the next step.
  21. Service is required to resolve the error. Collect as much error data as possible and record it. You and your service provider will develop a corrective action to resolve the problem based on the following guidelines:
    • If a FRU location code is provided in the serviceable event view or control panel, that location should be used to determine which FRU to replace.
    • If an isolation procedure is listed for the reference code in the reference code lookup information, include it as a corrective action even if it is not listed in the serviceable event view or control panel.
    • If any FRUs are marked for block replacement, replace all FRUs in the block replacement group at the same time.
    From the Repair Serviceable Event window, complete the following steps:
    1. Record the problem management record (PMR) number for the problem if one is listed.
    2. Select the serviceable event from the list.
    3. Click Selected > View Details.
    4. Record the reference code and FRU list found in the Serviceable Event Details.
    5. If a PMH number was found for the problem on the Serviceable Event Overview panel, the problem has already been reported. If there was no PMH number for the problem, contact your service provider.

    This ends the procedure.
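
The note in step 19 says to use the earliest problem in the log when several open problems appear near the time of the failure. The following sketch illustrates that rule on an in-memory list. The `Problem` record, the `pick_problem` function, and the one-hour `window` are assumptions made for illustration: the procedure does not define how near "near" is, and this code does not call any management console interface.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Problem:
    """Hypothetical stand-in for one entry in the serviceable event log."""
    name: str
    status: str
    created: datetime


def pick_problem(
    log: List[Problem],
    failure_time: datetime,
    window: timedelta = timedelta(hours=1),  # assumed meaning of "near"
) -> Optional[Problem]:
    # Keep only open problems that were logged close to the failure.
    candidates = [
        p for p in log
        if p.status == "Open" and abs(p.created - failure_time) <= window
    ]
    # With more than one candidate, the note says to use the earliest one.
    return min(candidates, key=lambda p: p.created, default=None)


failure = datetime(2014, 9, 25, 12, 0)
log = [
    Problem("second", "Open", failure + timedelta(minutes=20)),
    Problem("first", "Open", failure - timedelta(minutes=10)),
    Problem("resolved", "Closed", failure - timedelta(minutes=30)),
]
chosen = pick_problem(log, failure)
print(chosen.name if chosen else "no open problem near the failure")
```

A result of `None` corresponds to not finding a serviceable event or open problem near the time of the failure.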

  22. Log on to the Integrated Virtualization Manager interface if not already logged on.
    • In the Integrated Virtualization Manager Navigation bar, select Manage Serviceable Events (under Service Management).
    • Scroll through the log and verify that a problem with a status of Open corresponds with the failure.

    Do you find a serviceable event, or an open problem near the time of the failure?

    • Yes: Continue with the next step.
    • No: Go to step 17.
  23. The reference code description might provide information or an action that you can take to correct the failure.
    Go to the Reference code finder and type the reference code in the field provided. Read the reference code description and return here. Do not take any other action at this time.

    Was there a reference code description that enabled you to resolve the problem?

    • Yes: This ends the procedure.
    • No: Continue with the next step.
  24. Service is required to resolve the error. Collect as much error data as possible and record it. You and your service provider will develop a corrective action to resolve the problem based on the following guidelines:
    • If a FRU location code is provided in the serviceable event view or control panel, that location should be used to determine which FRU to replace.
    • If an isolation procedure is listed for the reference code in the reference code lookup information, include it as a corrective action even if it is not listed in the serviceable event view or control panel.
    • If any FRUs are marked for block replacement, replace all FRUs in the block replacement group at the same time.
    From the Selected Serviceable Events table, complete the following steps.
    • Record the reference code.
    • Select the serviceable event.
    • Select View Associated FRUs.
    • Contact your service provider.

    This ends the procedure.
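
The three guidelines for developing a corrective action can be sketched as a small planning function. This is an illustration under stated assumptions, not a service tool: `plan_actions`, the location codes, and the procedure name are hypothetical placeholders, and the order of the resulting list is arbitrary because the guidelines do not specify one.

```python
from typing import List, Optional, Set


def plan_actions(
    fru_locations: List[str],
    isolation_procedure: Optional[str],
    block_groups: List[Set[str]],
) -> List[str]:
    """Hypothetical helper that turns the three guidelines into a list."""
    actions: List[str] = []
    handled: Set[str] = set()
    for location in fru_locations:
        if location in handled:
            continue
        # Guideline 3: a FRU marked for block replacement brings its whole group.
        group = next((g for g in block_groups if location in g), {location})
        handled |= group
        # Guideline 1: the location code determines which FRU to replace.
        actions.append("replace together: " + ", ".join(sorted(group)))
    # Guideline 2: include a listed isolation procedure even if the
    # serviceable event view or control panel does not show it.
    if isolation_procedure:
        actions.append("include isolation procedure: " + isolation_procedure)
    return actions


# Placeholder location codes and procedure name, for illustration only.
for action in plan_actions(["LOC-A", "LOC-C"], "PROC-1", [{"LOC-A", "LOC-B"}]):
    print(action)
```

The actual corrective action is still agreed with your service provider. The sketch only shows how the three guidelines combine.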

  25. On the console connected to the ASMI, complete the following steps.
    Note: If you are unable to locate the reported problem, and there is more than one open problem near the time of the reported failure, use the earliest problem in the log.
    1. Log in with a user ID that has an authority level of general, administrator, or authorized service provider.
    2. In the navigation area, expand System Service Aids and click Error/Event Logs. If log entries exist, a list of error and event log entries is displayed in a summary view.
    3. Scroll through the log under Serviceable Customer Attention Events and verify that a problem corresponds with the failure.

    For more detailed information on the ASMI, see Managing the Advanced System Management Interface.

    Do you find a serviceable event, or an open problem near the time of the failure?

    • Yes: Continue with the next step.
    • No: Contact your hardware service provider.
  26. The reference code description might provide information or an action that you can take to correct the failure.
    Go to the Reference code finder and type the reference code in the field provided. Read the reference code description and return here. Do not take any other action at this time.

    Was there a reference code description that enabled you to resolve the problem?

    • Yes: This ends the procedure.
    • No: Continue with the next step.
  27. Service is required to resolve the error. Collect as much error data as possible and record it. You and your service provider will develop a corrective action to resolve the problem based on the following guidelines:
    • If a FRU location code is provided in the serviceable event view or control panel, that location should be used to determine which FRU to replace.
    • If an isolation procedure is listed for the reference code in the reference code lookup information, include it as a corrective action even if it is not listed in the serviceable event view or control panel.
    • If any FRUs are marked for block replacement, replace all FRUs in the block replacement group at the same time.
    From the Error Event Log view, complete the following steps:
    1. Record the reference code.
    2. Select the corresponding check box on the log and click Show details.
    3. Record the error details.
    4. Contact your service provider.

    This ends the procedure.
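
Looking back at the start of the procedure, the LED checks in steps 1 through 4 reduce to a small routing decision. The sketch below restates those four steps as a function that returns the number of the step to go to next. The function name and parameters are hypothetical, and the strings "i" and "!" stand for the system information light and the enclosure fault indicator.

```python
def led_triage(led_active: bool, on_system_unit: bool = False, led: str = "") -> int:
    """Return the step to go to after the LED checks in steps 1 through 4."""
    # Step 1: no activated LED was observed.
    if not led_active:
        return 7
    # Steps 2 and 3: the system information light on the system unit.
    if on_system_unit and led == "i":
        return 7
    # Step 4: the enclosure fault indicator, on a system or expansion unit.
    if led == "!":
        return 5
    # Step 4, No branch: any other activated LED.
    return 7


print(led_triage(False))                                   # 7
print(led_triage(True, on_system_unit=True, led="i"))      # 7
print(led_triage(True, on_system_unit=False, led="!"))     # 5
```

Only an activated enclosure fault indicator leads to the Light Path diagnostics path in steps 5 and 6. Every other outcome continues at step 7.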




Last updated: Thu, September 25, 2014