You can use problem analysis to gather information that
helps you determine the nature of a problem encountered on your system.
This information is used to determine if you can resolve the problem
yourself or to gather sufficient information to communicate with a
service provider and quickly determine the service action that needs
to be taken.
The method of finding and collecting error information depends
on the state of the hardware at the time of the failure. This procedure
directs you to one of the following places to find error information:
- The management console error logs
- The operating system's error log
- The control panel
- The Integrated Virtualization Manager
- The Advanced System Management Interface (ASMI) error logs
- Light path diagnostics
If you are using this information because
of a problem with your Hardware Management Console (HMC), see Managing the HMC.
If you are using this information because of
a problem with your IBM® Systems Director Management Console (SDMC), see Managing the SDMC.
To begin analyzing the problem, complete the following steps:
1. Did you observe an activated LED on your system unit or expansion unit?
   To view an example of the control panel LEDs, see Control panel LEDs.
   - Yes: Continue with the next step.
   - No: Go to step 7.
2. Was the activated LED on the system unit?
   - Yes: Continue with the next step.
   - No: The activated LED is on an expansion unit that is connected to
     the system unit. Go to step 4.
3. Is the activated LED the system information light (designated by an i)?
   - Yes: Go to step 7.
   - No: Continue with the next step.
4. Is the activated LED the enclosure fault indicator (designated by an !)?
   - Yes: Use Light Path diagnostics to identify and service the failing
     part. Continue with the next step.
   - No: Go to step 7.
5. The reference code description might provide information or an action
   that you can take to correct the failure.
   Use the information center search function to find the reference code
   details. The information center search function is located in the
   upper left corner of this information center. Read the reference code
   description and return here. Do not take any other action at this time.
   Was there a reference code description that enabled you to resolve the
   problem?
   - Yes: This ends the procedure.
   - No: Continue with the next step.
6. In the serviceable event view of the error, record the part number and
   location code of the first field-replaceable unit (FRU). Other FRUs
   might be listed, but replacing the first FRU has a high probability of
   resolving the problem. When you have identified the first FRU in the
   list, contact your service provider to obtain a replacement part. Do
   not remove power from the unit until you are ready to exchange the FRU
   with a replacement FRU.
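The LED questions in steps 1 through 4 form a small decision tree. The following Python sketch is one way to express that routing. The class and field names are hypothetical and exist only for illustration; only the step numbers come from the procedure.

```python
from dataclasses import dataclass


@dataclass
class LedObservation:
    """What the operator saw. All field names are hypothetical."""
    led_active: bool                    # step 1: was any LED activated?
    on_system_unit: bool = False        # step 2: system unit or expansion unit?
    is_information_light: bool = False  # step 3: the LED designated by an i
    is_enclosure_fault: bool = False    # step 4: the LED designated by an !


def next_step_after_led_check(obs: LedObservation) -> int:
    """Return the step number that steps 1-4 route to."""
    if not obs.led_active:
        return 7  # step 1, No
    if obs.on_system_unit and obs.is_information_light:
        return 7  # step 3, Yes
    # Step 2 "No" (expansion unit) skips step 3 and goes straight to step 4.
    # Step 3 "No" also continues with step 4.
    if obs.is_enclosure_fault:
        return 5  # step 4, Yes: use Light Path diagnostics, then continue
    return 7      # step 4, No
```

For example, an enclosure fault indicator on an expansion unit routes to step 5, and a system unit with no activated LED routes to step 7.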
7. Are all system units and expansion units powered on or are you able to
   power them on?
   Note: An enclosure is powered on when its green power indicator is on
   and not flashing.
   - Yes: Go to step 9.
   - No: Continue with the next step.
8. Ensure that the power supplied to the system is adequate. If your
   processor enclosures and I/O enclosures are protected by an emergency
   power off (EPO) circuit, check that the EPO switch is not activated.
   Verify that all power cables are correctly connected to the electrical
   outlet. When power is available, the Function/Data display on the
   control panel is lit. If you have an uninterruptible power supply,
   verify that its cables are correctly connected to the system and that
   it is functioning. Power on all processor and I/O enclosures.
   Did all enclosures power on?
   Note: An enclosure is powered on when its green power indicator is on
   and not blinking.
   In a single-enclosure server with a redundant service processor, a
   progress code displays on the control (operator) panel several seconds
   after ac power is first applied. This progress code remains on the
   control panel for 1-2 minutes, then the progress code is updated every
   20-30 seconds as the system powers on.
   In a multiple-enclosure server with a redundant service processor, a
   progress code does not display on the control (operator) panel until
   1-2 minutes after ac power is first applied. After the first progress
   code displays, the progress code is updated every 20-30 seconds as the
   system unit powers on.
   - Yes: This ends the procedure.
   - No: Continue with the next step.
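The power-on notes above can be sketched as two small checks. This Python example is an illustration only. The function names are hypothetical, and treating the upper end of each documented range as an "overdue" threshold is an assumption of the sketch, not something the procedure states.

```python
# Upper bounds, in seconds, taken from the note in the power-on step.
FIRST_CODE_MAX_S = 120  # multiple-enclosure: first code 1-2 minutes after ac power
FIRST_HOLD_MAX_S = 120  # single-enclosure: first code stays displayed 1-2 minutes
UPDATE_MAX_S = 30       # afterwards the code is updated every 20-30 seconds


def enclosure_powered_on(green_led_on: bool, green_led_blinking: bool) -> bool:
    """An enclosure is powered on when its green indicator is on and steady."""
    return green_led_on and not green_led_blinking


def progress_overdue(codes_seen: int, seconds_waiting: float,
                     multi_enclosure: bool) -> bool:
    """Return True when the panel has been quiet longer than the note suggests.

    codes_seen      -- number of distinct progress codes displayed so far
    seconds_waiting -- time since ac power was applied (when codes_seen == 0)
                       or since the displayed code last changed
    """
    if codes_seen == 0:
        # The single-enclosure delay is described only as "several seconds",
        # so this sketch does not judge that case.
        return multi_enclosure and seconds_waiting > FIRST_CODE_MAX_S
    if codes_seen == 1 and not multi_enclosure:
        return seconds_waiting > FIRST_HOLD_MAX_S
    return seconds_waiting > UPDATE_MAX_S
```

A result of True from `progress_overdue` is only a hint to look more closely; the green power indicator remains the documented test of whether an enclosure is powered on.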
9. Is the failing hardware managed by a management console?
   - Yes: Go to step 18.
   - No: Continue with the next step.
10. Is your system being managed by the Integrated Virtualization Manager?
    - Yes: Go to step 22.
    - No: Continue with the next step. Refer to the appropriate
      procedure:
11. If an operating system was running at the time of the failure,
    information about the failure is found in the operating system's
    serviceable event view, unless the failure prevented the operating
    system from recording it. If that operating system is no longer
    running, attempt to reboot it before answering the following question.
    Was an operating system running at the time of the failure and is the
    operating system running now?
    - Yes: Go to step 17.
    - No: Continue with the next step.
12. Details about errors that occur when an operating system is not
    running, or is no longer accessible, can be found in the control panel
    or in the Advanced System Management Interface (ASMI).
    Do you want to look for error details by using the ASMI?
    - Yes: Go to step 25.
    - No: Continue with the next step.
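Steps 9 through 12 decide where to look for error information. The following Python sketch summarizes that routing. The function and parameter names are hypothetical; the step numbers are the ones used in the procedure.

```python
def error_source_step(managed_by_console: bool,
                      managed_by_ivm: bool,
                      os_running_then_and_now: bool,
                      use_asmi: bool) -> int:
    """Return the step to go to after answering the questions in steps 9-12."""
    if managed_by_console:
        return 18  # step 9, Yes: management console error logs
    if managed_by_ivm:
        return 22  # step 10, Yes: Integrated Virtualization Manager
    if os_running_then_and_now:
        return 17  # step 11, Yes: operating system procedure
    if use_asmi:
        return 25  # step 12, Yes: ASMI error logs
    return 13      # step 12, No: control panel
```

The questions are asked in order, so an earlier "Yes" takes precedence over later answers.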
13. At the control panel, complete the following steps.
    - Press the increment or decrement button until the number 11 is
      displayed in the upper-left corner of the display.
    - Press Enter to display the contents of function 11.
    - Look in the upper-right corner for a reference code.
    Is there a reference code displayed on the control panel in function
    11?
    - Yes: Continue with the next step.
    - No: Contact your hardware service provider.
14. The reference code description might provide information or an action
    that you can take to correct the failure.
    Go to the Reference code finder and type the reference code in the
    field provided. Read the reference code description and return here.
    Do not take any other action at this time.
    Was there a reference code description that enabled you to resolve the
    problem?
    - Yes: This ends the procedure.
    - No: Continue with the next step.
15. Service is required to resolve the error. Collect as much error data
    as possible and record it. You and your service provider will develop
    a corrective action to resolve the problem based on the following
    guidelines:
    - If a field-replaceable unit (FRU) location code is provided in the
      serviceable event view or control panel, use that location to
      determine which FRU to replace.
    - If an isolation procedure is listed for the reference code in the
      reference code lookup information, include it as a corrective action
      even if it is not listed in the serviceable event view or control
      panel.
    - If any FRUs are marked for block replacement, replace all FRUs in
      the block replacement group at the same time.
    To find error details:
    - Press Enter to display the contents of function 14. If data is
      available in function 14, the reference code has a FRU list.
    - Record the information in functions 11 through 20 on the control
      panel.
    - Contact your service provider and report the reference code and
      other information.
    This ends the procedure.
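The control panel readings described above can be kept in a simple record before you contact your service provider. The following Python sketch assumes that the operator has transcribed each function's display text by hand. The function name, the dictionary layout, and the placeholder values are hypothetical.

```python
from typing import Dict, Optional


def summarize_panel_readout(readout: Dict[int, str]) -> Dict[str, object]:
    """Summarize hand-transcribed control panel functions 11 through 20.

    readout maps a function number to the text shown for that function.
    Missing or blank entries are treated as "no data".
    """
    functions = {n: readout.get(n, "").strip() for n in range(11, 21)}
    reference_code: Optional[str] = functions[11] or None
    return {
        "reference_code": reference_code,        # shown in function 11
        "has_fru_list": bool(functions[14]),     # data in function 14
        "functions_11_to_20": functions,         # everything to report
    }


# Placeholder values only; a real panel shows its own codes.
summary = summarize_panel_readout({11: "XXXXXXXX", 14: "YYYYYYYY"})
print(summary["reference_code"], summary["has_fru_list"])
```

If `reference_code` is None, function 11 showed no reference code, which is the case where the procedure directs you to contact your hardware service provider.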
16. Is your system being managed by the Integrated Virtualization Manager?
    Note: If you install the Virtual I/O Server on a system unit that is
    not managed by a management console, then the Integrated
    Virtualization Manager is enabled.
    - Yes: Go to step 22.
    - No: Continue with the next step.
17. If you are having a problem with an AIX, Linux, or IBM i system unit,
    or InfiniBand switch, go to the appropriate procedure.
18. Is the management console functional and connected to the hardware?
    - Yes: Continue with the next step.
    - No: Start the management console and attach it to the system unit.
      Then return here and continue with the next step.
19. On the management console that is used to manage the system unit,
    complete the following steps:
    Note: If you are unable to locate the reported problem, and there is
    more than one open problem near the time of the reported failure, use
    the earliest problem in the log.
    For HMC:
    - In the navigation area, click .
      The Manage Serviceable Events - Select Serviceable Events window is
      shown.
    - In the Event Criteria area, for Serviceable Event Status, select
      Open. For all other criteria, select ALL, then click OK.
    For SDMC:
    - On the Service and Support Manager page, select Serviceable Problems
      from the Electronic Services Links list box.
      Tip: The Serviceable Problems pane displays a filtered list of only
      those problems associated with systems that are monitored by the
      Service and Support Manager.
    - Click the problem listed in the Name column that you want to work
      with. This step displays the properties of the selected problem.
    Scroll through the log and verify that there is a problem with a
    status of Open that corresponds to the failure.
    Do you find a serviceable event, or an open problem near the time of
    the failure?
20. The reference code description might provide information or an action
    that you can take to correct the failure.
    Go to the Reference code finder and type the reference code in the
    field provided. Read the reference code description and return here.
    Do not take any other action at this time.
    Was there a reference code description that enabled you to resolve the
    problem?
    - Yes: This ends the procedure.
    - No: Continue with the next step.
21. Service is required to resolve the error. Collect as much error data
    as possible and record it. You and your service provider will develop
    a corrective action to resolve the problem based on the following
    guidelines:
    - If a FRU location code is provided in the serviceable event view or
      control panel, use that location to determine which FRU to replace.
    - If an isolation procedure is listed for the reference code in the
      reference code lookup information, include it as a corrective action
      even if it is not listed in the serviceable event view or control
      panel.
    - If any FRUs are marked for block replacement, replace all FRUs in
      the block replacement group at the same time.
    From the Repair Serviceable Event window, complete the following
    steps:
    - Record the problem management record (PMR) number for the problem
      if one is listed.
    - Select the serviceable event from the list.
    - Select Selected and View Details.
    - Record the reference code and FRU list found in the Serviceable
      Event Details.
    - If a PMH number was found for the problem on the Serviceable Event
      Overview panel, the problem has already been reported. If there was
      no PMH number for the problem, contact your service provider.
    This ends the procedure.
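The note in step 19 says to use the earliest problem in the log when several open problems are near the time of the failure. The following Python sketch shows one way to apply that rule to events that have been copied out of the console by hand. The `ServiceableEvent` class, its fields, and the one-hour window are assumptions of the sketch; the procedure does not define how close "near the time" is, and this code does not call any management console interface.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class ServiceableEvent:
    """A hand-copied log entry. All field names are hypothetical."""
    reference_code: str
    status: str            # for example "Open" or "Closed"
    reported_at: datetime


def pick_event(events: List[ServiceableEvent],
               failure_time: datetime,
               window: timedelta = timedelta(hours=1)) -> Optional[ServiceableEvent]:
    """Return the earliest Open event within the window around the failure."""
    candidates = [
        e for e in events
        if e.status == "Open" and abs(e.reported_at - failure_time) <= window
    ]
    # min() with default=None returns None when nothing qualifies.
    return min(candidates, key=lambda e: e.reported_at, default=None)
```

Widen or narrow `window` to match how precisely the failure time is known.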
22. Log on to the Integrated Virtualization Manager interface if not
    already logged on.
    - In the Integrated Virtualization Manager Navigation bar, select
      Manage Serviceable Events (under Service Management).
    - Scroll through the log and verify that there is a problem with a
      status of Open that corresponds to the failure.
    Do you find a serviceable event, or an open problem near the time of
    the failure?
    - Yes: Continue with the next step.
    - No: Go to step 17.
23. The reference code description might provide information or an action
    that you can take to correct the failure.
    Go to the Reference code finder and type the reference code in the
    field provided. Read the reference code description and return here.
    Do not take any other action at this time.
    Was there a reference code description that enabled you to resolve the
    problem?
    - Yes: This ends the procedure.
    - No: Continue with the next step.
24. Service is required to resolve the error. Collect as much error data
    as possible and record it. You and your service provider will develop
    a corrective action to resolve the problem based on the following
    guidelines:
    - If a FRU location code is provided in the serviceable event view or
      control panel, use that location to determine which FRU to replace.
    - If an isolation procedure is listed for the reference code in the
      reference code lookup information, include it as a corrective action
      even if it is not listed in the serviceable event view or control
      panel.
    - If any FRUs are marked for block replacement, replace all FRUs in
      the block replacement group at the same time.
    From the Selected Serviceable Events table, complete the following
    steps:
    - Record the reference code.
    - Select the serviceable event.
    - Select View Associated FRUs.
    - Contact your service provider.
    This ends the procedure.
25. On the console connected to the ASMI, complete the following steps.
    Note: If you are unable to locate the reported problem, and there is
    more than one open problem near the time of the reported failure, use
    the earliest problem in the log.
    - Log in with a user ID that has an authority level of general,
      administrator, or authorized service provider.
    - In the navigation area, expand System Service Aids and click
      Error/Event Logs. If log entries exist, a list of error and event
      log entries is displayed in a summary view.
    - Scroll through the log under Serviceable Customer Attention Events
      and verify that there is a problem that corresponds to the failure.
    For more detailed information on the ASMI, see Managing the Advanced
    System Management Interface.
    Do you find a serviceable event, or an open problem near the time of
    the failure?
    - Yes: Continue with the next step.
    - No: Contact your hardware service provider.
26. The reference code description might provide information or an action
    that you can take to correct the failure.
    Go to the Reference code finder and type the reference code in the
    field provided. Read the reference code description and return here.
    Do not take any other action at this time.
    Was there a reference code description that enabled you to resolve the
    problem?
    - Yes: This ends the procedure.
    - No: Continue with the next step.
27. Service is required to resolve the error. Collect as much error data
    as possible and record it. You and your service provider will develop
    a corrective action to resolve the problem based on the following
    guidelines:
    - If a FRU location code is provided in the serviceable event view or
      control panel, use that location to determine which FRU to replace.
    - If an isolation procedure is listed for the reference code in the
      reference code lookup information, include it as a corrective action
      even if it is not listed in the serviceable event view or control
      panel.
    - If any FRUs are marked for block replacement, replace all FRUs in
      the block replacement group at the same time.
    From the Error Event Log view, complete the following steps:
    - Record the reference code.
    - Select the corresponding check box on the log and click Show
      details.
    - Record the error details.
    - Contact your service provider.
    This ends the procedure.
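The corrective action guidelines repeated in the service steps can be sketched as a small planning helper. This Python example is an illustration under stated assumptions. The `Fru` class, the `block_group` label, and the output wording are hypothetical. Starting from the first FRU in the list follows step 6, and the order in which actions are listed is arbitrary because the procedure does not specify one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fru:
    """One entry from a recorded FRU list. Field names are hypothetical."""
    part_number: str
    location_code: str
    block_group: Optional[str] = None  # label for a block replacement group


def frus_to_replace(fru_list: List[Fru]) -> List[Fru]:
    """Start from the first FRU and pull in its whole block group, if any."""
    if not fru_list:
        return []
    first = fru_list[0]
    if first.block_group is None:
        return [first]
    return [f for f in fru_list if f.block_group == first.block_group]


def corrective_actions(fru_list: List[Fru],
                       isolation_procedure: Optional[str] = None) -> List[str]:
    """List the actions to discuss with the service provider."""
    actions = [
        f"replace part {f.part_number} at {f.location_code}"
        for f in frus_to_replace(fru_list)
    ]
    # Include the isolation procedure even if the event view did not list it.
    if isolation_procedure:
        actions.append(f"run isolation procedure {isolation_procedure}")
    return actions
```

The output is a checklist for the conversation with your service provider, not a substitute for it.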