Troubleshooting
Problem
A small percentage of IBM System x3650 M4 onboard Redundant Array of Independent Disks (RAID) controllers (ServeRAID M5110) might experience early-life online resets during heavy Input/Output (I/O). On susceptible controllers, the frequency of online resets varies depending on I/O throughput. This issue has been observed to occur during a virtual disk consistency check and a hard drive patrol read. A momentary loss of performance for a few seconds can be observed while the controller resets itself. The firmware level does not contribute to the online resets. This is a recoverable event and has no impact on data. Data is stored in the controller's flash-based memory, and is off-loaded when the reset completes. However, the system board needs to be replaced if either of the conditions listed under Symptom has occurred.
Resolving The Problem
Source
RETAIN tip: H211741
Symptom
A small percentage of IBM System x3650 M4 onboard Redundant Array of Independent Disks (RAID) controllers (ServeRAID M5110) might experience early-life online resets during heavy Input/Output (I/O). On susceptible controllers, the frequency of online resets varies depending on I/O throughput.
This issue has been observed to occur during a virtual disk consistency check and a hard drive patrol read. A momentary loss of performance for a few seconds can be observed while the controller resets itself. The firmware level does not contribute to the online resets.
This is a recoverable event and has no impact on data. Data is stored in the controller's flash-based memory, and is off-loaded when the reset completes.
However, the system board needs to be replaced if either of the following conditions has occurred:
- It is normal to see a controller reset for some tasks. If more than five unexplained controller resets occur per hour, with no 'PMU Msg' fault code logged, the system board needs to be replaced.
- Users can identify the error from the following MegaRAID Storage Manager event: Controller encountered a fatal error and reset
In Linux, look for the following messages in the /var/log/kernel log file:
kernel megasas: Found firmware in FAULT state, will reset adapter.
kernel megaraid_sas: resetting fusion adapter.
kernel megasas: Waiting for firmware to come to ready state
kernel megasas: firmware now in Ready state
kernel megasas: IOC Init cmd success
kernel megaraid_sas: Reset successful
If the previous event is observed, view the RAID controller's firmware log to see if it contains one of the following messages:
- Pmu Msg Fault!!! faultcode 00002651
- Pmu Msg Fault!!! faultcode 00002653
- Pmu Msg Fault!!! faultcode 00002656
- Pmu Msg Fault!!! faultcode 0000265D
- Pmu Msg Fault!!! faultcode 00000615
- Pmu Msg Fault!!! faultcode 00001900
- Pmu Msg Fault!!! faultcode 00002665
To check the controller's firmware log, download the IBM MegaRAID Command Line Interface (MegaCLI) and run the following command:
MegaCLI -FwTermLog -Dsply -aALL
MegaCLI for Microsoft Windows:
MIGR-5082326.html
MegaCLI for Linux:
MIGR-5082327.html
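The fault-code check described above can be automated once the firmware log has been saved to a text file. The following is a minimal sketch, not an IBM-supplied tool: it assumes the log lines contain the 'Pmu Msg Fault!!! faultcode' text exactly as listed in this tip, and the function name find_listed_fault_codes is hypothetical.

```python
import re

# Fault codes listed in this tip.
KNOWN_FAULT_CODES = {
    "00002651", "00002653", "00002656", "0000265D",
    "00000615", "00001900", "00002665",
}

# Matches text such as "Pmu Msg Fault!!! faultcode 00002651".
# Assumption: the firmware log uses the same wording as this tip.
FAULT_RE = re.compile(
    r"Pmu Msg Fault!+\s*faultcode\s+([0-9A-Fa-f]{8})", re.IGNORECASE
)


def find_listed_fault_codes(log_text):
    """Return the sorted list of fault codes from this tip found in log_text."""
    found = {m.group(1).upper() for m in FAULT_RE.finditer(log_text)}
    return sorted(found & KNOWN_FAULT_CODES)
```

For example, after redirecting the output of the MegaCLI command to a file, the file contents can be read with open(...).read() and passed to find_listed_fault_codes. An empty result only means that none of the listed codes appear in that text; the firmware log should still be reviewed by hand.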
Under some conditions, an event such as the following event could be logged in the firmware log instead of the 'Pmu Msg' fault.
| - To understand..pmsg:c130adf8 lmid: |
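The "more than five resets per hour" condition can be checked against a Linux kernel log with a short script. The following is a sketch under stated assumptions: each log line starts with a classic syslog timestamp such as "Jan 30 12:34:56" (the messages quoted in this tip are shown without timestamps, so this format is an assumption), the year is supplied by the caller, and the "Found firmware in FAULT state" message is treated as the marker for one reset. The function names are hypothetical. The script only counts resets; it does not decide whether a reset is unexplained or whether a 'PMU Msg' fault code was logged.

```python
from datetime import datetime, timedelta

# Kernel message quoted in this tip, used here as the marker for one reset.
RESET_MARKER = "Found firmware in FAULT state, will reset adapter"


def reset_times(lines, year):
    """Yield a datetime for each log line that marks a controller reset."""
    for line in lines:
        if RESET_MARKER in line:
            # Assumed prefix "Mon DD HH:MM:SS"; classic syslog omits the year.
            stamp = " ".join(line.split()[:3])
            yield datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def exceeds_hourly_threshold(times, limit=5, window=timedelta(hours=1)):
    """Return True if more than `limit` resets fall within any one window."""
    times = sorted(times)
    start = 0
    for end, current in enumerate(times):
        # Drop resets that are a full window or more older than this one.
        while current - times[start] >= window:
            start += 1
        if end - start + 1 > limit:
            return True
    return False
```

A sliding window is used rather than fixed clock hours so that a burst of resets spanning an hour boundary is still counted together.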
Affected configurations
The system can be any of the following IBM servers:
- System x3650 M4, type 7915, any model
This tip is not software specific.
This tip is not option specific.
The system has the symptom described above.
Solution
If the listed symptoms are observed, replace the system board (with embedded controller) using one of the following Field Replaceable Unit (FRU) part numbers: 00AM209 (for Intel V2 CPU) or 00Y8457 (for Intel CPU).
Additional information
An online reset is defined as the controller resetting its firmware. This process takes only a few seconds. "Early life" implies that this is an event that will occur right away; controller age has no impact.
If RAID controller M51XX adapters have any failure symptoms that meet the listed description, refer to RETAIN Tip H21381 (ServeRAID M51XX CONTROLLERS MAY RESET DURING HEAVY I/O).
Document Location
Worldwide
Document Information
Modified date:
30 January 2019
UID
ibm1MIGR-5094459