Troubleshooting
Problem
If the System x3850 X5 / x3950 X5 experiences an unrecoverable memory error and crashes, erroneous Dual In-Line Memory Module (DIMM) over-temperature errors may be logged in the Integrated Management Module (IMM) event log. Also, the TEMP light will be illuminated on the front panel Lightpath card.
Resolving The Problem
Source
RETAIN tip: H202705
Symptom
If the System x3850 X5 / x3950 X5 experiences an unrecoverable memory error and crashes, erroneous Dual In-Line Memory Module (DIMM) over-temperature errors may be logged in the Integrated Management Module (IMM) event log. Also, the TEMP light will be illuminated on the front panel Lightpath card.
Affected configurations
The system may be any of the following IBM servers:
- System x3850 X5, type 7143, any model
- System x3850 X5, type 7145, any model
- System x3850 X5, type 7146, any model
- System x3850 X5, type 7191, any model
- System x3950 X5, type 7145, any model
This tip is not software specific.
This tip is not option specific.
The following system firmware level(s) are affected:
- FPGA
Solution
This behavior will be corrected in a future release of the Field Programmable Gate Array (FPGA) firmware.
The release is targeted for the second quarter of 2011.
The file is or will be available by selecting the appropriate Product name, Product machine type, and Operating system on IBM Support's Fix Central web page, at the following URL:
http://www.ibm.com/support/fixcentral/systemx/groupView?query.productGroup=IBM%2FSystemx
Workaround
When an uncorrected memory error and a DIMM over-temperature condition are indicated at the same time, act only to resolve the uncorrected memory error. The DIMM over-temperature event that occurs at the same time may be ignored.
However, over-temperature events that occur at different times, that is, separated from the memory error by more than one hour, should be resolved per the Problem Determination and Service Guide.
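The workaround rule can be sketched as a small filter over exported event log entries. This is an illustration only, not an IBM tool: the Event record, the kind labels "uncorrectable_memory" and "dimm_over_temp", and the function name are hypothetical, and the sketch assumes that "at the same time" means within one hour of the memory error, which is one reading of the guidance above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: events within one hour of each other count as "at the same time".
SAME_TIME_WINDOW = timedelta(hours=1)


@dataclass(frozen=True)
class Event:
    """Hypothetical, simplified view of one event log entry."""
    timestamp: datetime
    kind: str  # hypothetical labels: "uncorrectable_memory" or "dimm_over_temp"


def over_temp_events_to_resolve(events):
    """Return the DIMM over-temperature events that still need servicing.

    An over-temperature event is dropped when an uncorrectable memory error
    was logged within SAME_TIME_WINDOW of it; all others are kept.
    """
    memory_error_times = [
        e.timestamp for e in events if e.kind == "uncorrectable_memory"
    ]
    actionable = []
    for e in events:
        if e.kind != "dimm_over_temp":
            continue
        coincides = any(
            abs(e.timestamp - t) <= SAME_TIME_WINDOW for t in memory_error_times
        )
        if not coincides:
            actionable.append(e)
    return actionable
```

Over-temperature entries returned by this function are the ones to work through with the Problem Determination and Service Guide. Entries it drops coincide with a memory error and fall under the workaround.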
Additional information
If an uncorrectable memory error occurs, the event log will indicate which affected DIMM(s) generated the error.
Because the FPGA is monitoring DIMM temperatures on the memory card while the memory error is being handled, the two concurrent accesses may cause a false over-temperature error to be detected.
The subsequent DIMM over-temperature errors should be ignored.
A new version of the system FPGA firmware will correct this false error condition.
Document Location
Worldwide
Document Information
Modified date:
30 January 2019
UID
ibm1MIGR-5087329