IBM Support

Uncorrectable memory errors generate false DIMM over-temperature errors - IBM System x3850 X5, x3950 X5

Troubleshooting


Problem

If a System x3850 X5 or x3950 X5 experiences an uncorrectable memory error and crashes, false Dual In-Line Memory Module (DIMM) over-temperature errors may be logged in the Integrated Management Module (IMM) event log. In addition, the TEMP light on the front-panel Lightpath card is illuminated.

Resolving The Problem

Source

RETAIN tip: H202705

Affected configurations

The system may be any of the following IBM servers:

  • System x3850 X5, type 7143, any model
  • System x3850 X5, type 7145, any model
  • System x3850 X5, type 7146, any model
  • System x3850 X5, type 7191, any model
  • System x3950 X5, type 7145, any model

This tip is not software specific.

This tip is not option specific.

The following system firmware level(s) are affected:

  • FPGA

Solution

This behavior will be corrected in a future release of the Field Programmable Gate Array (FPGA) firmware.

This release is targeted for the second quarter of 2011.

The file is, or will be, available on IBM Support's Fix Central web page. Select the appropriate Product name, Product machine type, and Operating system at the following URL:

  http://www.ibm.com/support/fixcentral/systemx/groupView?query.productGroup=IBM%2FSystemx

Workaround

When an uncorrectable memory error and a DIMM over-temperature condition are reported at the same time, act only to resolve the uncorrectable memory error. The DIMM over-temperature event that occurs at the same time may be ignored.

However, DIMM over-temperature events that occur more than one hour apart from an uncorrectable memory error should be resolved as described in the Problem Determination and Service Guide.
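The triage rule above can be expressed as a small filter. The following Python is a minimal sketch only: the event categories, the `Event` record, and the `triage` function are hypothetical and do not reflect the actual IMM event log format or any IBM tool. It assumes the events have already been extracted from the log with timestamps, and it treats a DIMM over-temperature event as ignorable when it lies within one hour, before or after, of an uncorrectable memory error.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

# Hypothetical event categories (assumption): real IMM event log entries
# use their own message text and IDs, which are not modeled here.
UNCORRECTABLE_MEMORY = "uncorrectable_memory_error"
DIMM_OVER_TEMP = "dimm_over_temperature"

# Over-temperature events separated from an uncorrectable memory error by
# more than this window should be serviced normally.
WINDOW = timedelta(hours=1)


@dataclass
class Event:
    timestamp: datetime
    kind: str
    description: str = ""


def triage(events: List[Event]) -> List[Event]:
    """Return the events that still need action.

    DIMM over-temperature events within WINDOW (before or after) of any
    uncorrectable memory error are treated as likely false and dropped;
    all other events are kept in their original order.
    """
    memory_error_times = [e.timestamp for e in events if e.kind == UNCORRECTABLE_MEMORY]
    actionable = []
    for event in events:
        if event.kind == DIMM_OVER_TEMP and any(
            abs(event.timestamp - t) <= WINDOW for t in memory_error_times
        ):
            continue  # coincides with a memory error: may be ignored
        actionable.append(event)
    return actionable


if __name__ == "__main__":
    t0 = datetime(2011, 3, 1, 12, 0)
    log = [
        Event(t0, UNCORRECTABLE_MEMORY, "hypothetical DIMM on memory card 1"),
        Event(t0 + timedelta(minutes=2), DIMM_OVER_TEMP),
        Event(t0 + timedelta(hours=3), DIMM_OVER_TEMP),
    ]
    for e in triage(log):
        print(e.timestamp, e.kind)
    # 2011-03-01 12:00:00 uncorrectable_memory_error
    # 2011-03-01 15:00:00 dimm_over_temperature
```

The over-temperature event two minutes after the memory error is dropped, while the one three hours later is kept for normal servicing.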

Additional information

If an uncorrectable memory error occurs, the event log will indicate which affected DIMM(s) generated the error.

Because the FPGA is monitoring DIMM temperatures on the memory card at the same time, these two concurrent accesses may cause a false over-temperature error to be detected.

The subsequent DIMM over-temperature errors should be ignored.

A new version of the system FPGA firmware will correct this false error condition.

Document Location

Worldwide

Operating System

System x:Operating system independent / None

[{"Type":"HW","Business Unit":{"code":"BU016","label":"Multiple Vendor Support"},"Product":{"code":"QU04SRF","label":"System x->System x3850 X5->7146"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Line of Business":{"code":"","label":""}},{"Type":"HW","Business Unit":{"code":"BU016","label":"Multiple Vendor Support"},"Product":{"code":"QU04SRO","label":"System x->System x3850 X5->7145"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Line of Business":{"code":"","label":""}},{"Type":"HW","Business Unit":{"code":"BU016","label":"Multiple Vendor Support"},"Product":{"code":"QU04SZB","label":"System x->System x3950 X5->7145"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Line of Business":{"code":"","label":""}},{"Type":"HW","Business Unit":{"code":"BU016","label":"Multiple Vendor Support"},"Product":{"code":"QU90ABO","label":"System x->System x3850 X5->7191"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Line of Business":{"code":"","label":""}},{"Type":"HW","Business Unit":{"code":"BU016","label":"Multiple Vendor Support"},"Product":{"code":"QU90ABX","label":"System x->System x3850 X5->7143"},"Platform":[{"code":"PF025","label":"Platform Independent"}],"Line of Business":{"code":"","label":""}}]

Document Information

Modified date:
30 January 2019

UID

ibm1MIGR-5087329