IBM Support

DIMM over-temperature errors resulting from improper servicing - IBM System x3690 X5

Troubleshooting


Problem

On an IBM System x3690 X5 with certain Micron DIMMs installed, removing the top cover without first removing AC power can cause the server to report repeated, erroneous memory module over-temperature events after the cover is re-installed and the server is powered on. The full symptom description, the affected DIMMs, and the workaround are given below.

Resolving The Problem

Source

RETAIN tip: H204015

Symptom

After the top cover is re-installed and the server is powered on, the server may report numerous memory module over-temperature assertion and de-assertion events, while the Light Path Diagnostics (LPD) panel "TEMP" light-emitting diode (LED) and the fan speeds fluctuate.

An example of a Dynamic System Analysis (DSA) chassis event showing that the top cover was removed without first removing AC power, which is later followed by Dual In-Line Memory Module (DIMM) over-temperature events:

  CHASSIS:(09/28/2011 19:51:43) Sensor "Cover Open Fault" has transitioned to critical from a less severe state

An example of a DSA chassis event indicating DIMM over-temperature:

  CHASSIS:(09/28/2011 20:03:41) An Over-Temperature Condition has been detected on the "Mem Bank <number>" on subsystem "System Memory"
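The two event patterns above can be searched for in exported log text. The following is a minimal sketch, assuming the log is available as plain text lines in the same `CHASSIS:(MM/DD/YYYY HH:MM:SS) message` form as the examples; the function names are hypothetical and are not part of DSA.

```python
import re
from datetime import datetime

# Matches chassis event lines in the form shown in this tip:
#   CHASSIS:(MM/DD/YYYY HH:MM:SS) <message>
# Assumption: every relevant line follows this layout.
EVENT_RE = re.compile(
    r'CHASSIS:\((\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\)\s*(.*)'
)


def parse_events(lines):
    """Return (timestamp, message) tuples for lines that look like chassis events."""
    events = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%m/%d/%Y %H:%M:%S")
            events.append((stamp, match.group(2).strip()))
    return events


def over_temp_after_cover_open(lines):
    """Return memory over-temperature events logged after the first cover-open event."""
    events = sorted(parse_events(lines), key=lambda event: event[0])
    cover_time = None
    hits = []
    for stamp, message in events:
        if cover_time is None and '"Cover Open Fault"' in message:
            cover_time = stamp
        elif (
            cover_time is not None
            and "Over-Temperature Condition" in message
            and '"System Memory"' in message
        ):
            hits.append((stamp, message))
    return hits
```

An empty result does not rule out this problem, because the cover-open event is absent if the logs were cleared after the cover was removed.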

This occurs when all of the following conditions are met:

  1. The top cover was removed and AC power was not removed.
  2. Any of the following Micron DIMMs with a date code of 1048 or older is installed:
  • 2GB Micron Part Number MT18JSF25672PDY-1G4D1 (IBM Part Number 43X5045, replacement part number 44T1491)
  • 2GB Micron Part Number MT18JSF25672PDZ-1G4F1 (IBM Part Number 43X5045/47J0154, replacement part number 44T1491)
  • 4GB Micron Part Number MT36JSZF51272PDZ-1G1F1 (IBM Part Number 43X5055, replacement part number 46C7452)
  • 8GB Micron Part Number MT36JSZF1G72PDZ-1G1D1 (IBM Part Number 43X5070, replacement part number 46C7488)

Note: The top cover removal event may not be found in the logs if the logs have been cleared.
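The DIMM condition above can be expressed as a simple lookup. This is a hedged sketch only: it assumes the date code is a four-digit year/week value (YYWW), so that a plain integer comparison against 1048 matches "1048 or older", and it assumes the Micron part number and date code have already been read from the DIMM label or inventory data. The names `AFFECTED_PARTS` and `is_affected` are hypothetical.

```python
# Micron part numbers listed in this tip, mapped to the replacement
# part number given for each one.
AFFECTED_PARTS = {
    "MT18JSF25672PDY-1G4D1": "44T1491",
    "MT18JSF25672PDZ-1G4F1": "44T1491",
    "MT36JSZF51272PDZ-1G1F1": "46C7452",
    "MT36JSZF1G72PDZ-1G1D1": "46C7488",
}

# Assumption: date codes are YYWW, so integer ordering matches age
# for parts built within the same century.
LATEST_AFFECTED_DATE_CODE = 1048


def is_affected(micron_part_number, date_code):
    """Return True if the DIMM matches a listed part number and date code."""
    part = micron_part_number.strip().upper()
    return part in AFFECTED_PARTS and int(date_code) <= LATEST_AFFECTED_DATE_CODE


if __name__ == "__main__":
    # A listed part with date code 1048 meets the DIMM condition.
    print(is_affected("MT18JSF25672PDY-1G4D1", "1048"))
```

A match means only that the DIMM condition is met; the problem also requires that the top cover was removed while AC power was still applied.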

Affected configurations

The system may be any of the following IBM servers:

  • System x3690 X5, type 7147, any model
  • System x3690 X5, type 7148, any model
  • System x3690 X5, type 7149, any model
  • System x3690 X5, type 7192, any model

This tip is not software specific.

This tip is not option specific.

Workaround

Follow these steps:

  1. Power off the server (DC power-down).
  2. Remove AC power from the server that is experiencing the over-temperature events for a minimum of thirty (30) seconds.
  3. Reapply AC power.

Additional information

This problem is due to a combination of the following two (2) factors:

  • System design.
  • Errata for the DIMM Electrically Erasable Programmable Read-Only Memory (EEPROM)/Thermal Sensor used on this vintage of DIMM, provided by Micron. Under certain conditions, the initialization requirements of the EEPROM/Thermal Sensor module are not met. One such condition occurs when AC power is not removed before the top cover is removed, as is required for this product. Subsequent reads from the device can then be erroneous and result in over-temperature conditions being reported.

Removing AC power allows the sensor to properly initialize.

Document Location

Worldwide

Operating System

System x:Operating system independent / None

Document Information

Modified date:
30 January 2019

UID

ibm1MIGR-5088909