Troubleshooting
Problem
If either Full Array Memory Mirroring (FAMM) or Hot Add Memory (HAM) memory configuration is selected, one of the following may occur:
- Remote Supervisor Adapter II (RSA II) log entry "Power Good Fault detected by memory card x".
- Memory board "Error" LED is ON + Memory board "Port Power" LED is OFF + Operator Information Panel "Memory" LED is ON.
- POST hangs at B1 checkpoint as observed on video display.
- Diagnostic "Active Memory Latch Test" fails. If multiple memory boards are installed, use this test to determine which boards are affected.
If either High Performance Memory Access (HPMA) or Redundant Bit Steering (RBS) memory configuration is selected, one of the following may occur:
- POST hangs at B1 checkpoint as observed on video display.
- System performance degrades significantly or system freezes.
Resolving The Problem
Source
RETAIN tip: H185846
Symptom
If either Full Array Memory Mirroring (FAMM) or Hot Add Memory (HAM) memory configuration is selected, one of the following may occur:
- Remote Supervisor Adapter II (RSA II) log entry "Power Good Fault detected by memory card x".
- Memory board "Error" LED is ON + Memory board "Port Power" LED is OFF + Operator Information Panel "Memory" LED is ON.
- POST hangs at B1 checkpoint as observed on video display.
- Diagnostic "Active Memory Latch Test" fails. If multiple memory boards are installed, use this test to determine which boards are affected.
If either High Performance Memory Access (HPMA) or Redundant Bit Steering (RBS) memory configuration is selected, one of the following may occur:
- POST hangs at B1 checkpoint as observed on video display.
- System performance degrades significantly or system freezes.
Affected configurations
The system may be any of the following IBM servers:
- System x3800, type 8865, any model
- System x3800, type 8866, any model
- System x3850, type 8863, any model
- System x3850, type 8864, any model
- System x3950 E, type 8874, any model
- System x3950 E, type 8879, any model
- System x3950, type 8872, any model
- System x3950, type 8878, any model
- xSeries 260, type 8865, any model
- xSeries 366, type 8863, any model
- xSeries 460, type 8872, any model
- xSeries MXE-460, type 8874, any model
This tip is not option specific.
This tip is not software specific.
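As a quick way to check a system against the list above, the following is a minimal sketch in Python. The function name and the idea of passing the machine type as a four-digit string are assumptions made for illustration; the set of machine types is taken directly from this tip.

```python
# Machine types listed in this tip as affected (any model).
AFFECTED_MACHINE_TYPES = {
    "8863", "8864", "8865", "8866",
    "8872", "8874", "8878", "8879",
}


def is_affected_machine_type(machine_type: str) -> bool:
    """Return True if the four-digit machine type appears in this tip's list.

    Hypothetical helper: how the machine type is read from the system
    is outside the scope of this tip.
    """
    return machine_type.strip() in AFFECTED_MACHINE_TYPES
```

A matching machine type only means the tip may apply; the symptoms and memory configuration still need to be confirmed.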
Solution
Select one of the fixes below depending on the memory configuration:
OPTION 1:
If memory is configured as either HPMA or RBS (default), download and install BIOS (version 1.08 ZUJT47B) and CPLD (version 1.05 HEUD17A) or later to resolve the issue. While not specifically required for this issue, it is strongly recommended that the BIOS, CPLD, BMC, Diagnostics, and RSA firmware be updated to the latest versions available for the system.
The file will be available from the IBM System x Support web site at the following URL:
Note: An A/C power cycle is required after updating the CPLD to make the update effective.
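The minimum levels named in OPTION 1 can be compared against installed levels with a small check such as the sketch below. It assumes the installed levels are available as "major.minor" strings (for example "1.08"); the function names are hypothetical, and how the levels are read from the system is not covered by this tip.

```python
from typing import Tuple

MIN_BIOS = (1, 8)  # BIOS version 1.08 (ZUJT47B), from this tip
MIN_CPLD = (1, 5)  # CPLD version 1.05 (HEUD17A), from this tip


def parse_version(text: str) -> Tuple[int, int]:
    """Convert a 'major.minor' string such as '1.08' into (1, 8)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def firmware_meets_minimum(bios: str, cpld: str) -> bool:
    """Return True only if both BIOS and CPLD are at or above the minimum levels."""
    return parse_version(bios) >= MIN_BIOS and parse_version(cpld) >= MIN_CPLD
```

Comparing parsed integer tuples rather than raw strings avoids ordering mistakes such as treating "1.10" as older than "1.8".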
OPTION 2:
If memory is configured as either FAMM or HAM, the affected memory boards, replacement part number 23K4107, should be replaced with replacement part number 41Y3153.
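The choice between the two options depends only on the selected memory configuration, which can be sketched as follows. The function name and the use of the four abbreviations as input strings are assumptions for illustration; the returned text restates the options above.

```python
def select_fix(memory_mode: str) -> str:
    """Map a memory configuration (HPMA, RBS, FAMM, HAM) to the fix in this tip."""
    mode = memory_mode.strip().upper()
    if mode in {"HPMA", "RBS"}:
        # Firmware fix: the hotplug sensor is not used in these modes.
        return ("OPTION 1: update BIOS to 1.08 (ZUJT47B) and "
                "CPLD to 1.05 (HEUD17A) or later")
    if mode in {"FAMM", "HAM"}:
        # Hardware fix: the hotplug sensor is needed in these modes.
        return "OPTION 2: replace memory boards 23K4107 with 41Y3153"
    raise ValueError(f"unknown memory configuration: {memory_mode!r}")
```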
Additional information
When either FAMM or HAM memory configuration is selected, the Memory board hotplug sensor sensitivity may cause the memory board to be unexpectedly powered off.
If neither FAMM nor HAM is enabled, the Memory board hotplug sensor sensitivity may result in system performance degradation or freezes. During POST, this could result in hangs at checkpoint B1.
Changes were made to the BIOS and CPLD to disable the memory board hotplug sensors when either HPMA or RBS is selected, which prevents the symptoms from occurring. The memory board hotplug sensor is not used with these memory configurations. For systems with FAMM or HAM selected, the new BIOS and CPLD keep the hotplug sensor enabled, so the symptoms are instead eliminated by a change to the memory board hotplug circuit.
As described in RETAIN tip H186774, the Memory Board Latch Test in F2 Diagnostics will only be allowed to run if memory is configured as FAMM or HAM. Additionally, two and only two memory boards should be tested at a time, either in slots 1 and 3, or in slots 2 and 4.
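The conditions for running the latch test can be expressed as a short check. This is a sketch under the rules stated in the paragraph above; the function name and argument shapes are hypothetical and are not part of the diagnostics program.

```python
from typing import Iterable

# Slot pairs that may be tested together, per this tip.
ALLOWED_SLOT_PAIRS = ({1, 3}, {2, 4})


def latch_test_allowed(memory_mode: str, slots: Iterable[int]) -> bool:
    """Return True if the latch test may be run for this mode and slot selection."""
    # The test only runs when memory is configured as FAMM or HAM.
    if memory_mode.strip().upper() not in {"FAMM", "HAM"}:
        return False
    slot_list = list(slots)
    # Exactly two boards, in slots 1 and 3 or in slots 2 and 4.
    return len(slot_list) == 2 and set(slot_list) in ALLOWED_SLOT_PAIRS
```

With four boards installed, this implies two passes: one for slots 1 and 3, and one for slots 2 and 4.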
The fix to the memory boards and the fix to the BIOS and CPLD were introduced into Manufacturing before the System x family was released, so newer systems should not be affected in most cases. However, newer systems can be affected if older memory boards were installed after manufacturing.
Memory boards with replacement part number 40K0221 or 41Y3153 have the fix for the weak hotplug sensors.
Document Location
Worldwide
Document Information
Modified date:
29 January 2019
UID
ibm1MIGR-62893