Troubleshooting
Problem
When a double chipkill error occurs on Dual In-Line Memory Module (DIMM) slots 17-24 of the optional 16-DIMM memory tray, the error might not be reported. The affected DIMMs and option board are listed under Symptom.
Resolving The Problem
Source
RETAIN tip: H206216
Symptom
When a double chipkill error occurs on Dual In-Line Memory Module (DIMM) slots 17-24 of the optional 16-DIMM memory tray, the error might not be reported.
The following are the x4 Registered DIMMs (RDIMMs) affected:
- IBM Part Number: 49Y1400 16 GB (Quad-Rank x4) 1.35 V PC3L-8500R LP RDIMM
- IBM Part Number: 49Y1563 16 GB (Dual-Rank x4) 1.35 V PC3L-10600 CL9 ECC DDR3 1333 MHz
- IBM Part Number: 90Y3101 32 GB (Quad-Rank x4) 1.35 V PC3L-8500 CL7 ECC DDR3 1066 MHz LP RDIMM
The symptom occurs when using the following option board:
- Option Part Number: 81Y8926 - IBM System x3690 X5 16-DIMM Internal MB2 Memory Expansion
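As an inventory aid, the symptom scope above can be expressed as a simple lookup. This is a minimal sketch, not an IBM tool: the part numbers, slot range, and option board come from this tip, while the function name `dimm_at_risk` and the idea of feeding it per-DIMM inventory records are hypothetical.

```python
# Part numbers, slot range, and option board are taken from this tip.
AFFECTED_DIMM_PARTS = {"49Y1400", "49Y1563", "90Y3101"}
AFFECTED_SLOTS = range(17, 25)  # slots 17-24 inclusive
AFFECTED_TRAY_OPTION = "81Y8926"  # 16-DIMM Internal MB2 Memory Expansion


def dimm_at_risk(part_number: str, slot: int, tray_option: str) -> bool:
    """Hypothetical helper: True if a DIMM falls within this tip's symptom scope.

    Assumes the caller already knows the DIMM part number, the slot it
    occupies, and the option part number of the memory tray it sits in.
    """
    return (
        part_number in AFFECTED_DIMM_PARTS
        and slot in AFFECTED_SLOTS
        and tray_option == AFFECTED_TRAY_OPTION
    )


print(dimm_at_risk("49Y1563", 20, "81Y8926"))  # True
print(dimm_at_risk("49Y1563", 9, "81Y8926"))   # False
```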
Affected Configurations
The system can be any of the following IBM servers:
- System x3690 X5, type 7147, any model
- System x3690 X5, type 7192, any model
The system is configured with one or more of the following IBM options:
- 16 GB (Quad-Rank x4) PC3-8500 CL7 ECC DDR3 1066 MHz LP RDIMM, option 46C7483, any replacement part number
This tip is not software specific.
The following system BIOS/UEFI level is affected:
- UEFI 2nd Quarter 2012 LFC Release
The system has the symptom described above.
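The system-level criteria above can likewise be checked against an inventory record. This is a hedged sketch: the machine types, option number, and UEFI level come from this tip, but the `SystemInventory` record, its field names, and `configuration_affected` are hypothetical. Replacement part numbers for option 46C7483 are not modeled, and the check does not confirm that the symptom itself is present.

```python
from dataclasses import dataclass

AFFECTED_MACHINE_TYPES = {"7147", "7192"}  # System x3690 X5, any model
AFFECTED_OPTION = "46C7483"  # replacement part numbers are not modeled here
AFFECTED_UEFI_LEVEL = "UEFI 2nd Quarter 2012 LFC Release"


@dataclass
class SystemInventory:
    """Hypothetical inventory record; field names are illustrative only."""
    machine_type: str
    installed_options: set[str]
    uefi_level: str


def configuration_affected(inv: SystemInventory) -> bool:
    # All three configuration conditions from this tip must hold.
    return (
        inv.machine_type in AFFECTED_MACHINE_TYPES
        and AFFECTED_OPTION in inv.installed_options
        and inv.uefi_level == AFFECTED_UEFI_LEVEL
    )


inv = SystemInventory("7147", {"46C7483"}, AFFECTED_UEFI_LEVEL)
print(configuration_affected(inv))  # True
```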
Solution
This behavior was addressed in Unified Extensible Firmware Interface (UEFI) version 1.77 Build ID: MLE177A.
The file is available by selecting the appropriate Product Group, type of System, Product name, Product machine type, and Operating system on the IBM Support Fix Central web page, at the following URL:
http://www.ibm.com/support/fixcentral/
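To confirm whether a server already runs firmware that includes the fix, the installed UEFI version can be compared with 1.77. This minimal sketch assumes the version is already available as a dotted string such as "1.77" and that the minor part is written with two digits (so 1.80 is later than 1.77); how the version is read from the server is outside this tip, and the names `parse_version` and `has_fix` are hypothetical. Build IDs such as MLE177A are not parsed.

```python
FIXED_VERSION = (1, 77)  # UEFI 1.77, Build ID MLE177A


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '1.77' into (1, 77)."""
    return tuple(int(part) for part in text.strip().split("."))


def has_fix(installed: str) -> bool:
    # Tuple comparison orders versions component by component.
    return parse_version(installed) >= FIXED_VERSION


print(has_fix("1.77"))  # True
print(has_fix("1.60"))  # False
```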
Additional Information
A double chipkill error is a correctable error, so the server continues to operate when the error occurs. Double chipkill is also referred to as Double Device Data Correction (DDDC).
DDDC is applied only when x4 DIMMs are installed under a Westmere-EX Central Processing Unit (CPU). DDDC works in the following way:
- One Dynamic Random Access Memory (DRAM) device on the DIMM is reserved as a spare device. When the first DRAM device fails, its content is read, corrected, and written to the spare DRAM device.
- When a second DRAM device fails, its data is corrected by Error Correction Code (ECC) and a correctable error count accumulates. When a set threshold is reached, a memory Predictive Failure Analysis (PFA) warning event is logged.
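The two-stage behavior above can be modeled as a small state machine. This is a simplified sketch, not IBM firmware logic: the class `DddcModel`, the default `pfa_threshold` of 3, and the event strings are hypothetical, and the real threshold and logging are defined by the firmware.

```python
from typing import List, Optional


class DddcModel:
    """Simplified model of DDDC on one DIMM with a single spare DRAM device."""

    def __init__(self, pfa_threshold: int = 3) -> None:
        self.pfa_threshold = pfa_threshold  # hypothetical value
        self.spared_device: Optional[int] = None
        self.correctable_errors = 0
        self.events: List[str] = []

    def device_error(self, device: int) -> None:
        if self.spared_device is None:
            # First failure: content is read, corrected, and written to the spare.
            self.spared_device = device
            self.events.append(f"device {device} spared")
        elif device != self.spared_device:
            # Second failure: ECC corrects the data and the count accumulates.
            self.correctable_errors += 1
            if self.correctable_errors == self.pfa_threshold:
                self.events.append("memory PFA warning logged")
        # Errors on the already-spared device are ignored: its data now lives on the spare.


dimm = DddcModel(pfa_threshold=3)
dimm.device_error(5)      # first failure: device 5 is spared
for _ in range(3):
    dimm.device_error(9)  # second failure: ECC-corrected and counted
print(dimm.events)  # ['device 5 spared', 'memory PFA warning logged']
```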
While the second DRAM device failure is being handled, the correctable error count for slots 17-24 on the optional 16-DIMM memory tray does not accumulate, so the threshold is never reached. As a result, the corresponding PFA warning event is neither generated nor logged.
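The defect can be illustrated by gating the error counter on the slot number. In this hypothetical sketch, `counter_accumulates` stands in for the faulty firmware path: on the affected firmware, errors in slots 17-24 of the optional tray are never counted, so the threshold comparison never succeeds. The threshold value and both function names are assumptions made for illustration.

```python
PFA_THRESHOLD = 3  # hypothetical value; the real threshold is firmware-defined
UNCOUNTED_TRAY_SLOTS = range(17, 25)  # slots 17-24, per this tip


def counter_accumulates(slot: int, fixed_firmware: bool) -> bool:
    # On the affected firmware, errors in slots 17-24 are not counted.
    return fixed_firmware or slot not in UNCOUNTED_TRAY_SLOTS


def pfa_logged(slot: int, error_count: int, fixed_firmware: bool) -> bool:
    counted = 0
    for _ in range(error_count):
        if counter_accumulates(slot, fixed_firmware):
            counted += 1
    return counted >= PFA_THRESHOLD


print(pfa_logged(slot=20, error_count=100, fixed_firmware=False))  # False
print(pfa_logged(slot=20, error_count=100, fixed_firmware=True))   # True
```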
Document Location
Worldwide
Document Information
Modified date:
30 January 2019
UID
ibm1MIGR-5091087