IBM Support

Memory error may not turn on error indicator - IBM eServer xSeries 455

Troubleshooting


Problem

Intermittently, some single-bit or double-bit memory errors may not turn on all of the appropriate failure indicators or Predictive Failure Analysis (PFA) alerts, because of processing-time restrictions within the host processor.

Resolving The Problem

Source

RETAIN tip: H183017
 
Symptom
 
Intermittently, some single-bit or double-bit memory errors may not turn on all of the appropriate failure indicators or Predictive Failure Analysis (PFA) alerts, because of processing-time restrictions within the host processor.
 
One (1) or more PMI timeout messages may be logged in the System Error Log in the Remote Service Adapter.
 
If a double-bit error occurs while the system is in mirrored mode, power to the memory port is turned off, possibly with no indication of a failing DIMM.
 
Examples of System Error Log (SEL) entries for a single-bit memory error, together with the PMI timeout messages, are shown below.
 
ERROR LOG:
Index Sev Source Date Time
----- ---- -------- -------- --------
  • WARN SAL/EFI 11/04/04 10:26:49  
    Text: Memory Board Port 1 Single Bit ECC Error Threshold Exceeded: RER1 = 0020000000000000

  • INFO SAL/EFI 11/04/04 10:26:49  
    Text: PMI 15:2 Time=172 ms 1 4/0,0,137,33 0 Max=172/1 SP=0  

  • ERR SERVPROC 11/04/04 10:26:49  
    Text: PFA Alert, see preceding error in system error log.
     
  • WARN SAL/EFI 11/04/04 10:26:49
    Text: SP get led status failed: 18446744073709551600
     
  • ERR SAL/EFI 11/04/04 10:26:49
    Text: 292 - ProteXion: Single bit error threshold exceeded DIMM 1
     
  • INFO SAL/EFI 11/04/04 10:21:03
    Text: PMI 12:2 Time=41 ms 1 4/0,0,0,40 0 Max=50/1 SP=0
     
  • WARN SAL/EFI 11/04/04 10:21:03
    Text: CSG 0 Address 001F3A7100 CaptInfo0 9D3881FF00000001 CaptInfo1 000000000000200F
     
  • WARN SAL/EFI 11/04/04 10:21:03
    Text: Memory Board Port 1 Single Bit ECC Error Threshold Exceeded: RER1 = 0020000000000000
     
  • WARN SAL/EFI 11/04/04 10:21:01
    Text: CSG 0 Address 0002E47000 CaptInfo0 723801FF00010100 CaptInfo1 0000000000002001
     
  • INFO SAL/EFI 11/04/04 10:21:01
    Text: PMI 8:2 Time=50 ms 1 4/0,0,14,35 0 Max=50/1 SP=0
     
  • WARN SAL/EFI 11/04/04 10:21:01
    Text: Memory Board Port 1 Single Bit ECC Error Threshold Exceeded: RER1 = 0020000000000000
     
  • ERR SAL/EFI 11/04/04 10:21:01
    Text: 293 - Test message only!: SBE threshold exceeded, started steering
     
  • INFO SERVPROC 11/04/04 10:17:51
    Text: Static partition is started
     
  • INFO SERVPROC 11/04/04 10:17:51
    Text: Node booting OS
     
  • INFO SERVPROC 11/04/04 10:16:56
    Text: Node booted flash  
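
When reviewing a saved copy of the log, it can help to pull out the PMI messages and their reported times. The following is a minimal Python sketch, not an IBM tool. It assumes the "Text:" lines have already been copied into a list, and it reads only the `Time=` field, because the tip does not document the other PMI fields. The helper names `find_pmi_messages` and `as_signed_64` are hypothetical. The second helper shows that the value in the "SP get led status failed" entry reads as -16 when treated as a signed 64-bit integer. What that code means to the firmware is not stated in this tip.

```python
import re

# "Text:" lines copied from the log excerpt above.
SEL_TEXT_LINES = [
    "Memory Board Port 1 Single Bit ECC Error Threshold Exceeded: RER1 = 0020000000000000",
    "PMI 15:2 Time=172 ms 1 4/0,0,137,33 0 Max=172/1 SP=0",
    "PFA Alert, see preceding error in system error log.",
    "SP get led status failed: 18446744073709551600",
    "PMI 12:2 Time=41 ms 1 4/0,0,0,40 0 Max=50/1 SP=0",
    "PMI 8:2 Time=50 ms 1 4/0,0,14,35 0 Max=50/1 SP=0",
]

# Only the leading tag and the Time= value are parsed; the meaning of the
# remaining fields is not documented in the tip, so they are ignored.
PMI_PATTERN = re.compile(r"^PMI (?P<tag>\S+) Time=(?P<time_ms>\d+) ms\b")


def find_pmi_messages(lines):
    """Return (tag, time_ms) for every line that looks like a PMI message."""
    found = []
    for line in lines:
        match = PMI_PATTERN.match(line.strip())
        if match:
            found.append((match.group("tag"), int(match.group("time_ms"))))
    return found


def as_signed_64(value):
    """Reinterpret an unsigned 64-bit integer as a signed one."""
    return value - 2**64 if value >= 2**63 else value


for tag, time_ms in find_pmi_messages(SEL_TEXT_LINES):
    print(f"PMI {tag}: {time_ms} ms")

print(as_signed_64(18446744073709551600))
```

A PMI message logged next to a memory-error entry, as in the excerpt, suggests that the indicators for that error may be incomplete.
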

Affected configurations

 The system may be any of the following IBM eServers:  

  • xSeries 455, type 8855, any model

The following network operating systems are affected:  

  • Red Hat Enterprise Linux
  • SUSE Linux Enterprise Server for Intel Itanium
  • Windows Server 2003, Enterprise Edition, for 64-Bit Itanium-based systems
  • Windows Server 2003, Datacenter Edition, for 64-Bit Itanium-based systems

Note: This does not imply that the network operating system will work under all combinations of hardware and software. For more information, see the compatibility page: http://www.ibm.com/servers/eserver/serverproven/compat/us/

Solution

There is a planned module update for February 2005 to fix this issue.

Workaround

None

Additional information

On Itanium systems, the time that may be taken to service a Platform Management Interrupt (PMI) is restricted. To meet this restriction, it was necessary to limit the time allowed for transactions between the host processor and the Remote Service Adapter (RSA).
 
A side effect of this limitation is that during the processing of some errors, the RSA may not respond to a request within the required timeframe, and the request will be timed out.
 
The timeout does not affect the normal operation of the machine, but it may cause some messages to be truncated or some indicators not to be turned on, which makes these problems somewhat more difficult to diagnose.
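
The effect described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the firmware's actual logic: the limit of 100 ms, the request names, and the response times are all hypothetical, and `service_requests` is an illustrative name. It shows only that a request whose response arrives after the limit is dropped, while the other requests still complete.

```python
# Hypothetical per-request limit; the tip does not state the real value.
REQUEST_LIMIT_MS = 100


def service_requests(requests, limit_ms=REQUEST_LIMIT_MS):
    """Split simulated host-to-RSA requests into completed and timed out.

    `requests` is a list of (name, simulated_response_ms) pairs.
    """
    completed, timed_out = [], []
    for name, response_ms in requests:
        if response_ms <= limit_ms:
            completed.append(name)
        else:
            # The host stops waiting, so this action never takes effect.
            timed_out.append(name)
    return completed, timed_out


# Hypothetical requests made while one memory error is being processed.
requests = [
    ("log error message", 40),
    ("turn on DIMM LED", 172),
    ("raise PFA alert", 60),
]

completed, timed_out = service_requests(requests)
print("completed:", completed)
print("timed out:", timed_out)
```

In this example the error is still logged and the alert is raised, but the LED request times out, which matches the symptom of an error with no lit indicator.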
 
This problem exists in build OYKT25A of the SAL/EFI firmware.


Document Location

Worldwide

Operating System

Older System x:Operating system independent / None


Document Information

Modified date:
29 January 2019

UID

ibm1MIGR-57688