IBM Support

Abnormal PCI error handling behavior when LER is disabled - IBM System x3850 X6 (3837, 3839)

Troubleshooting


Problem

When the setup menu option 'Live Error Recovery' (LER) is changed from the default to disabled and an uncorrectable Peripheral Component Interconnect (PCI) error occurs, the operating system freezes (locks up, hangs) for a period of time and eventually restarts, and the system does not report or log an issue. This is not the intended behavior for this type of incident when the setting is disabled.

Resolving The Problem

Source

RETAIN tip: H212079

Symptom

When the setup menu option 'Live Error Recovery' (LER) is changed from the default to disabled and an uncorrectable Peripheral Component Interconnect (PCI) error occurs:

  • The operating system freezes (locks up, hangs) for a period of time.
  • The operating system eventually restarts.
  • The system does not report or log an issue.

This is not the intended behavior for this type of incident when the setting is disabled.

Affected configurations

The system can be any of the following IBM servers:

  • System x3850 X6, type 3837
  • System x3850 X6, type 3839

This tip is not software specific.

This tip is not option specific.

The following system BIOS or UEFI levels are affected: Build ID:

The system has the symptom described above.

Solution

This behavior has been corrected in UEFI firmware release ibm_fw_uefi_a8e108m-1.00_anyos_32-64.

The file is available by selecting the appropriate Product Group, type of System, Product name, Product machine type, and Operating system on IBM Support's Fix Central web page, at the following URL:

http://www.ibm.com/support/fixcentral/
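
When checking a downloaded package against the corrected level, it can help to split the package name into its parts. The following Python sketch assumes the name follows the pattern ibm_fw_uefi_<build ID>-<version>_<operating system>_<architecture>. That pattern is inferred only from the single package name given in this tip, and parse_uefi_package is a hypothetical helper, not an IBM tool.

```python
import re
from typing import NamedTuple


class UefiPackage(NamedTuple):
    """Parts of a UEFI package name (field layout is an assumption)."""
    build_id: str
    version: str
    os: str
    arch: str


# Assumed layout: ibm_fw_uefi_<build ID>-<version>_<OS>_<architecture>
_PATTERN = re.compile(
    r"^ibm_fw_uefi_(?P<build_id>[a-z0-9]+)-(?P<version>\d+\.\d+)"
    r"_(?P<os>[a-z0-9]+)_(?P<arch>[0-9-]+)$"
)


def parse_uefi_package(name: str) -> UefiPackage:
    """Split a package name into its parts, or raise ValueError."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized package name: {name!r}")
    return UefiPackage(**match.groupdict())


if __name__ == "__main__":
    pkg = parse_uefi_package("ibm_fw_uefi_a8e108m-1.00_anyos_32-64")
    print(pkg.build_id, pkg.version)
```

The build ID and version extracted this way can then be compared by eye with the level shown in the UEFI F1 Setup menu.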

Workaround

IBM strongly advises against disabling the LER setting. To re-enable the setting, perform the following steps:

  1. Press F1 during Power On Self-Test (POST) to enter the UEFI F1 Setup menu.
  2. Select System Settings -> Recovery and RAS -> Live Error Recovery -> Enable.
  3. Select Save settings.
  4. Exit the UEFI F1 Setup menu.
  5. Use the system with this option enabled.

Additional information

There is no reason to disable the LER setting.

This error results from the combination of a disabled LER setting and a code defect. When LER is disabled, the card error crashes the system before an interrupt can be generated. This is not the intended behavior when this type of fault occurs with LER disabled.

Live Error Recovery is enabled by default. When enabled, it automatically disables the faulty Peripheral Component Interconnect Express (PCIe) card and sends an interrupt to the UEFI error handler. The handler will cause a 'blue screen' (Microsoft Windows critical error) and a graceful restart, which is the expected behavior.
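
The two paths described above can be summarized in a small model. This is only an illustrative Python sketch of the behavior as this tip describes it on affected UEFI levels, not firmware code. The names Outcome and handle_uncorrectable_pci_error are hypothetical, and the intended behavior with LER disabled on corrected firmware is not modeled because this tip does not describe it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Observable result of an uncorrectable PCI error (hypothetical model)."""
    interrupt_sent: bool   # interrupt delivered to the UEFI error handler
    error_reported: bool   # issue reported or logged (for example, a blue screen)
    os_hang: bool          # operating system freezes before restarting
    restart: str           # "graceful" or "eventual"


def handle_uncorrectable_pci_error(ler_enabled: bool) -> Outcome:
    """Return the outcome this tip describes for affected UEFI levels."""
    if ler_enabled:
        # Default setting: the faulty PCIe card is disabled, the UEFI error
        # handler receives an interrupt, and the system restarts gracefully.
        return Outcome(interrupt_sent=True, error_reported=True,
                       os_hang=False, restart="graceful")
    # LER disabled on affected firmware: the card error crashes the system
    # before an interrupt can be generated, so nothing is reported.
    return Outcome(interrupt_sent=False, error_reported=False,
                   os_hang=True, restart="eventual")


if __name__ == "__main__":
    for setting in (True, False):
        print("LER enabled:", setting, handle_uncorrectable_pci_error(setting))
```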

Document Location

Worldwide

Operating System

System x:Operating system independent / None

Document Information

Modified date:
30 January 2019

UID

ibm1MIGR-5094752