IBM Support

Nortel Gigabit Ethernet Switch Module "Current Fault" event at power off - IBM eServer BladeCenter (Type 8677)

Troubleshooting


Problem

The BladeCenter Management Module (MM) can intermittently log an erroneous switch fault in its Event Log when the entire BladeCenter chassis is powered off.

Resolving The Problem

Source

RETAIN tip H18518

Symptom

An erroneous switch fault can intermittently be logged in the BladeCenter Management Module (MM) Event Log when the entire BladeCenter chassis is powered off. The MM Event Log shows the message "I/O module x Current Fault", and the timestamp of the event coincides with a BladeCenter chassis power off.
 
Example: The site power goes down and there is no backup supply. The entire BladeCenter chassis loses power, which sets up the potential for the erroneous log entry.

Affected configurations

The system is an IBM eServer BladeCenter, type 8677, any model.

The firmware levels affected are:

  • GbESM-AOS-20.0.1.1 and later

Solution

Ignore the Management Module Event Log entry; it is NOT a real switch fault.
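
On a chassis with a long event history, it can help to separate the entries this tip covers from "Current Fault" entries that deserve a closer look. The following is a minimal sketch, not an IBM tool: it assumes the Event Log has already been exported into (timestamp, message) pairs and that the times of chassis power-off events are known from another source. The function name, the 5-second tolerance, and the sample data are all hypothetical.

```python
from datetime import datetime, timedelta
import re
from typing import Iterable, List, Tuple

LogEntry = Tuple[datetime, str]

# Message text quoted in the Symptom section. The "x" is taken here to be
# a numeric I/O module identifier (an assumption).
CURRENT_FAULT = re.compile(r"I/O module (\d+) Current Fault")


def ignorable_faults(
    entries: Iterable[LogEntry],
    power_off_times: Iterable[datetime],
    tolerance: timedelta = timedelta(seconds=5),  # assumed matching window
) -> List[LogEntry]:
    """Return "Current Fault" entries logged within `tolerance` of a power off."""
    offs = list(power_off_times)
    return [
        (ts, msg)
        for ts, msg in entries
        if CURRENT_FAULT.search(msg)
        and any(abs(ts - off) <= tolerance for off in offs)
    ]


# Made-up sample data, for illustration only.
log = [
    (datetime(2019, 1, 29, 10, 0, 0), "I/O module 1 Current Fault"),
    (datetime(2019, 1, 29, 14, 30, 2), "I/O module 2 Current Fault"),
]
power_offs = [datetime(2019, 1, 29, 14, 30, 0)]
print(ignorable_faults(log, power_offs))
```

Only the second sample entry falls within the tolerance of the power-off time, so it is the one that matches this tip. A "Current Fault" entry with no nearby power off, like the first sample entry, is not covered here and should be investigated normally.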

Additional information

This anomaly occurred two (2) times out of approximately 300 BladeCenter (BC) chassis power cycles (roughly 0.7 percent) during Extended Life Performance (ELP) testing. The event log timestamps make it clear that the event is a byproduct of the sequence that takes place as power is removed from the chassis.
 
The "Current Fault" event is actually a category of failures that includes an undervoltage condition. The switch has two (2) power "domains". The first domain (PD1) activates when the switch module is plugged into the chassis and allows basic management-type communication between the switch and the Management Module (MM). Once the MM verifies the presence of a valid switch, the MM instructs the switch to enter the second power domain (PD2), which allows the switch to become fully operational. Detection circuitry on the switch then begins to monitor PD2 for undervoltage conditions and signals the appropriate status if one is detected. The MM routinely polls the switch for new status and, if a fault has occurred, logs that information in the Event Log.
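
The sequence just described can be pictured with a small toy model. This is only an illustrative sketch: the class names, method names, and status strings are invented for the example and do not correspond to any real switch or MM firmware interface.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SwitchModule:
    """Toy model of the switch's two power domains (all names hypothetical)."""
    pd1_active: bool = False    # basic management communication with the MM
    pd2_active: bool = False    # fully operational
    undervoltage: bool = False  # set by the detection circuitry on PD2

    def plug_in(self) -> None:
        self.pd1_active = True

    def enter_pd2(self) -> None:
        if self.pd1_active:
            self.pd2_active = True

    def sense_pd2_undervoltage(self) -> None:
        # Monitoring only applies once PD2 has been enabled.
        if self.pd2_active:
            self.undervoltage = True

    def status(self) -> str:
        # The switch can only answer the MM while PD1 is powered.
        if not self.pd1_active:
            raise RuntimeError("switch unreachable: PD1 is down")
        return "CURRENT_FAULT" if self.undervoltage else "OK"


@dataclass
class ManagementModule:
    event_log: List[str] = field(default_factory=list)

    def bring_up(self, switch: SwitchModule) -> None:
        # Stand-in for "the MM verifies the presence of a valid switch".
        if switch.pd1_active:
            switch.enter_pd2()

    def poll(self, switch: SwitchModule, module: int) -> None:
        if switch.status() == "CURRENT_FAULT":
            self.event_log.append(f"I/O module {module} Current Fault")


switch = SwitchModule()
mm = ManagementModule()
switch.plug_in()
mm.bring_up(switch)
mm.poll(switch, module=1)        # healthy: nothing is logged
switch.sense_pd2_undervoltage()  # undervoltage sensed on PD2
mm.poll(switch, module=1)        # fault status is picked up and logged
print(mm.event_log)
```

The point of the model is that the MM logs whatever status the switch reports at poll time; nothing in the exchange carries information about why PD2 sagged.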
 
Both of these power domains are derived from the 'raw' 12V supply generated by the BC Power Supply modules. When chassis power is removed, the 12V supply does not decay instantly to 0V. As the 12V input drops to around 9V, the undervoltage condition is detected on PD2 and the fault status is signaled. Because PD1 remains active until the input supply drops to around 5V, there is a finite amount of time (potentially tens or even hundreds of milliseconds) in which the MM could poll for status, see the fault indication, and write it into the Event Log before the MM itself has completely lost power.
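
To get a feel for how long that window might be, the sketch below estimates the time between the 9V and 5V crossings. The two thresholds and the 12V starting point come from the text above; everything else is an assumption. In particular, the decay is modeled as a simple exponential, and the time constants are hypothetical values chosen only for illustration, not measurements of a BladeCenter power supply.

```python
import math

V_NOMINAL = 12.0    # raw 12V supply (from the text)
V_PD2_FAULT = 9.0   # approximate PD2 undervoltage threshold (from the text)
V_PD1_OFF = 5.0     # approximate level at which PD1 drops out (from the text)


def time_to_reach(v_target: float, tau_s: float, v_start: float = V_NOMINAL) -> float:
    """Time for v_start * exp(-t / tau_s) to fall to v_target (assumed model)."""
    return tau_s * math.log(v_start / v_target)


def exposure_window(tau_s: float) -> float:
    """Seconds during which the fault is flagged while PD1 can still report it."""
    return time_to_reach(V_PD1_OFF, tau_s) - time_to_reach(V_PD2_FAULT, tau_s)


for tau_ms in (50, 100, 200):  # hypothetical decay time constants
    window = exposure_window(tau_ms / 1000.0)
    print(f"tau={tau_ms} ms -> window={window * 1000:.1f} ms")
```

Under this assumed model the window is tau × ln(9/5), or about 0.59 times the decay time constant, which is consistent with the "tens or even hundreds of milliseconds" estimate for time constants in that range. The real decay shape depends on the power supplies and the chassis load, so these figures are only indicative.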

This is a real undervoltage condition (induced by removal of power from the chassis) that is correctly sensed by the detection circuit in the switch, correctly indicated by the switch status, and correctly posted by the MM. It is NOT, however, a real switch fault. Unfortunately, because of the power and management architecture of BladeCenter, neither the switch nor the MM can distinguish it from one.

Document Location

Worldwide

Operating System

BladeCenter:Operating system independent / None


Document Information

Modified date:
29 January 2019

UID

ibm1MIGR-53230