IBM Support

High CPU utilization on EN4093 switch

Troubleshooting


Problem

If two EN4093 switches are connected to separate upstream switches via a Link Aggregation Control Protocol (LACP) or static connection, or even a single link, and the EN4093 switches are configured with 25 or more Spanning Tree Protocol (STP) groups with one Virtual Local Area Network (VLAN) each, then any network change on the link between the EN4093 and its upstream partner can cause the EN4093 Central Processing Unit (CPU) utilization to spike to 100 percent. This can be triggered when the link between those switches is physically made, if the link flaps, if a root bridge flaps, or if a port channel flaps. This condition persists for a maximum of 20-25 minutes.

Resolving The Problem

Source

RETAIN tip: H21431

Symptom

If two EN4093 switches are connected to separate upstream switches via a Link Aggregation Control Protocol (LACP) or static connection, or even a single link, and the EN4093 switches are configured with 25 or more Spanning Tree Protocol (STP) groups with one Virtual Local Area Network (VLAN) each, then any network change on the link between the EN4093 and its upstream partner can cause the EN4093 Central Processing Unit (CPU) utilization to spike to 100 percent.

This can be triggered when the link between those switches is physically made, if the link flaps, if a root bridge flaps, or if a port channel flaps.

This condition persists for a maximum of 20-25 minutes.
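
The configuration condition in this symptom can be expressed as a small check. The following is a minimal sketch in Python, assuming the STP group to VLAN mapping has already been extracted from the switch configuration into a dictionary. The function name, the dictionary shape, and the constant name are hypothetical and are not part of any IBM tool.

```python
# Threshold taken from the symptom description: 25 or more STP groups.
STG_THRESHOLD = 25


def matches_symptom_config(stg_vlans, threshold=STG_THRESHOLD):
    """Return True if at least `threshold` STP groups carry exactly one VLAN each.

    stg_vlans: hypothetical mapping of STP group number -> iterable of VLAN IDs.
    """
    single_vlan_groups = [
        stg for stg, vlans in stg_vlans.items() if len(set(vlans)) == 1
    ]
    return len(single_vlan_groups) >= threshold


if __name__ == "__main__":
    # Illustrative data only: 30 STP groups, each mapped to a single VLAN.
    example = {stg: [100 + stg] for stg in range(2, 32)}
    print(matches_symptom_config(example))
```

A match only indicates that the switch fits the configuration described here. The CPU spike itself still requires one of the triggering link or root bridge events.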

Affected configurations

The system is configured with one or more of the following IBM Option part numbers:

  • IBM Flex System Fabric EN4093 10 Gb Scalable Switch, any model

This tip is not system specific.

This tip is not software specific.

The system has the symptom described above.

Solution

This behavior will be corrected in the GA4 release late August 2013, and in a 7.5.5 release expected to exit test in early July 2013.

The target date for this release is the third quarter of 2013.

The file is or will be available by selecting the appropriate Product Group, type of System, Product name, Product machine type, and operating system on IBM Support's Fix Central web page, at the following URL:

http://www.ibm.com/support/fixcentral/

Workaround


The workaround is to run the command "no logging log spanning-tree-group" on both EN4093 switches.

This stops the CPU from writing a log entry for each STP change. Once this command is applied, Bridge Protocol Data Units (BPDUs) and Link Aggregation Control Protocol Data Units (LACPDUs) are processed in a timely manner, and the issue is not observed.

Additional information

When the situation described previously occurs, the EN4093s are swamped with multiple STP 'new root bridge' and 'topology change' notification syslogs for each VLAN.

As a result, CPU utilization spikes to 100 percent. The CPU can no longer process LACPDUs and BPDUs in a timely manner, causing LACP, STP, and other control plane protocols to flap.

This condition persists for a maximum of 20-25 minutes.
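
To gauge how heavily a switch is being hit, the STP notifications named in this section can be tallied from a saved syslog capture. The following is a rough sketch in Python, assuming the capture is plain text with one message per line and that each relevant message contains the phrase "new root bridge" or "topology change". The exact message layout on the EN4093 is not reproduced here, and the sample lines are invented for illustration only.

```python
# Phrases quoted in this tip; matching is a case-insensitive substring search.
STP_PHRASES = ("new root bridge", "topology change")


def count_stp_notifications(lines):
    """Count how many log lines mention each STP notification phrase."""
    counts = {phrase: 0 for phrase in STP_PHRASES}
    for line in lines:
        lowered = line.lower()
        for phrase in STP_PHRASES:
            if phrase in lowered:
                counts[phrase] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines; real EN4093 syslog text will differ.
    sample = [
        "example: STG 12 new root bridge elected",
        "example: STG 12 topology change detected",
        "example: port 1 link up",
    ]
    for phrase, total in count_stp_notifications(sample).items():
        print(f"{phrase}: {total}")
```

A large count over a short capture window is consistent with the logging load described in this section. It does not by itself confirm that this tip applies.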

Document Location

Worldwide

Operating System

PureFlex System and Flex System: Operating system independent / None


Document Information

Modified date:
30 January 2019

UID

ibm1MIGR-5093222