Troubleshooting
Problem
IBM Flex System x240 Compute Node machine types 8737 and 8738, or x220 Compute Node machine types 7906, 7864, and 2585, show errors in the Integrated Management Module II (IMM2) or Chassis Management Module (CMM) event log stating that communication is offline. Errors could resemble, but are not limited to: node02- Not reading device on system management (I2C) bus xx. Node node02 communication is offline. node02- Recovered reading device on system management (I2C) bus xx. Fans may ramp up to 100 percent. The errors may also contain Event IDs 0x50020603, 0x50020604, or 0E008002. The compute node is unmanageable. One (1) or more of the compute nodes may remain in the 'Comm Error' or 'Init Failed' state indefinitely. Any event that indicates an I2C fault, or that communication between the CMM and the Information Technology Element (ITE) IMM2 is or has been offline, is of concern and should be addressed as listed below.
Resolving The Problem
Source
RETAIN tip: H205897
Symptom
IBM Flex System x240 Compute Node machine types 8737 and 8738, or x220 Compute Node machine types 7906, 7864, and 2585, show errors in the Integrated Management Module II (IMM2) or Chassis Management Module (CMM) event log stating that communication is offline.
Errors could resemble, but are not limited to:
node02- Not reading device on system management (I2C) bus xx.
Node node02 communication is offline.
node02- Recovered reading device on system management (I2C) bus xx.
Fans may ramp up to 100 percent.
The errors also may contain Event IDs 0x50020603, 0x50020604, or 0E008002.
The compute node is unmanageable. One (1) or more of the compute nodes may remain in the 'Comm Error' or 'Init Failed' state indefinitely.
Any event that indicates an I2C fault, or that communication between the CMM and the Information Technology Element (ITE) IMM2 is or has been offline, is of concern and should be addressed as listed below.
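The symptom strings and Event IDs above can be searched for in an exported event log. The following is a minimal sketch, not an IBM tool: it assumes the IMM2 or CMM event log has already been saved as plain text, and the function name find_comm_events and the one-event-per-line log layout are hypothetical.

```python
# Event IDs named in this tip. The exported log layout is an assumption.
EVENT_IDS = ("0x50020603", "0x50020604", "0E008002")

# Message fragments quoted in the Symptom section of this tip.
PHRASES = (
    "Not reading device on system management (I2C) bus",
    "communication is offline",
    "Recovered reading device on system management (I2C) bus",
)


def find_comm_events(log_text):
    """Return (line_number, line) pairs that mention a listed Event ID or phrase."""
    needles = [item.lower() for item in EVENT_IDS + PHRASES]
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        lowered = line.lower()
        # Case-insensitive substring match against every ID and phrase.
        if any(needle in lowered for needle in needles):
            hits.append((number, line.strip()))
    return hits
```

Any returned line is only a candidate for this tip. Confirm the machine type, operating system, and firmware level against the affected configurations before acting on it.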
Affected configurations
The system may be any of the following IBM servers:
- Flex System x220 Compute Node, type 2585, any model
- Flex System x220 Compute Node, type 7864, any model
- Flex System x220 Compute Node, type 7906, any model
- Flex System x240 Compute Node, type 8737, any model
The system is configured with at least one of the following:
- VMware ESX Server 4.1, Update 2
- VMware ESXi 5.0, any Update
- VMware vSphere Hypervisor 5.0 with IBM Customization Installable, any Update
This tip is not option specific.
The following system firmware level(s) are affected:
- IMM2 1AOO27Q
- IMM2 1AOO28Q
- IMM2 1AOO27S
- IMM2 1AOO28S
The system has the symptom described above.
Note: This does not imply that the network operating system will work under all combinations of hardware and software.
Please see the compatibility page for more information:
http://www.ibm.com/systems/info/x86servers/serverproven/compat/us/
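The affected IMM2 firmware levels listed above can be checked with a simple lookup. This is a minimal sketch under stated assumptions: the build ID is supplied as a string by whoever reads it from the IMM2, and the names AFFECTED_IMM2_BUILDS and is_affected_imm2_build are hypothetical. How the build ID is retrieved is not covered here.

```python
# Build IDs copied from the affected firmware list in this tip.
AFFECTED_IMM2_BUILDS = {"1AOO27Q", "1AOO28Q", "1AOO27S", "1AOO28S"}


def is_affected_imm2_build(build_id):
    """Return True if build_id matches one of the affected IMM2 levels."""
    # Normalize whitespace and case before comparing.
    return build_id.strip().upper() in AFFECTED_IMM2_BUILDS
```

A build ID that is not in the set is not listed as affected by this tip. That alone does not prove the system is free of the problem.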
Solution
At first failure, collect the CMM service data in addition to the compute node service data before updating firmware.
Ensure FP2 is installed (see the firmware levels below). If the IMM2 Ethernet over Universal Serial Bus (USB) interface was disabled previously, re-enable the interface before installing the firmware. Follow these steps from the IMM2 web interface:
1. Select the Integrated Management Module (IMM) Management and Network menu items.
2. Navigate to the USB tab on the Network Properties web page.
3. Check the Enable Ethernet over USB box and click the Apply button.
The IMM2 firmware ('ibm_fw_imm2_1aoo32p-1.60') contains the fix to eliminate these errors.
If failures continue after updating the firmware, escalate a Problem Management Report (PMR) to Product Engineering (PE).
Some online documents and help databases tell users to replace hardware. Do not replace hardware for COMM faults. Most COMM faults are believed to be software-related rather than hardware-related.
FP2 firmware levels:
- IBM Unified Extensible Firmware Interface (UEFI) Flash Update: IBM_fw_uefi_b2e118b-1.10_anyos_32-64
- IMM2 Update: IBM_fw_imm2_1aoo32p-1.60_anyos_noarch
- IBM CMM firmware: v1.20.0g 2PET10G
Workaround
Ensure FP2 firmware is installed (see the firmware levels in the Solution section above).
Additional information
An unexpected error causes VMware to spawn multiple processes when it does not receive the data it expects from IMM2. This causes degraded operating system performance on the server and also forces IMM2 to become busy handling requests from those multiple processes.
The IMM2 firmware release contains a change to throttle the requests to prevent inaccessibility to IMM2. A future release of the VMware pass-through provider will eliminate the spawning of multiple processes when the unexpected error occurs.
Document Location
Worldwide
Document Information
Modified date:
30 January 2019
UID
ibm1MIGR-5090620