Troubleshooting
Problem
A rare timing event during boot of the 7063-CR2 HMC can result in one of two conditions, depending on the version of BMC firmware running on the HMC.
For HMCs with PNOR OP9-v2.5-4.123 / BMC op940.hmc-11.1 or older:
The timing event causes the BMC to switch its active side. The previously inactive side, now active, contains stale BMC settings, such as network settings and passwords, which leads to loss of connectivity to the BMC. On the OS, the ipmi0 device is missing, which affects the ipmitool command and any commands that rely on it.
For HMCs with PNOR OP9-v2.5-4.124 / BMC op940.hmc-16 and newer:
The timing event results in a restart of the BMC. On the OS, the ipmi0 device is missing. This impacts the ipmitool command and any commands that rely on it.
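The split between the two firmware ranges can be sketched as a small shell helper. This is only an illustration, not an HMC command: the function name bmc_recovery_behavior is hypothetical, it assumes the BMC level string ends in "hmc-" followed by a dotted number as in the levels named above, and it relies on GNU sort -V for version ordering. Levels between 11.1 and 16 are reported as unknown because this document does not describe them.

```shell
#!/bin/sh
# Hypothetical helper: map a BMC level string (for example "op940.hmc-16")
# to the behavior described above. Not part of the HMC command set.
bmc_recovery_behavior() {
    level=${1##*hmc-}    # keep only the text after the last "hmc-"
    case $level in
        ''|*[!0-9.]*) echo unknown; return 1 ;;   # not a dotted number
    esac
    # sort -V (GNU coreutils) orders version strings; head picks the lower one.
    if [ "$(printf '%s\n%s\n' "$level" 16 | sort -V | head -n 1)" = "16" ]; then
        echo restart        # op940.hmc-16 and newer: BMC restarts
    elif [ "$(printf '%s\n%s\n' "$level" 11.1 | sort -V | head -n 1)" = "$level" ]; then
        echo side-switch    # op940.hmc-11.1 or older: BMC switches sides
    else
        echo unknown        # levels in between are not covered here
    fi
}

bmc_recovery_behavior op940.hmc-11.1
bmc_recovery_behavior op940.hmc-16
```

The level string would need to be obtained separately, for example from the BMC firmware information for the HMC.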
Symptom
For HMCs with PNOR OP9-v2.5-4.123 / BMC op940.hmc-11.1 or older:
Loss of remote connectivity to the BMC:
The BMC does not respond to access attempts through ssh or a browser because the active BMC side switch left it with incorrect or missing network settings. Even if the BMC is reconfigured by using ipmitool from Petitboot, access is likely to remain impaired because password changes are also lost in the side switch.
On the HMC OS, the ipmi0 device is missing, so the ipmitool command cannot be run in-band. Other commands that rely on ipmitool, such as lshmc --bmc, are also affected and fail to produce the expected results.
Example: For a normal user:
lshmc --bmc
No results found.
When pedbg is run on a 7063-CR2, some BMC information is not collected because the lshmc --bmc call returns no data.
Example: For root user (provided by IBM Support only):
ipmitool lan print
Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
For HMCs with PNOR OP9-v2.5-4.124 / BMC op940.hmc-16 and newer:
On the HMC OS, the ipmi0 device is missing, so the ipmitool command cannot be run in-band. Other commands that rely on ipmitool, such as lshmc --bmc, are also affected and fail to produce the expected results.
Example: For a normal user:
lshmc --bmc
No results found.
When pedbg is run on a 7063-CR2, some BMC information is not collected because the lshmc --bmc call returns no data.
Example: For root user (provided by IBM Support only):
ipmitool lan print
Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
Cause
The problem does not occur on every reboot. It has been observed more frequently on HMCs running v10r1.1011 (the latest level at the time of this writing).
Environment
All 7063-CR2 HMCs.
Diagnosing The Problem
The simplest way to verify the condition is to run the lshmc --bmc command, which uses ipmitool to obtain the BMC network configuration and display it.
Example:
hscroot@myhmc:~> lshmc --bmc
ipv4addr=0.0.0.0,networkmask=255.255.255.255,gateway=9.1.128.1,ipv4dhcp=on
If the command instead returns no results, it is highly likely that the HMC experienced the problem.
Example:
lshmc --bmc
No results found.
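The check above can be sketched as a shell function that interprets captured lshmc --bmc output, for example on a workstation that collects output from several HMCs. This is a minimal sketch under stated assumptions: check_bmc_output is a hypothetical name, and the match strings are taken only from the two sample outputs shown in this document, so any other output is reported as unknown rather than guessed at.

```shell
#!/bin/sh
# Hypothetical helper: classify text captured from "lshmc --bmc".
# The patterns come from the sample outputs in this document.
check_bmc_output() {
    case $1 in
        *ipv4addr=*)            echo ok ;;       # BMC network settings were returned
        *"No results found."*)  echo suspect ;;  # HMC likely experienced the problem
        *)                      echo unknown ;;  # empty or unrecognized output
    esac
}

# Example use, assuming remote ssh command execution is enabled on the HMC
# and "hscroot@myhmc" is replaced with a real user and host:
#   check_bmc_output "$(ssh hscroot@myhmc 'lshmc --bmc')"
check_bmc_output 'No results found.'
```

A result of suspect only indicates that the HMC matches the symptom described here. It does not distinguish between the side switch and the BMC restart, which depends on the firmware level.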
Resolving The Problem
It is strongly recommended that all 7063-CR2 HMCs be updated to the latest available firmware (PNOR and BMC) to minimize the impact of the problem.
For HMCs with PNOR OP9-v2.5-4.124 / BMC op940.hmc-16 or newer:
With PNOR OP9-v2.5-4.124 / BMC op940.hmc-16, the BMC reboots instead of switching sides, which results in only a temporary loss of connectivity. Reboot the HMC and, once it is available again, rerun the lshmc --bmc command to verify that the problem is resolved.
HIPER/Pervasive: A problem was fixed for error recovery for an intermittent BMC hang that causes a flash side switch on reset of the BMC with loss of profile settings (including network settings). The intermittent BMC hang can occur while the host is rebooting as part of a host firmware update. With the fix, the recovery for a BMC hang is a BMC reset that is done without switching the BMC flash side, allowing the BMC to reboot cleanly.
For HMCs with PNOR OP9-v2.5-4.123 / BMC op940.hmc-11.1 or older:
Shut down the HMC and then remove power to force the BMC to restart from the original side.
Document Location
Worldwide
Document Information
Modified date:
18 November 2022
UID
ibm16520826