BMYUP2106

Firmware update Job failed for the compute node.

Error conditions

This message is issued under any of the following conditions:

  • Failed to put the node into maintenance mode for the firmware update.
  • Failed to move the node out of maintenance mode after the firmware update attempt.
  • The graceful restart (power operation) of the node failed after the firmware update attempt.
  • The firmware update failed on the node: the node does not report the latest firmware version after the node reboot and IMM reset.
  • The firmware update operation timed out on the compute node.
  • The scale cluster did not return to a healthy state after the firmware update on the node.

Severity

Critical

User action

Complete the following steps that apply to the error condition to diagnose and report the problem:
  1. Check whether any other node is in maintenance mode. All nodes must be up and in the Ready state before you run firmware update operations.
  2. An administrator must move the node out of maintenance mode by using the OpenShift® or IBM Storage Fusion user interface.
  3. The administrator should power off and then power on the node and, after the power operation completes successfully, manually move the node out of maintenance mode. This issue occurs mostly when the power operation takes longer than expected, and it might resolve on its own.
  4. Run the following oc command to check whether the scale cluster is healthy:
    oc -n ibm-spectrum-fusion-ns get scales storagemanager -oyaml | grep storageClusterStatus
  5. If the compute custom resource (CR) for the node shows upgradeRequired=true, retrigger the firmware update. If the firmware update fails again, contact IBM Support.
  6. The administrator should collect the compute logs, make sure that all prerequisites are met, and then retrigger the firmware update on the node. If the retriggered update also fails, contact IBM Support.
  7. If the firmware update timed out, the administrator should collect the compute logs and retrigger the firmware update later. If the firmware update continues to fail, contact IBM Support.
  8. Collect the compute logs, and then power off and power on the target node. For more information, see Administering the node.
  9. After the node powers on successfully, wait 15 minutes and check whether the scale cluster state has changed to healthy. If the scale cluster remains in the DEGRADED state, contact IBM Support.
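The readiness check in step 1 can be scripted. The following is a minimal sketch, not an IBM Storage Fusion or OpenShift tool: the function name nodes_not_ready is hypothetical, and it assumes that a node in maintenance mode appears as cordoned, with a STATUS of Ready,SchedulingDisabled in the output of oc get nodes. It reads that output on standard input and prints every node whose status is not exactly Ready.

```shell
# Hypothetical helper, not an IBM Storage Fusion or OpenShift tool.
# Reads `oc get nodes --no-headers` output on stdin and prints the NAME and
# STATUS of every node whose STATUS is not exactly "Ready", such as
# "NotReady" or "Ready,SchedulingDisabled" (assumed here to mean the node
# is cordoned, for example for maintenance).
# Returns 1 if any such node is found and 0 if all nodes are Ready.
nodes_not_ready() {
  awk '
    # Column 1 is NAME and column 2 is STATUS.
    $2 != "Ready" { print $1, $2; found = 1 }
    END { exit (found ? 1 : 0) }
  '
}
```

For example, run oc get nodes --no-headers | nodes_not_ready, and start the firmware update only when the function prints nothing and returns 0.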
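For step 5, the upgradeRequired value can be read from the YAML of the node's compute CR. This is a hedged sketch: the CR kind and object name are not given in this message, so <compute-resource> and <node-name> in the usage line are placeholders, the namespace is assumed to be the same one used in step 4, and the helper name upgrade_required is hypothetical. It assumes that the field appears as a plain YAML key, upgradeRequired: true.

```shell
# Hypothetical helper: reads a compute CR as YAML on stdin and prints
# "true" if it contains `upgradeRequired: true` (quoted or unquoted),
# otherwise "false". Returns 0 for true and 1 for false.
upgrade_required() {
  if grep -Eq '^[[:space:]]*upgradeRequired:[[:space:]]*"?true"?[[:space:]]*$'; then
    echo true
    return 0
  fi
  echo false
  return 1
}
```

Usage, with placeholders to replace: oc -n ibm-spectrum-fusion-ns get <compute-resource> <node-name> -o yaml | upgrade_required. Retrigger the firmware update only when it prints true.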
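Steps 4 and 9 can be combined into a polling loop that reruns the step 4 query until the cluster reports a target state or 15 minutes pass. This is an illustrative sketch under stated assumptions: it assumes that the field is printed on one line as storageClusterStatus: <STATE>, and because the exact healthy value is not stated in this message, the caller must pass the expected state. All function names are hypothetical.

```shell
# Extract the value of storageClusterStatus from YAML on stdin.
# Assumes the field renders on one line as "storageClusterStatus: <STATE>";
# surrounding quotes and trailing blanks are removed.
parse_scale_status() {
  awk -F': *' '/storageClusterStatus:/ { gsub(/"/, "", $2); sub(/[ \t]+$/, "", $2); print $2; exit }'
}

# Query the scale cluster status by using the oc command from step 4.
scale_status() {
  oc -n ibm-spectrum-fusion-ns get scales storagemanager -oyaml | parse_scale_status
}

# Poll until the status equals the expected state.
# $1: expected state (required; the healthy value is an assumption to verify)
# $2: timeout in seconds (default 900, that is, 15 minutes)
# $3: polling interval in seconds (default 30; must be at least 1)
# Returns 0 when the expected state is seen, 1 on timeout.
wait_for_scale_state() {
  local want=${1:?expected state required}
  local timeout=${2:-900} interval=${3:-30} elapsed=0 state
  while [ "$elapsed" -lt "$timeout" ]; do
    state=$(scale_status)
    echo "storageClusterStatus=${state:-<empty>}"
    if [ "$state" = "$want" ]; then
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 1
}
```

For example, wait_for_scale_state HEALTHY, where HEALTHY is an assumed value. A return code of 1 after 15 minutes, with the status still DEGRADED, corresponds to the point in step 9 at which you contact IBM Support.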