BMYUP2106

Firmware update Job failed for the compute node.

Error conditions

This error message can be reported for any of the following conditions:

  • Failed to put the node in maintenance mode for firmware update.
  • Failed to move the node out of maintenance after firmware update attempt.
  • The graceful restart (power operation) of the node failed after the firmware update attempt.
  • Firmware update failed on the node: the node does not reflect the latest version after the node restart and IMM reset.
  • Firmware update operation timed out on the compute node.
  • Scale cluster did not turn healthy post firmware update on the node.
    Note: If the storage is configured with Data Foundation, the IBM Fusion user interface still reports that the Scale cluster did not turn healthy post firmware update on the node. This wording is a typographical error in the event message.

Severity

Critical

User action

Complete the following steps to diagnose and report the problem:
  1. Check whether any other node is in maintenance mode. Make sure that all the nodes are up and in a ready state before you perform firmware update operations.
  2. An administrator must move the node out of maintenance mode by using the OpenShift® or IBM Fusion user interface.
  3. The administrator should power off and then power on the node and, after the power operation completes successfully, manually move the node out of maintenance mode. This issue might resolve on its own. It occurs mostly when the power operation takes longer than expected.
  4. Run the following oc command to check whether the Scale cluster is healthy:
    oc -n ibm-spectrum-fusion-ns get scales storagemanager -oyaml | grep storageClusterStatus
  5. Run the following oc command to check whether the Data Foundation cluster is healthy:
    oc get odfcluster odfcluster -o yaml -n ibm-spectrum-fusion-ns | grep odfcluster.cephClusterHealth
  6. Retrigger the firmware update if the compute CR shows upgradeRequired=true. If the firmware update fails again, contact IBM Support.
  7. The administrator should collect the node logs and then retrigger the firmware update on the node after ensuring that all the prerequisites are met. If the retrigger fails again, contact IBM Support.
  8. If the firmware update timed out, the administrator should collect the node logs and retrigger the firmware update after some time. If the firmware update continues to fail, contact IBM Support.
  9. Collect the node logs. Power off and then power on the target node. See Administering the node.
  10. After the node powers on successfully, wait 15 minutes and check whether the Scale cluster state changes to healthy. If the Scale cluster remains in a DEGRADED state, contact IBM Support.
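
The node and storage checks in steps 1, 4, and 5 can be combined into a small shell sketch. This is an illustration only, not an IBM-provided tool. The function names not_ready_nodes and fusion_precheck are hypothetical, and the sketch assumes that oc is already logged in to the cluster and that a node in maintenance mode is cordoned, which OpenShift reports as a status such as Ready,SchedulingDisabled. The two storage commands are the ones from steps 4 and 5, unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper sketch; not part of IBM Fusion.

# Reads `oc get nodes --no-headers` output on stdin and prints the name and
# status of every node whose STATUS column is not exactly "Ready".
# Assumption: a node in maintenance mode is cordoned, so its status contains
# "SchedulingDisabled" and it is listed here.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 " " $2 }'
}

# Step 1, then steps 4 and 5. Returns 1 if any node is not fully ready.
fusion_precheck() {
  local blocked
  blocked="$(oc get nodes --no-headers | not_ready_nodes)"
  if [ -n "$blocked" ]; then
    echo "Nodes that are not ready or not schedulable:"
    echo "$blocked"
    return 1
  fi

  # Only one storage type is usually configured, so one of these commands
  # is expected to print nothing; `|| true` keeps the function going.
  oc -n ibm-spectrum-fusion-ns get scales storagemanager -oyaml \
    | grep storageClusterStatus || true
  oc get odfcluster odfcluster -o yaml -n ibm-spectrum-fusion-ns \
    | grep odfcluster.cephClusterHealth || true
}
```

After sourcing the file, run fusion_precheck before retriggering the firmware update. An empty node list followed by a healthy storage status suggests that the prerequisites in steps 1, 4, and 5 are met.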