Dell node issues

This topic lists common troubleshooting steps and known issues for Dell nodes.

Unable to add node to OpenShift Container Platform

Problem statement
The node cannot be added to the OpenShift® Container Platform cluster.
Cause
The issue might be caused by a hung power operation (powerop) for the node. This can be identified by checking the logs on the system.
Symptoms
  • The power operation is not progressing.
  • The live powerop CR logs for the node are blank or do not show any activity.
  • The powerop CR is not getting updated.
Resolution
To resolve this issue, follow these steps:
  1. Check the live logs on the system for the powerop operation.
  2. If the logs are blank or not updating, delete the powerop CR by running the following commands in the ibm-spectrum-fusion-ns namespace:
    oc get cpr
    oc delete cpr <node-powerop-cr-name>
    Important: Wait for the command to delete the powerop CR. Avoid forceful deletion as it may cause further issues.
  3. After deleting the powerop CR, verify if the node can be added to the OpenShift Container Platform cluster.
  4. If the issue persists, contact IBM support.
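
The deletion in step 2 can be sketched as a small shell function. This is a minimal sketch, not an official procedure: the function name delete_powerop_cr and the OC override variable are illustrative assumptions, while the namespace and the cpr resource name come from the steps above. It relies on the default behavior of oc delete, which waits for finalizers to complete, and it never passes --force.

```shell
#!/usr/bin/env bash
# Sketch: delete a hung powerop CR without forcing, then confirm it is gone.
# The namespace comes from the steps above; OC and the function name are
# illustrative assumptions (OC lets you substitute a wrapper or a test stub).
NS="ibm-spectrum-fusion-ns"
OC="${OC:-oc}"

delete_powerop_cr() {
  local cr_name
  cr_name="$1"
  if [ -z "$cr_name" ]; then
    echo "usage: delete_powerop_cr <node-powerop-cr-name>" >&2
    return 2
  fi
  # List the powerop CRs first so the name can be confirmed.
  "$OC" get cpr -n "$NS" || return 1
  # Plain delete with --wait=true (the default): no --force, so finalizers run.
  "$OC" delete cpr "$cr_name" -n "$NS" --wait=true || return 1
  # A lookup that still succeeds means the CR was not removed.
  if "$OC" get cpr "$cr_name" -n "$NS" >/dev/null 2>&1; then
    echo "powerop CR $cr_name still exists" >&2
    return 1
  fi
  echo "powerop CR $cr_name deleted"
}
```

For example, after sourcing the function, run delete_powerop_cr <node-powerop-cr-name> and then retry adding the node.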

Unable to add GPU node to compute cluster

Problem statement
Unable to add a GPU node to the compute cluster due to a failure in importing the system configuration.
Symptoms
When attempting to add a Dell GPU DG4 node to the compute cluster, the following issues occur:
  • A Dell iDRAC issue (LC068) causes the GPU node configuration to fail in the IBM Fusion compute configuration CR.
  • Failure to set the network boot order while adding the node to the OpenShift Container Platform due to a Redfish call error. The error message indicates: errorCode: SYS011, message: Pending configuration values are already committed, unable to perform another set operation.
Cause
The issue is caused by a Dell iDRAC issue (LC068) and a failed Redfish call to set the PXE settings.
Resolution
To resolve this issue, contact IBM support.

Dell nodes job queue creation fails

Problem statement
Dell nodes job queue creation fails due to internal memory overload.
Cause
A bug in the LCLogAggregation function in current versions of iDRAC affects the management of event logs within the internal Lifecycle Controller (LCC) partition. As a result, iDRAC encounters an internal error that prevents the creation of configuration jobs, and node configuration changes cannot be applied because the internal memory lacks sufficient space for new tasks. The following is an example of the error that is shown in the terminal when you try to create a job:
racadm>>jobqueue create BIOS.Setup.1-1 -s TIME_NOW -r pwrcycle
ERROR: JCP012: The operation failed due to an internal iDRAC error.
Retry the operation. If the issue persists, reset iDRAC and retry the operation.
racadm>>
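
To recognize this failure in a script rather than by eye, the racadm output can be searched for the error code. The following is a minimal sketch; has_jcp012 is a hypothetical helper name, and it only assumes that the output contains the ERROR: JCP012 text shown above.

```shell
#!/usr/bin/env bash
# Sketch: detect the JCP012 failure in racadm output.
# has_jcp012 is a hypothetical helper name, not part of racadm.
has_jcp012() {
  # Reads racadm output on stdin; succeeds if the JCP012 error code appears.
  grep -q 'ERROR: JCP012'
}
```

For example, pipe the output of the failing jobqueue create command, with stderr redirected to stdout, into has_jcp012 and continue with the resolution steps only when it succeeds.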
Resolution
  1. Generate the ini file to send to the Dell support team:
    racadm -r <iDRAC ip> -u <iDRAC user> -p <iDRAC password> debug unblock rootshellash <date(yyyy-mm-dd)> <date next month(yyyy-mm-dd)> -f <service tag>.ini
    For example:
    racadm -r 169.253.1.3 -u root -p calvin debug unblock rootshellash 2025-08-20 2025-09-20 -f JW47NW3.ini
  2. Use the new ini file from the Dell support team to enable debug grant:
    racadm -r <iDRAC IP> -u <iDRAC user> -p <iDRAC password> debug grant -f <service tag>.ini
    For example:
    racadm -r <iDRAC IP> -u root -p calvin debug grant -f Case_#214177489_-_4X47NW3.ini
  3. SSH to iDRAC:
    ssh <iDRAC IP>
  4. Open debug mode:
    debug invoke rootshellash
  5. Change to root:
    /tmp $ su -
  6. Extract the event.log:
    /home/root# scp /flash/data1/LCL/event.log root@1<iDRAC IP>:/tmp
  7. Remove the event.log file:
    /home/root# rm /flash/data1/LCL/event.log
  8. Disable LCLogAggregation:
    set iDRAC.Logging.LCLogAggregation 0
  9. Reset iDRAC:
    racreset
  10. Wait for two to five minutes and check if the jobqueue command is working.
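
The wait in step 10 can be sketched as a polling loop instead of a fixed pause. This is a minimal sketch under stated assumptions: wait_for_idrac and the RACADM override variable are illustrative names, remote racadm is assumed to return a non-zero exit status while iDRAC is unreachable, and jobqueue view is used only as a read-only probe. The defaults of 10 attempts at 30-second intervals cover the five-minute upper bound mentioned above.

```shell
#!/usr/bin/env bash
# Sketch: after racreset, poll until iDRAC answers racadm again.
# wait_for_idrac and RACADM are illustrative assumptions; RACADM lets you
# substitute a wrapper or a test stub for the real binary.
RACADM="${RACADM:-racadm}"

wait_for_idrac() {
  local ip user password attempts delay i
  ip="$1"
  user="$2"
  password="$3"
  attempts="${4:-10}"   # 10 attempts x 30 s covers about five minutes
  delay="${5:-30}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # jobqueue view only reads the queue, so it is a safe probe.
    if "$RACADM" -r "$ip" -u "$user" -p "$password" jobqueue view >/dev/null 2>&1; then
      echo "iDRAC $ip responded on attempt $i"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "iDRAC $ip did not respond after $attempts attempts" >&2
  return 1
}
```

A successful probe shows only that iDRAC is reachable again. Rerun the jobqueue create command afterward to confirm that job creation itself works.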

XLR660 iDRAC dedicated network port failover limitation

Problem statement
The XLR660 includes a dedicated iDRAC (Integrated Dell Remote Access Controller) network port that provides out-of-band management capabilities. This dedicated iDRAC port does not support network failover. If the iDRAC port loses connectivity, the system does not automatically reroute management traffic through other available network interfaces.
Service impact
In such scenarios, the IBM Fusion user interface might display an Error Connecting to Node message on the nodes page. This affects the hardware monitoring and firmware monitoring capabilities: retrieval of system information, adapter information, storage information, and firmware status fails, as do IBM Fusion power operations.
Resolution
If you see this issue, contact IBM support.