Dell node issues
This topic lists common troubleshooting steps and known issues for Dell nodes.
Unable to add node to OpenShift Container Platform
- Problem statement
- The node cannot be added to the OpenShift® Container Platform cluster.
- Cause
- The issue might be caused by a hung power operation (powerop) for the node. This can be identified by checking the logs on the system.
- Symptoms
- The power operation is not progressing.
- The live powerop CR logs for the node are blank or do not show any activity.
- The powerop CR is not getting updated.
- Resolution
- To resolve this issue, follow these steps:
- Check the live logs on the system for the powerop operation.
- If the logs are blank or not updating, delete the powerop CR by running the following commands in the ibm-spectrum-fusion-ns namespace:
  oc get cpr
  oc delete cpr <node-powerop-cr-name>
  Important: Wait for the command to delete the powerop CR. Avoid forceful deletion as it may cause further issues.
- After deleting the powerop CR, verify if the node can be added to the OpenShift Container Platform cluster.
- If the issue persists, contact IBM support.
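The list and delete commands from the resolution steps above can be wrapped in two small shell functions. This is a minimal sketch, not an official tool: the `cpr` resource name and the `ibm-spectrum-fusion-ns` namespace come from this topic, while the function names and the `OC` override variable are hypothetical and exist only so that the commands can be previewed with `OC=echo` before they are run against a cluster.

```shell
#!/bin/sh
# Namespace taken from this topic.
NAMESPACE="ibm-spectrum-fusion-ns"
# Hypothetical override: set OC=echo to print the commands instead of running them.
OC="${OC:-oc}"

# List the powerop CRs so that the one for the affected node can be identified.
list_powerop_crs() {
  "$OC" get cpr -n "$NAMESPACE"
}

# Delete one powerop CR and wait for the deletion to finish.
# No --force and no --grace-period=0 are passed, so the deletion is not forced.
delete_powerop_cr() {
  cr_name="$1"
  if [ -z "$cr_name" ]; then
    echo "usage: delete_powerop_cr <node-powerop-cr-name>" >&2
    return 2
  fi
  "$OC" delete cpr "$cr_name" -n "$NAMESPACE" --wait=true
}
```

Passing `-n` explicitly avoids depending on the current project of the `oc` session. The `--wait=true` flag is the default for `oc delete`; it is spelled out here only to make the intent visible.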
Unable to add GPU node to compute cluster
- Problem statement
- Unable to add a GPU node to the compute cluster due to a failure in importing the system configuration.
- Symptoms
- When attempting to add a Dell GPU DG4 node to the compute cluster, the following issues occur:
- A Dell iDRAC issue (LC068) causes the GPU node configuration to fail in the IBM Fusion compute configuration CR.
- Setting the network boot order fails while adding the node to the OpenShift Container Platform because of a Redfish call error. The error message indicates:
errorCode: SYS011, message: Pending configuration values are already committed, unable to perform another set operation.
- Cause
- The issue is caused by a Dell iDRAC issue (LC068) and a failed Redfish call to set the PXE settings.
- Resolution
- To resolve this issue, contact IBM support.
Dell nodes job queue creation fails
- Problem statement
- Dell nodes job queue creation fails due to internal memory overload.
- Cause
- Due to a bug in the LCLogAggregation function in current versions of iDRAC, event logs in the internal Lifecycle Controller (LCC) partition are not managed correctly. This malfunction causes iDRAC to encounter an internal error, which prevents the creation of configuration jobs. As a result, node configuration changes cannot be applied because the internal memory lacks sufficient space to accommodate new tasks. The following is an example error that is seen from the terminal when you try to create the job:
  racadm>>jobqueue create BIOS.Setup.1-1 -s TIME_NOW -r pwrcycle
  ERROR: JCP012: The operation failed due to an internal iDRAC error. Retry the operation. If the issue persists, reset iDRAC and retry the operation.
  racadm>>
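To confirm that a failed job creation matches this known issue, the captured racadm output can be checked for the JCP012 code shown in the example above. The following is a minimal sketch; the function name `has_jcp012` is hypothetical, and the check only looks for the error code string quoted in this topic.

```shell
#!/bin/sh
# Returns 0 when the racadm output on stdin contains the JCP012 error code,
# and a non-zero status otherwise.
has_jcp012() {
  grep -q 'ERROR: JCP012'
}

# Example usage (not executed here):
#   racadm -r <iDRAC IP> -u <iDRAC user> -p <iDRAC password> \
#     jobqueue create BIOS.Setup.1-1 -s TIME_NOW -r pwrcycle 2>&1 | has_jcp012 \
#     && echo "JCP012 seen: follow the resolution steps in this section"
```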
- Resolution
- Get the ini file to send to the Dell support team:
  racadm -r <iDRAC ip> -u <iDRAC user> -p <iDRAC password> debug unblock rootshellash <date(yyyy-mm-dd)> <date next month(yyyy-mm-dd)> -f <service tag>.ini
  For example:
  racadm -r 169.253.1.3 -u root -p calvin debug unblock rootshellash 2025-08-20 2025-09-20 -f JW47NW3.ini
- With the new ini file from the Dell support team, enable "debug grant":
  racadm -r <iDRAC IP> -u <iDRAC user> -p <iDRAC password> debug grant -f <service tag>.ini
  For example:
  racadm -r <iDRAC IP> -u root -p calvin debug grant -f Case_#214177489_-_4X47NW3.ini
- SSH to iDRAC:
  ssh <iDRAC IP>
- Open debug mode:
  debug invoke rootshellash
- Change to root:
  /tmp $ su -
- Extract the event.log:
  /home/root# scp /flash/data1/LCL/event.log root@1<iDRAC IP>:/tmp
- Remove the event.log:
  /home/root# rm /flash/data1/LCL/event.log
- Disable LCLogAggregation:
  set iDRAC.Logging.LCLogAggregation 0
- Reset iDRAC:
  racreset
- Wait for two to five minutes and check if the jobqueue command is working.
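The first step above needs two dates one month apart. The following sketch prints the `debug unblock` command with that window filled in, so that it can be reviewed before it is run. It assumes GNU `date` (for the `-d` option); the function name `build_unblock_cmd` is hypothetical, and the password is left as a placeholder on purpose so that it does not end up in scripts or shell history.

```shell
#!/bin/sh
# Print the "debug unblock" command for a one-month window.
# Assumes GNU date; nothing is executed against iDRAC.
build_unblock_cmd() {
  idrac_ip="$1"
  idrac_user="$2"
  service_tag="$3"
  # Start date in yyyy-mm-dd format; defaults to today.
  start_date="${4:-$(date +%F)}"
  # Same day of the following month.
  end_date="$(date -d "$start_date +1 month" +%F)"
  printf 'racadm -r %s -u %s -p <iDRAC password> debug unblock rootshellash %s %s -f %s.ini\n' \
    "$idrac_ip" "$idrac_user" "$start_date" "$end_date" "$service_tag"
}
```

For example, `build_unblock_cmd 169.253.1.3 root JW47NW3 2025-08-20` prints a command with the same dates and file name as the example in the first step.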
XLR660 iDRAC dedicated network port failover limitation
- Problem statement
- The XLR660 includes a dedicated iDRAC (Integrated Dell Remote Access Controller) network port that provides out-of-band management capabilities. This dedicated iDRAC port does not support network failover. If the iDRAC port loses connectivity, the system does not automatically reroute management traffic through other available network interfaces.
- Service impact
- In such failover scenarios, the IBM Fusion user interface may display an Error Connecting to Node message on the nodes page. This affects the hardware monitoring and firmware monitoring capabilities, which retrieve the system information, adapter information, storage information, and firmware status. IBM Fusion power operations also fail.
- Resolution
- If you see this issue, contact IBM support.