IBM Support

Node status incorrect in Platform HPC GUI

Troubleshooting


Problem

Nodes show a status of booting or powering off in the Platform HPC GUI when they should show booted.

Symptom

Cause

Possibly corrupted data in the pcm_node_status table.

Diagnosing The Problem

Check the status of the node in xCAT:
lsdef <computenode> |grep status
For example:
lsdef compute23 |grep status
    status=booted
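
Note that grep status can also match related attributes such as statustime, so the output may contain more than one line. A minimal sketch of a stricter filter follows. The helper name lsdef_status is hypothetical (it is not part of xCAT), and it assumes lsdef prints attributes as indented name=value lines, as in the example above.

```shell
# Hypothetical helper, not an xCAT command.
# Reads lsdef output on stdin and prints only the value of the
# "status" attribute, ignoring statustime, appstatus, and similar.
lsdef_status() {
    sed -n 's/^[[:space:]]*status=//p' | head -n 1
}

# Usage on the management node (not run here):
#   lsdef compute23 | lsdef_status
```

If the command prints nothing, the node has no status value recorded in xCAT.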

Resolving The Problem

                                                                      
On the management node (head node) as root user:                        
1. Take a backup of all xCAT tables:                                    
# dumpxCATdb -V -p /root/xcatdb.`date +%y%m%d`                          
2. Edit the nodelist table using the following command:
#tabedit nodelist                                                       
In the editor, find the row for the node and change its status to booted.
For example, for node "gpu20":
"gpu20","__Managed,__NetworkProfile_compute_network_profile,__ImageProfile_gpu-rhels6.4-x86_64-stateful-compute,__HardwareProfile_IBM_iDataPlex_M4,compute,gpu","booted","07-21-2015 08:20:49",,,,,"out-of-sync","07-31-2015 11:01:11"

Save and exit (as in vim, use :w to save and :q to quit).

3. Verify that the status has changed in the xCAT tables. It may not yet
have changed in the PCM tables:
#tabdump nodelist |grep gpu20
#psql -U xcatadm xcatdb -c "select * from nodelist;" 
The output should say booted.
The following query may still show "booting".
#psql -U xcatadm xcatdb -c "select * from pcm_node_status;"             
                                                                        
4. Restart the nodestatus loader                                        
#plcclient.sh -d pcmnodestatusloader                                    
5. Update node status:                                                  
#updatenode gpu20 -s                                                    
6. Check the xCAT definition of the node:
#lsdef gpu20
7. Open the Platform Cluster Manager web GUI, click "Refresh", and verify the status of the node.

The node status should show booted. Now try to reprovision the node.
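
The comparison in step 3 can be narrowed to a single node instead of dumping whole tables. The following is a sketch only. The helper names table_status and status_in_sync are hypothetical, and it assumes both nodelist and pcm_node_status have node and status columns, as the queries in this document suggest. The psql options -A (unaligned) and -t (tuples only) make psql print just the bare value.

```shell
# Hypothetical helper: print the status recorded for one node in one table.
#   $1 = table name (nodelist or pcm_node_status), $2 = node name
table_status() {
    psql -U xcatadm xcatdb -At -c "select status from $1 where node='$2';"
}

# Hypothetical helper: succeed only when both values are non-empty and equal.
status_in_sync() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Usage on the management node (not run here):
#   xcat=$(table_status nodelist gpu20)
#   pcm=$(table_status pcm_node_status gpu20)
#   status_in_sync "$xcat" "$pcm" || echo "gpu20: xCAT=$xcat PCM=$pcm"
```

A mismatch such as xCAT=booted PCM=booting corresponds to the situation this document describes.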
                                                                        
If the status of the node (gpu20) has not changed, do the following:
a. Log on to xcatdb:
#psql -U xcatadm -d xcatdb
b. Update the node status:                                              
select * from pcm_node_status where node='gpu20';                       
update pcm_node_status set status = 'booted' where status='booting' and node='gpu20';
select * from pcm_node_status where node='gpu20';
                                                                        
Check the PCM web GUI for the status change.
Repeat step 5 (above).
Now reprovision the node and check that the node status progresses through defined -> installing -> booting -> booted.
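
If several nodes need the same correction, the update in step b can be generated per node rather than typed by hand. This is a minimal sketch under stated assumptions: the helper name pcm_fix_sql is hypothetical, and the check on the node name is an added precaution that is not part of the original procedure. Take the backup from step 1 before running any update.

```shell
# Hypothetical helper: print the UPDATE statement from step b for one node.
# Rejects empty names and names containing characters other than
# letters, digits, dot, underscore, and hyphen, so that the quoted
# SQL string cannot be broken by the node name.
pcm_fix_sql() {
    case "$1" in
        ''|*[!A-Za-z0-9._-]*)
            echo "invalid node name: $1" >&2
            return 1
            ;;
    esac
    printf "update pcm_node_status set status = 'booted' where status = 'booting' and node = '%s';\n" "$1"
}

# Usage on the management node (not run here):
#   psql -U xcatadm -d xcatdb -c "$(pcm_fix_sql gpu20)"
```

Printing the statement first, without passing it to psql, gives a chance to review it before it is applied.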

Applies to: Platform HPC for System x (component: Compute node) and Platform Cluster Manager (component: Dashboard), versions 4.1.1 and 4.2, on Linux.

Document Information

Modified date:
03 September 2018

UID

isg3T1022624