This article explains how to remove a worker node that is “down” or otherwise unreachable from a Big SQL cluster. Normally, a Big SQL worker node can be decommissioned from the cluster if the host is alive:
Figure 1: Showing the decommission option of a worker node.
However, when a host can no longer be accessed, this option is not available, as shown below:
Figure 2: The host worker2Host is dead. The service state shows “unknown” and, in the Ambari UI, the host heartbeats are lost.
How to remove a dead node from a Big SQL cluster from the command line
There are 3 phases of the Big SQL worker cleanup:
Phase 1 - Deregistering the host from Big SQL (whether or not the targeted Big SQL worker is accessible)
– Removal from the Big SQL database
– Removal from the Big SQL cluster
Phase 2 - Removal of the Big SQL binaries/packages from the worker host (only if the targeted Big SQL worker is accessible)
Phase 3 - Removal of the Big SQL worker service entry from Ambari (whether or not the targeted Big SQL worker is accessible)
All 3 phases are performed by the utility fullBigSqlCleanup.sh when it is run with the -w option.
WARNING: Running this script without the -w option will WIPE the whole Big SQL cluster.
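Because omitting -w is destructive, one way to reduce the risk is to call the script only through a small guard. The sketch below is illustrative and not part of Big SQL: the function name remove_worker_only and the CLEANUP_SCRIPT variable are assumptions, and it passes only the -u, -p and -w options that the script accepts.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: refuse to call the cleanup script unless a worker hostname is given.
# CLEANUP_SCRIPT is an illustrative variable; point it at your copy of fullBigSqlCleanup.sh.
CLEANUP_SCRIPT="${CLEANUP_SCRIPT:-./fullBigSqlCleanup.sh}"

remove_worker_only() {
    local ambari_user="$1" ambari_password="$2" worker_host="$3"
    if [ -z "$worker_host" ]; then
        echo "Refusing to run: no worker hostname given; omitting -w wipes the whole cluster." >&2
        return 1
    fi
    # Always pass -w so the script performs a single-worker cleanup only.
    "$CLEANUP_SCRIPT" -u "$ambari_user" -p "$ambari_password" -w "$worker_host"
}

# Example (as root on the Big SQL Head node, from the script's directory):
#   remove_worker_only admin admin worker2.mydomain.mycompany.com
```

The guard only checks that a hostname was supplied; it does not verify that the hostname belongs to a worker rather than a head node.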
Here is how:
Step 1 : SSH to the Big SQL Head node as a user with sudo privileges
Step 2 : Switch to the bigsql user and determine the dead node’s node number in the Big SQL cluster:
# su - bigsql
$ cat ~/sqllib/db2nodes.cfg | grep worker2Host
2 worker2Host 0
The host in question has Big SQL node number “2”.
The node number is the first space-separated field of the db2nodes.cfg line that corresponds to worker2Host.
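The lookup can also be done with an exact match on the hostname field, which avoids grep matching a longer hostname that merely contains the same text. This is a minimal sketch; the function name node_number_for_host is hypothetical, and it assumes the three-field layout shown above (node number, hostname, logical port).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the node number(s) that db2nodes.cfg assigns to a host.
# Assumes whitespace-separated fields: <node number> <hostname> <logical port>.
node_number_for_host() {
    local cfg_file="$1" host="$2"
    # Compare the hostname field exactly, so "worker2Host" does not match "worker2Host10".
    awk -v host="$host" '$2 == host { print $1 }' "$cfg_file"
}

# Example (as the bigsql user on the Big SQL Head node):
#   node_number_for_host ~/sqllib/db2nodes.cfg worker2Host
```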
Step 3 : Validate the host’s node number one more time by attempting to start and/or stop the Big SQL service on it from the command line
Figure 4: Failure to ping and to start/stop Big SQL Worker worker2Host
The user has now validated that node number 2 is dead. It is completely off the network, and there is no intention of ever bringing it back.
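The ping part of this check can be scripted. The sketch below is only an illustration: host_is_reachable and the PING_CMD override are hypothetical names, and the -c and -W options are those of the Linux iputils ping. A failed ping alone does not prove a host is dead (ICMP may simply be blocked), so treat it as one signal alongside the Ambari heartbeat status.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the host answers one ICMP echo request within 2 seconds.
# PING_CMD is an illustrative override hook; it defaults to the system ping.
PING_CMD="${PING_CMD:-ping}"

host_is_reachable() {
    local host="$1"
    # -c 1: send a single probe; -W 2: wait at most 2 seconds for a reply (Linux iputils ping).
    "$PING_CMD" -c 1 -W 2 "$host" > /dev/null 2>&1
}

# Example:
#   if host_is_reachable worker2Host; then
#       echo "worker2Host still answers; do not treat it as dead"
#   else
#       echo "worker2Host is unreachable"
#   fi
```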
Step 4 : Run the cleanup as root on the Big SQL Head node. First, review the current db2nodes.cfg as the bigsql user:
$ su - bigsql
$ cat ~/sqllib/db2nodes.cfg
0 head1.mydomain.mycompany.com 0
1 worker1.mydomain.mycompany.com 0
2 worker2.mydomain.mycompany.com 0    >> This node will be removed
Switch to root on the Big SQL Head node and locate the cleanup script:
[root@head1 ~]# cd /var/lib/ambari-agent
[root@head1 ambari-agent]# find . -name fullBigSqlCleanup.sh
./cache/stacks/HDP/2.4/services/BIGSQL/package/scripts/fullBigSqlCleanup.sh
[root@head1 ambari-agent]# cd ./cache/stacks/HDP/2.4/services/BIGSQL/package/scripts
Run the script with no arguments to print its usage help:
[root@headHost1 scripts]# ./fullBigSqlCleanup.sh
Usage: ./fullBigSqlCleanup.sh -u -p [-s ] [-w ]
Required parameters:
  -u: Ambari admin username
  -p: Ambari admin password
Worker node cleanup:
  -w: Worker node hostname
      Using this option will remove the specified worker from the existing Big SQL cluster.
Optional:
  -Z: sudo_ssh_user
      specify the sudo/ssh user if it is other than root.
WARNING: THIS SCRIPT SHOULD BE INVOKED FROM BIGSQL_HEAD_NODE
Here is the command to clean up worker2.mydomain.mycompany.com from the Big SQL cluster:
[root@head1 scripts]# ./fullBigSqlCleanup.sh -u admin -p admin -w worker2.mydomain.mycompany.com
The command prints output like the following and waits for confirmation; enter “Y” to continue with the removal:
Single host cleanup is requested on worker2.mydomain.mycompany.com
SSL is NOT set
Please confirm the following cluster info:
Ambari server = worker1.mydomain.mycompany.com
Ambari port = 8081
Ambari cluster = TESTHDP24
Would you like to continue? (Y/n): Y
Exporting environment variables for bigsql service
Successfully exported environment variables
Cleanup parameters:
BIGSQL_USER = bigsql
DATA_DIRS = /var/ibm/bigsql/database,/hadoop/bigsql
AMBARI_SERVER = worker1.mydomain.mycompany.com
AMBARI_CLUSTER = TESTHDP24
AMBARI_PORT = 8081
AMBARI_USER = admin
BIGSQL_USER_HOME = /home/bigsql
TARGET_HOSTLIST = /tmp/bigsqlSSHHostList
**************************
Existing Big SQL host list:
**************************
worker1.mydomain.mycompany.com
worker2.mydomain.mycompany.com
head1.mydomain.mycompany.com
head2.mydomain.mycompany.com
**************************
2 worker2.mydomain.mycompany.com 0
Target host: worker2.mydomain.mycompany.com
Target nodes: 2
Current host: head1.mydomain.mycompany.com
Big SQL Head Host: head1.mydomain.mycompany.com
Processing worker removal for node 2 (forced: 0)
Given node array to remove
0 head1.mydomain.mycompany.com 0
1 worker1.mydomain.mycompany.com 0
2 worker2.mydomain.mycompany.com 0
Processing removal of 2, worker2.mydomain.mycompany.com, 0
Log file of this shell is: /tmp/bigsql/logs/bigsql-fixtopology-2016-10-15_03.56.23.3491.log
...
...
Timed out in first attempt. Retrying ambari-server restart
Using python /usr/bin/python
Restarting ambari-server
Using python /usr/bin/python
Stopping ambari-server
Ambari Server stopped
Using python /usr/bin/python
Starting ambari-server
Ambari Server running with administrator privileges.
Organizing resource files at /var/lib/ambari-server/resources...
Server PID at: /var/run/ambari-server/ambari-server.pid
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
Waiting for server start....................
Ambari Server 'start' completed successfully.
Now let’s validate the outcome:
[root@head1 scripts]# su - bigsql
[bigsql@head1 sqllib]$ cat ~/sqllib/db2nodes.cfg
0 head1.mydomain.mycompany.com 0
1 worker1.mydomain.mycompany.com 0
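The same validation can be turned into a pass/fail check instead of reading the file by eye. This is a sketch under the same assumption about the db2nodes.cfg layout (hostname in the second field); the function name host_removed_from_cluster is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only if the host no longer appears in db2nodes.cfg.
host_removed_from_cluster() {
    local cfg_file="$1" host="$2"
    # awk exits with status 1 if the hostname is found in the second field, 0 otherwise.
    # An unreadable file also yields a non-zero status, so it is not reported as "removed".
    awk -v host="$host" '$2 == host { found = 1; exit } END { exit found }' "$cfg_file"
}

# Example (as the bigsql user on the Big SQL Head node):
#   if host_removed_from_cluster ~/sqllib/db2nodes.cfg worker2.mydomain.mycompany.com; then
#       echo "worker2 is no longer part of the Big SQL cluster"
#   fi
```

This only confirms the db2nodes.cfg side of the cleanup; the Ambari service entry still has to be checked in the Ambari UI.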