Latest post: 2011-08-17T17:34:23Z by Casey_B
SystemAdmin
69 Posts

Pinned topic cldare: Unable to execute a command remotely...

‏2011-07-21T05:42:54Z |
Hi All,

I'm setting up a two-node AIX HACMP 5.5 cluster and get the following error message when running an "Extended Verification and Synchronization":

cldare: Unable to execute a command remotely on node CLUSTER_B, check clcomd.log file for more information. Any configuration changes must be propagated to the node before the node will be allowed to join the cluster.


The AIX environment that I'm working on doesn't allow RSH (or any of the r daemons) to run on its nodes. I've picked through the /usr/es/sbin/cluster/utilities/cldare script, which calls another script, /usr/es/sbin/cluster/utilities/cl_rsh. So I'm assuming that this process is failing because RSH is not allowed between the hosts.

Am I correct to assume this? Also, with cldare not being able to execute commands on the remote host, what configuration changes need to be propagated? As far as I can tell, the cluster seems to be working fine.

Thanks,
-Kristian
  • Holgervk
    10 Posts

    Re: cldare: Unable to execute a command remotely...

    ‏2011-08-03T21:58:40Z  
    The name rsh/cl_rsh is misleading.
    The cl_rsh command does not use rsh; it goes through the clcomd daemons (port 6191).
    If you block this port on a host-based firewall, you should open it. Other parts of PowerHA also use clcomd (file collections etc.). I would not assume PowerHA runs fine with those ports blocked.

    If you don't use a host-based firewall, you have another problem.
    Your PowerHA nodes share the same service IP, so they are on the same subnet and there is no network firewall between them.
    If this is the case, post again with some details, e.g. the output of tcpdump port clcomd while running cl_rsh other_node id or similar.
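The capture suggested above relies on the service name clcomd resolving to a port number. A minimal sketch for checking that mapping follows; it assumes a services(5)-style file with a `clcomd <port>/tcp` entry, and the helper name `service_port` is hypothetical. The commented capture steps are an assumed workflow, not a verified procedure.

```shell
#!/bin/sh
# Print the TCP port registered for a service name in a services(5)-style file.
# Usage: service_port NAME [FILE]   (FILE defaults to /etc/services)
service_port() {
    awk -v name="$1" '
        $1 == name && $2 ~ /\/tcp$/ { sub(/\/tcp$/, "", $2); print $2; exit }
    ' "${2:-/etc/services}"
}

# Assumed workflow on a cluster node (run as root):
#   port=$(service_port clcomd)      # this thread says clcomd uses 6191
#   tcpdump port "${port:-6191}" &   # capture clcomd traffic in the background
#   /usr/es/sbin/cluster/utilities/cl_rsh other_node id
#   kill $!                          # stop the capture
```

If the lookup prints nothing, the service name is not registered in that file, and the numeric port has to be given to tcpdump directly.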
  • SystemAdmin
    69 Posts

    Re: cldare: Unable to execute a command remotely...

    ‏2011-08-05T05:59:49Z  
    Hi Holgervk,

    Firstly, thanks for your reply.

    So this is the scenario: the cluster services are currently offline on both cluster nodes.

    root@clusterA# lssrc -ls clstrmgrES
    Current state: ST_INIT
    sccsid = "@(#)36    1.135.5.2 src/43haes/usr/sbin/cluster/hacmprd/main.C, hacmp.pe, 53haes_r550, 0934B_hacmp550 8/8/09 14:48:23"
    


    At this stage nothing is listening on port 6191, which, going by your post, is why cldare is failing.

    root@clusterA# cldare -t
    cldare: Unable to execute a command remotely on node AUAEUAP289WBCX2, check clcomd.log file for more information. Any configuration changes must be propagated to the node before the node will be allowed to join the cluster.
    


    Now once I start cluster services on both nodes, clcomd is listening on 6191.

    
    root@clusterA# netstat -Aan | grep 6191
    f100060002052398 tcp4       0      0  *.6191             *.*                LISTEN
    root@clusterA# rmsock f100060002052398 tcpcb
    The socket 0x2052008 is being held by proccess 368850 (clcomd).
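The listener check above can be wrapped in a small helper so it gives a yes/no answer before cldare is run. This is only a sketch: the function name `has_listener` is hypothetical, and it assumes netstat prints the local address as `*.PORT` (as in the output above) or `host:PORT`.

```shell
#!/bin/sh
# Succeed if netstat-style text on stdin shows a TCP listener on the given port.
# A field counts as a match when it ends in ".PORT" or ":PORT" on a LISTEN line.
has_listener() {
    awk -v port="$1" '
        /LISTEN/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ ("[.:]" port "$")) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Demo using the netstat line quoted in this post.
sample='f100060002052398 tcp4       0      0  *.6191             *.*                LISTEN'
if printf '%s\n' "$sample" | has_listener 6191; then
    echo "port 6191: listening"
else
    echo "port 6191: no listener"
fi
```

On a node the same function would be fed live output, for example `netstat -an | has_listener 6191`.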
    


    Now when I run cldare -t, I don't get the error.
    So is this working as designed? The error is a little off-putting when you run a verification and synchronisation with the cluster offline.

    Thanks,
    Kristian
  • Casey_B
    29 Posts

    Re: cldare: Unable to execute a command remotely...

    ‏2011-08-17T17:34:23Z  
    clcomdES is not the same as clstrmgrES.

    clcomd is, as was mentioned, the daemon that services cl_rsh commands, and it can be running while the cluster is down.

    It seems to me that clcomdES is usually started when an AIX node boots, but I am not sure (look in /etc/inittab).

    Also, look at the following links to similar information:
    https://www-304.ibm.com/support/docview.wss?uid=isg1IZ47409
    http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=/com.ibm.aix.hacmp.trgd/ha_trgd_clcomdes_clstmgres.htm
    http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=/com.ibm.aix.hacmp.admngd/ha_admin_disable_enable_cc_daemon.htm

    You can start the clcomd daemon as described in the last link; you will then be able to run sync and verify while the rest of the cluster is down.
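A rough sketch of that check, for anyone who wants to script it: read the Status column that `lssrc -s clcomdES` prints and only suggest a start when the subsystem is not active. The helper name `src_status` is hypothetical, the sample text is illustrative rather than taken from a real node, and `startsrc -s clcomdES` is assumed to be the start command the linked page describes.

```shell
#!/bin/sh
# Extract the Status column for one subsystem from `lssrc -s NAME` output on stdin.
# Status is the last field; the PID column is empty when the subsystem is down.
src_status() {
    awk -v name="$1" '$1 == name { print $NF }'
}

# Illustrative sample in the usual lssrc layout (not captured from a real node).
sample='Subsystem         Group            PID          Status
 clcomdES         clcomdES                      inoperative'

status=$(printf '%s\n' "$sample" | src_status clcomdES)
if [ "$status" != "active" ]; then
    echo "clcomdES is ${status:-not defined}; start it with: startsrc -s clcomdES"
fi
```

On a node, replace the sample with live output: `lssrc -s clcomdES | src_status clcomdES`.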

    Just as a sidenote, I don't remember if cldare is an end user command.
    The general rule is that if there isn't a man page for it, then you should not be running the command.

    (Otherwise the command is internal, and the syntax and behaviour may change without notice.)

    Hope this helps,
    Casey