Topic
  • 3 replies
  • Latest Post - ‏2010-02-01T06:21:29Z by SystemAdmin
bejohn2
2 Posts

dsh problems after csm upgrade to 1.7.1.4

‏2010-01-15T17:46:02Z |
I am having a problem with our CSM management server following an upgrade from CSM 1.7.0
to 1.7.1.4.

I upgraded our CSM management server to 1.7.1.4 and afterwards, dsh no longer works, and
therefore none of the csm commands are working (updatenode, cfmupdatenode for example).

Our setup:
We have /usr/bin/ssh set as our remote shell under csmconfig.
Our management server is running Red Hat ES 4 Update 4 (32-bit).
Client nodes run RHEL4u4 (32-bit), RHEL5.2(64-bit) and RHEL5.4(64-bit) with most
running RHEL5.2.

The problem seems to be limited to nodes running RHEL4.4, which includes our CSM
management server.

I have nodes running RHEL 5.2 (64-bit), and after manually upgrading all the csm and rsct
packages on one such client node, I am able to dsh commands from that client to the
management server and also to other clients without problem.
ex: on the upgraded client (RHEL 5.2 64-bit), I run the following command:
/opt/csm/bin/dsh -r /usr/bin/ssh -n <node> date
and the date command is executed.

On the management server, the same command just hangs, with no errors. If I Ctrl-C the command,
I get the date output along with an error that the remote shell had an exit code of 255. If I
give a command that produces no STDOUT, it seems to work just fine.
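
For anyone trying to reproduce this, a small harness that runs the command under a timeout makes the hang visible without needing Ctrl-C. This is only a minimal sketch, assuming Python 3.7+ is available on the machine issuing the command; the helper name run_with_timeout and the 30-second default are hypothetical, not part of CSM.

```python
import subprocess


def run_with_timeout(cmd, timeout_s=30):
    """Run cmd and capture stdout and stderr separately.

    Returns (returncode, stdout, stderr), or None if the command
    did not finish within timeout_s seconds (treated as a hang).
    """
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,  # both pipes are drained while waiting
            text=True,
            timeout=timeout_s,    # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None
    return result.returncode, result.stdout, result.stderr
```

To test the case above, pass the command as a list, for example ["/opt/csm/bin/dsh", "-r", "/usr/bin/ssh", "-n", "<node>", "date"] with a real node name in place of <node>. A None result means the command hung; comparing a command that prints to STDOUT against one that prints nothing should show whether output is what triggers it.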

I was able to recreate the problem on one of our client nodes running RHEL4.4.
The node had CSM 1.7.0 installed, and the above command worked just fine
to any other node (upgraded or not).
Next, I updated the csm packages to csm1.7.1.4, and the node is no longer able to execute
the dsh command. I even upgraded the openssh packages to no avail.

This seems to be a problem with the RHEL4.4 support and ssh. I could solve part of the problem
by reloading the management server with RHEL5.2, but I want to do this only as a last resort.

I would greatly appreciate any help on this problem.

Thanks,
Brian Johnston
Updated on 2010-02-01T06:21:29Z at 2010-02-01T06:21:29Z by SystemAdmin
  • bejohn2
    2 Posts

    Re: dsh problems after csm upgrade to 1.7.1.4

    ‏2010-01-15T18:35:29Z  
    I uninstalled csm.dsh-1.7.1.4-27 on the client node and reinstalled csm.dsh-1.7.0.15-21, and the client is once again able to dsh commands to other nodes.

    I did the same on the management server, and also downgraded csm.server-1.7.1.4-27 to csm.server-1.7.0.15-21, and it now seems to be functioning properly again.
  • SystemAdmin
    476 Posts

    Re: dsh problems after csm upgrade to 1.7.1.4

    ‏2010-01-30T18:02:30Z  
    I'm getting a similar problem - running RHEL4.8 on the management server, 5.4 on most of the clients, and CSM 1.7.1.5 on the entire cluster. It seems like if anything goes to stderr, then dsh hangs. If I send a command to multiple nodes, all it takes is some stderr output on one of them, and then all the responses get blocked until I Ctrl-C.

    At the same time this started happening, I also started seeing rsync hangs when running cfmupdatenode. It doesn't always hang, but often enough to be an annoyance. We have a cron job that runs cfmupdatenode every hour against all 400+ nodes in the cluster. After a day, I'll often see several hundred stuck rsync processes on the management server.

    I don't have the option of backing off to an earlier version of csm, as earlier versions don't support RHEL5.4. I'm hoping that upgrading the management server to RHEL5.4 will help, but it will be a while before I can schedule that.
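
Until the underlying hang is fixed, the stuck rsync processes mentioned above can at least be identified by age. The following is a rough sketch, not a CSM tool: the function names and the one-hour threshold are hypothetical, and it assumes Python 3 plus a ps that reports elapsed time in the usual [[dd-]hh:]mm:ss form.

```python
def etime_to_seconds(etime):
    """Convert a ps elapsed-time value ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    # Pad missing leading fields (hours) with zero.
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def old_processes(ps_output, name="rsync", min_age_s=3600):
    """Return (pid, age_seconds) for processes called `name`
    that have been running for at least min_age_s seconds.

    ps_output is expected to hold three whitespace-separated
    columns per line: pid, elapsed time, command name.
    """
    found = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        pid, etime, comm = fields
        if comm != name:
            continue
        age = etime_to_seconds(etime)
        if age >= min_age_s:
            found.append((int(pid), age))
    return found
```

The input can come from `ps -e -o pid= -o etime= -o comm=` on the management server. The sketch only reports candidates; whether to kill them is left to the administrator, since a long-running rsync is not necessarily a hung one.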
  • SystemAdmin
    476 Posts

    Re: dsh problems after csm upgrade to 1.7.1.4

    ‏2010-02-01T06:21:29Z  
    Could you try reinstalling CSM? I have used a RHEL4.8 server to manage RHEL5.4 nodes and it worked just fine.