Topic
  • 6 replies
  • Latest Post - 2013-05-16T11:01:33Z by David.Rebatto
SystemAdmin
2092 Posts

CNFS and RHEL6

2012-12-12T16:52:38Z
We have an eight-node GPFS server cluster, all connected to a DDN SAN via dual FC links. Until recently, all nodes were running RHEL5 and GPFS 3.3.0.5. I decided to try upgrading one of the nodes to RHEL6 and GPFS 3.3.0.27. Everything worked fine except for CNFS. When a client tries to mount from that node, it gets a "Permission Denied" error, and on the RHEL6 server the messages log shows:

Dec 12 11:30:59 gpfs04 rpc.mountd[29282]: authenticated mount request from xxx.xxx.xxx.xxx:755 for /usr/share (/usr/share)
Dec 12 11:30:59 gpfs04 rpc.mountd[29282]: internal: no supported addresses in nfs_client
Dec 12 11:30:59 gpfs04 rpc.mountd[29282]: getfh failed: Operation not permitted

In /etc/exports I have the following line for testing:

/usr/share *(ro,async,fsid=120001)

If I shut down GPFS on the node and run only the native RHEL6 NFS server, everything works fine, so this is definitely an issue with CNFS.

I opened a PMR with IBM, and they required me to upgrade all the other nodes still running RHEL5 to GPFS 3.3.0.27 and the latest RHEL5 updates. I did, but the problem persists. Now they are blaming the Red Hat NFS package, which I think is crazy, since it fails only under CNFS.

Has anyone got CNFS working on RHEL6?
Updated on 2012-12-17T20:32:13Z by SystemAdmin
  • SchelePierre
    20 Posts

    Re: CNFS and RHEL6

    2012-12-13T20:42:37Z
    Hi,
    I've seen the same problem on RHEL 6.3 (latest updates) x86_64 with GPFS 3.4.0-17.

    Someone else has reported similar problems:
    http://www.ibm.com/developerworks/forums/thread.jspa?messageID=14882171
    I can confirm that, after enabling cNFS or after booting the server,
    running "service nfs restart" makes NFS mounts work again.
    The referenced post mentions "chkconfig nfs on" as an alternative.

    I have not yet tested whether IP and NFS failover actually work as expected, so I'm not sure the result is a truly functional clustered NFS setup.
    Pieter
  • SystemAdmin
    2092 Posts

    Re: CNFS and RHEL6

    2012-12-13T22:18:35Z
    The ps command showed rpc.mountd running as "/usr/sbin/rpc.mountd -p 597", so CNFS is just using the stock RHEL6 rpc.mountd. On a lark, I killed the process and started my own with

    /usr/sbin/rpc.mountd -p 597 -F

    and suddenly everything worked. Since the symlinks placed in /var/lib/nfs are still there, I am assuming all the clustering/failover machinery will still work.

    So my guess at this point is that something about the environment mmstartup runs rpc.mountd in causes the problem. However, I could not find anything obvious in the process's environment variables or limits.
  • SchelePierre
    20 Posts

    Re: CNFS and RHEL6

    2012-12-14T08:33:23Z
    Following up:
    I can confirm the links under /var/lib/nfs (v4recovery, statd/sm, statd/sm.bak) are still there and point to the GPFS shared file system. I am about to test whether "chkconfig nfs on" actually allows NFS client failover to work.

    I also found this:
    http://comments.gmane.org/gmane.linux.nfs/41432

    It is about the source line that triggers the "internal: no supported addresses in nfs_client" message in the NFS server's /var/log/messages (on my RHEL/CentOS 6.3).
    The relevant code is in support/export/nfsctl.c in the nfs-utils source RPM.
    There, in function "cltsetup", the offending (?) code is:

    for (i = 0; i < cltarg->cl_naddr && i < NFSCLNT_ADDRMAX; i++) {
        const struct sockaddr_in *sin = get_addrlist_in(clp, i);
        if (sin->sin_family == AF_INET)
            cltarg->cl_addrlist[j++] = sin->sin_addr;
    ...
    According to the link I mentioned, the condition in the for loop should be changed to:

    for (i = 0; i < clp->m_naddr && i < NFSCLNT_ADDRMAX; i++) {

    After rebuilding the nfs-utils RPM with this file patched and copying the rebuilt rpc.*d binaries to /usr/sbin, I can NFS-mount after enabling cNFS support; no "chkconfig nfs on" or "service nfs restart" is needed.

    As in the first scenario, I have not yet run client failover tests to see whether the result is actually a fully functional NFS failover cluster. Note that I do not fully understand the impact of this change; I merely applied it.
    Pieter
  • SystemAdmin
    2092 Posts

    Re: CNFS and RHEL6

    2012-12-14T14:36:39Z
    Excellent! Thank you.

    Applying that patch to the nfs-utils source RPM and rebuilding did fix the issue.

    So in the end, a bug in the stock Red Hat nfs-utils was the issue.
    Why the bug is triggered only when running under CNFS is a total mystery, though.
  • SystemAdmin
    2092 Posts

    Re: CNFS and RHEL6

    2012-12-17T20:32:13Z
    Well, too soon to declare victory. Failover is not working.

    If I bring GPFS (and thus CNFS) down on the RHEL6 node, the virtual IP
    fails over to one of the RHEL5 nodes, and clients that had NFS mounts
    through that virtual IP still have working mounts. However, if I then
    bring GPFS and CNFS back up on the RHEL6 node, which then takes back the
    virtual IP, those mounts no longer work: accessing them from the client
    reports either "Stale NFS mount" or "Permission Denied". On the RHEL6
    server, if I run 'tcpdump port nfs', I see lines with "Auth Bogus
    Credentials (seal broken)" from the clients that get "Permission Denied".
    If I bring GPFS down again so the virtual IP fails over to a RHEL5 node,
    the mounts on the clients start working again.
  • David.Rebatto
    8 Posts

    Re: CNFS and RHEL6

    2013-05-16T11:01:33Z

    Does anybody know if this problem has been fixed in some subsequent GPFS release?

    Thanks,

    David