20 replies Latest Post - ‏2013-04-04T21:06:58Z by fleers
pfo

Pinned topic sporadic permission denied on NFS export

‏2011-08-08T13:39:14Z |
I've got a GPFS file system that is exported via cNFS to a limited set of nodes. I can clearly see that certain stat(3) calls fail due to sporadic permission problems. I can sometimes see this from the shell, and users who run a huge number of jobs see it too (e.g. stat fails with permission denied 20 out of 10000 times on an array job). However, I cannot narrow this down to certain nodes; it just happens "from time to time". It only occurs on the cNFS export, never on nodes/machines that have a native GPFS client. The export is coming from NSD servers, though.

Here's one example of this occurring:

user@node021:/projects/foobar/test> cd foo
-bash: cd: blast_iout: Permission denied
user@node021:/projects/foobar/test> cd foo
-bash: cd: blast_out: Permission denied
user@node021:/projects/foobar/test> cd foo
-bash: cd: blast_out: Permission denied
user@node021:/projects/foobar/test> cd foo
user@node021:/projects/foobar/test/foo> ls

The directories have group setgid on them and are additionally fileset junctions which have appropriate initial file placement rules (which should be irrelevant for the cNFS export).
  • pfo

    Re: sporadic permission denied on NFS export

    ‏2011-08-10T14:32:16Z  in response to pfo
    Anyone? I know this sounds weird, but I can really observe this from time to time ...
  • SystemAdmin

    Re: sporadic permission denied on NFS export

    ‏2011-08-11T01:22:16Z  in response to pfo
    Please show the /etc/exports file on the NSD servers, and the permissions of the directory /projects/foobar/test.
  • johnjohn_france

    Re: sporadic permission denied on NFS export

    ‏2011-08-19T13:57:57Z  in response to pfo
    Well, I am very interested in the outcome of this (old) problem.
    I have a Linux cluster with GPFS 3.3.
    On some nodes of this cluster, the nodes used for interactive sessions, GPFS is accessed through a cNFS export.
    Sometimes I observe that "permission denied" occurs when I run 'ls' in a directory that I own.
    It seems to occur after a fairly long idle period, when I have not used my session and come back to check whether my run is OK (using 'ls' to see whether some files have been created in the current directory).

    User authentication is done through a NIS domain (but nothing seems wrong with it).

    Thanks in advance for your help...
    • pfo

      Re: sporadic permission denied on NFS export

      ‏2011-08-19T14:27:44Z  in response to johnjohn_france
      I'm going to open a case with GPFS support - I'm still experiencing this on a daily basis now.
      • gpfs@us.ibm.com

        Re: sporadic permission denied on NFS export

        ‏2011-08-19T14:34:48Z  in response to pfo
        It's a good idea to open a problem with service so that they can collect the data needed to diagnose this correctly. One cause of this in the past was Linux kernels (2.6.27 or later) depending on capabilities for lookup permission (CAP_DAC_READ_SEARCH). If you have such a kernel, and your GPFS level is older than 3.3.0.7, then you need to upgrade to pick up the GPFS support for this.
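
The rule of thumb in the reply above (kernel 2.6.27 or later combined with a GPFS level older than 3.3.0.7) can be written down as a small check. This is only a sketch of that one rule: the function names `version_tuple` and `needs_gpfs_upgrade` are made up for illustration, and the parsing assumes version strings shaped like the ones quoted in this thread.

```python
import re


def version_tuple(text):
    """Return the leading dotted number of a version string as a tuple of ints.

    For example '2.6.27.19-5-default' gives (2, 6, 27, 19).
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def needs_gpfs_upgrade(kernel_release, gpfs_level):
    """Apply the rule quoted above: kernel >= 2.6.27 and GPFS < 3.3.0.7."""
    kernel = version_tuple(kernel_release)
    # GPFS levels appear both as '3.3.0.2' and as '3.3.0-13' in this thread.
    gpfs = version_tuple(gpfs_level.replace("-", "."))
    return kernel >= (2, 6, 27) and gpfs < (3, 3, 0, 7)


if __name__ == "__main__":
    print(needs_gpfs_upgrade("2.6.27.19-5-default", "3.3.0.2"))
    print(needs_gpfs_upgrade("2.6.18-194.el5", "3.3.0-13"))
```

The check says nothing about kernels older than 2.6.27 that were built with capabilities support, or about NFS modules shipped separately from the kernel.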
        • pfo

          Re: sporadic permission denied on NFS export

          ‏2011-08-19T15:02:04Z  in response to gpfs@us.ibm.com
          The kernel in use here is 2.6.27.19-5-default (SLES11)
          • pfo

            Re: sporadic permission denied on NFS export

            ‏2011-08-19T15:04:08Z  in response to pfo
            and GPFS version: 11.05 (3.3.0.2)
        • pfo

          Re: sporadic permission denied on NFS export

          ‏2011-08-19T15:05:07Z  in response to gpfs@us.ibm.com
          Thanks for the info, but how can this problem be sporadic rather than permanent? Could this be a load issue?
  • johnjohn_france

    Re: sporadic permission denied on NFS export

    ‏2011-08-19T14:51:56Z  in response to pfo
    My configuration is :
    (linux kernel) 2.6.18-194.el5
    (gpfs) 3.3.0-13

    I don't know what kind of test I could run to get a better idea of what's wrong. I suppose that the "opendir", i.e. the "open(O_DIRECTORY)", in the 'ls' command returns the error EACCES.
    (Could there be a link with GPFS metadata access, or with the I/O server handling that metadata?)
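
One test that could narrow this down is a probe that repeats the two calls mentioned in this thread, stat() and open(O_DIRECTORY), against the same path and records every error code it gets back. The following is a minimal sketch, not a tool from GPFS or NFS; the function name `probe`, its parameters, and the example path are assumptions made for illustration.

```python
import errno
import os
import time


def open_directory(path):
    """Open a directory the way opendir() does, then close it again."""
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    os.close(fd)


def probe(path, attempts=1000, delay=0.0):
    """Call stat() and open(O_DIRECTORY) repeatedly on path.

    Returns a list of (attempt, call name, errno name) for every failure,
    so that sporadic EACCES results show up with their attempt number.
    """
    failures = []
    calls = (("stat", os.stat), ("open(O_DIRECTORY)", open_directory))
    for attempt in range(attempts):
        for name, call in calls:
            try:
                call(path)
            except OSError as exc:
                code = errno.errorcode.get(exc.errno, str(exc.errno))
                failures.append((attempt, name, code))
        if delay:
            time.sleep(delay)
    return failures


if __name__ == "__main__":
    # Hypothetical path on the NFS mount; replace with an affected directory.
    for failure in probe("/projects/foobar/test", attempts=10000):
        print(failure)
```

Running the same probe on an NFS client and on a native GPFS client at the same time would show whether the failures are confined to the NFS path.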
    • pfo

      Re: sporadic permission denied on NFS export

      ‏2011-08-19T15:45:14Z  in response to johnjohn_france
      Funny, on my setup it's stat() that is failing.
      • johnjohn_france

        Re: sporadic permission denied on NFS export

        ‏2011-12-20T13:20:11Z  in response to pfo
        Hi,

        Does anyone have any news about this GPFS permission denied problem?
        I still have this problem.
        DDN support has analysed our configuration (ticket #31465), but nothing seems to be wrong with GPFS.
        This problem was identified last year at another site: http://lists.ci.uchicago.edu/pipermail/pads-users/2010-December/thread.html#11, but I don't know whether a solution was found.

        I don't know how to solve this problem; I need help, please...

        Best Regards
        • bhartner

          Re: sporadic permission denied on NFS export

          ‏2011-12-20T16:25:11Z  in response to johnjohn_france
          A problem was fixed with:

          https://www-304.ibm.com/support/docview.wss?uid=isg1IZ75258

          If this is not your problem, open a problem report with service so that they can collect data needed to diagnose this correctly.
          • johnjohn_france

            Re: sporadic permission denied on NFS export

            ‏2012-01-02T16:41:37Z  in response to bhartner
            Thanks for this information and for your help.

            In fact, our GPFS 3.3.0-13 configuration uses the NFS module delivered with the InfiniBand driver (which uses capabilities), while our GPFS I/O servers run Linux kernel 2.6.18 (RHEL 5.5): there is a mismatch here.
            We plan to upgrade our system environment (RHEL 5.7) and use its NFS module instead of the one delivered by OFED.

            With Best Regards
  • philgp

    Re: sporadic permission denied on NFS export

    ‏2011-12-20T16:12:49Z  in response to pfo
    No news here, but I wonder if this problem is related to the one we see.

    On AIX NFS clients (served by AIX NFS exporting GPFS, not cNFS), while trying to execute a binary or script in NFS, we'll sporadically get messages like

    -bash: ./a.out: Cannot open or remove a file containing a running program.

    -bash: /path-to-gpfs_nfs-script: /bin/sh: bad interpreter: Cannot open or remove a file containing a running program.

    /path-to-gpfs_nfs-executable[5]: /path-to-gpfs_nfs-executable:
    0403-015 Cannot access a required executable file. It is in use.

    The files in question are always readable: one can copy them to local disk and execute them from there. Sometimes, at a later time, they may become runnable again (and then revert back to un-runnable, and repeat).

    We've been going back and forth with IBM for about 3 years on this (they're usually waiting on us, probably are right now).

    -Phil
  • DDNBryan

    Re: sporadic permission denied on NFS export

    ‏2013-02-22T18:58:03Z  in response to pfo
    Hello pfo and IBM,

    Was there any resolution to this problem and if so, what was the resolution?

    We have a cNFS cluster with NFS clients that also report very sporadic (random) permission denied errors and missing stat() information in the output. For example:

    [pentaho@lbg-admin pentaho]$ ls -al
    ls: cannot access CEL: Permission denied
    ls: cannot access level3: Permission denied
    total 22944
    drwxr-s--- 7 pentaho tcgaadmin 32768 2012-08-27 10:01 .
    drwxrws--x 14 dolina tcgaadmin 32768 2013-01-04 13:20 ..
    d????????? ? ? ? ? ? CEL
    drwxr-s--- 3 pentaho tcgaadmin 32768 2012-07-27 14:53 cghub
    drwxr-s--- 2 pentaho tcgaadmin 32768 2012-04-20 08:51 downloadlogs
    d????????? ? ? ? ? ? level3
    drwxrws--- 33 pentaho tcgaadmin 32768 2012-11-25 08:01 seq_vs_snpchip
    -rw-r----- 1 pentaho tcgaadmin 23320529 2012-02-07 15:43 xml2process2.txt
    [pentaho@lbg-admin pentaho]$ pwd
    /datastore/tcgarepo/pentaho
    [pentaho@lbg-admin pentaho]$

    [pentaho@lbg-admin pentaho]$ ls -al
    ls: cannot access level3: Permission denied
    total 23072
    drwxr-s--- 7 pentaho tcgaadmin 32768 2012-08-27 10:01 .
    drwxrws--x 14 dolina tcgaadmin 32768 2013-01-04 13:20 ..
    drwxr-s--- 28 pentaho tcgaadmin 131072 2012-04-20 09:21 CEL
    drwxr-s--- 3 pentaho tcgaadmin 32768 2012-07-27 14:53 cghub
    drwxr-s--- 2 pentaho tcgaadmin 32768 2012-04-20 08:51 downloadlogs
    d????????? ? ? ? ? ? level3
    drwxrws--- 33 pentaho tcgaadmin 32768 2012-11-25 08:01 seq_vs_snpchip
    -rw-r----- 1 pentaho tcgaadmin 23320529 2012-02-07 15:43 xml2process2.txt

    By creating a file, the problem is gone, at least from this directory.

    [pentaho@lbg-admin pentaho]$ touch foo
    [pentaho@lbg-admin pentaho]$ ls -al
    total 23104
    drwxr-s--- 7 pentaho tcgaadmin 32768 2013-01-04 14:20 .
    drwxrws--x 14 dolina tcgaadmin 32768 2013-01-04 13:20 ..
    drwxr-s--- 28 pentaho tcgaadmin 131072 2012-04-20 09:21 CEL
    drwxr-s--- 3 pentaho tcgaadmin 32768 2012-07-27 14:53 cghub
    drwxr-s--- 2 pentaho tcgaadmin 32768 2012-04-20 08:51 downloadlogs
    ###############################################################################################################
    -rw-r----- 1 pentaho tcgaadmin 0 2013-01-04 14:20 foo
    ###############################################################################################################
    drwxr-s--- 28 pentaho tcgaadmin 32768 2012-08-27 10:02 level3
    drwxrws--- 33 pentaho tcgaadmin 32768 2012-11-25 08:01 seq_vs_snpchip
    -rw-r----- 1 pentaho tcgaadmin 23320529 2012-02-07 15:43 xml2process2.txt

    Thus it appears that the stat() information for the affected entries is missing on the client, perhaps because it was not received correctly from the NFSD on the cNFS node. The condition clears on its own, in particular after you touch a file in the directory.

    There are no GPFS or NFSD errors on the cNFS nodes or GPFS nsdnodes/manager nodes. The customer is using LDAP for authentication on their NFS clients, which is also used by a NetApp filer that doesn't experience this problem. We can reproduce the problem by setting the directory to have read permissions but not execute permissions, but that affects all files/sub-directories, whereas this shows only issues with stat() information on some of the files/sub-directories. The issue occurs for multiple users, in different directories, and from different NFS client mount points and always clears itself without any changes to NFS or GPFS services.
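
The deliberate reproduction mentioned above (a directory with read but not execute permission) can be sketched on a local file system as follows. It creates its own temporary directory, so it touches neither GPFS nor NFS; the function name and the entry names borrowed from the listing above are illustrative only.

```python
import errno
import os
import stat
import tempfile


def stat_entries_without_search_permission(names=("CEL", "level3")):
    """Create subdirectories, drop the parent's execute (search) bit,
    then list the parent and try to stat() each entry.

    Returns a dict mapping entry name to 'ok' or the errno name.
    """
    results = {}
    with tempfile.TemporaryDirectory() as parent:
        for name in names:
            os.mkdir(os.path.join(parent, name))
        # rw- for the owner: the directory can be read but not searched.
        os.chmod(parent, stat.S_IRUSR | stat.S_IWUSR)
        try:
            # Listing names only needs read permission on the directory.
            for name in sorted(os.listdir(parent)):
                try:
                    os.stat(os.path.join(parent, name))
                    results[name] = "ok"
                except OSError as exc:
                    results[name] = errno.errorcode.get(exc.errno, str(exc.errno))
        finally:
            # Restore rwx so the temporary directory can be cleaned up.
            os.chmod(parent, stat.S_IRWXU)
    return results


if __name__ == "__main__":
    print(stat_entries_without_search_permission())
```

Run as an unprivileged user, every entry would be expected to fail with EACCES, which is what `ls -al` renders as `d?????????`. A process holding CAP_DAC_READ_SEARCH (root, typically) is not subject to this check and would see `ok` instead. As noted above, this affects all entries at once, unlike the sporadic failures in the listings.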

    The GPFS release on cluster is 3.4.0-15, the OS is CentOS 5.5 running with the 2.6.18-194.32.1.el5 kernel.
    The NFS software on the cNFS nodes is:
    nfs-utils-lib-1.0.8-7.6.el5
    nfs-utils-1.0.9-47.el5_5
    The NFS software on the client is:
    nfs4-acl-tools-0.3.3-6.el6.x86_64 Fri 27 Jul 2012 07:59:06 AM EDT
    nfs-utils-1.2.3-26.el6.x86_64 Fri 27 Jul 2012 07:58:38 AM EDT
    nfs-utils-lib-1.1.5-4.el6.x86_64 Mon 18 Jun 2012 07:30:02 AM EDT

    The mount option on the client is:
    gridscaler-pvt.bioinf.unc.edu:/mnt/gs1/export/tier3-tcgarepo on /datastore/tcgarepo type nfs (rw,noatime,intr,timeo=1800,retrans=10,rsize=32768,wsize=32768,addr=172.29.26.132)

    The export options on the cNFS nodes are:
    /mnt/gs1/export/tier3-tcgarepo 172.29.26.150(rw,no_root_squash,sync,fsid=229) 172.29.26.0/255.255.255.0(rw,root_squash,sync,fsid=229) 152.19.180.0/255.255.255.0(rw,root_squash,sync,fsid=229) 152.19.87.108(rw,root_squash,sync,fsid=229) 152.19.9.10(rw,root_squash,sync,fsid=229) 152.19.8.9(rw,root_squash,sync,fsid=229)

    The customer exports many directories from the same file system to many clients, and the exported directory is a GPFS fileset. Quotas are not an issue either.

    Unfortunately the problem always is cleared before any GPFS runtime analysis can be done, such as GPFS waiter collection, or even system related resource monitoring, such as iostat, mpstat, etc. It is also not readily reproducible, so we cannot easily diagnose the problem. The problem definitely does not appear to be load related, but it's hard to rule anything out at this point. We have not found any issue with GPFS itself.

    ANY AND ALL ASSISTANCE IS GREATLY APPRECIATED!!
    -Bryan
    • ezhong

      Re: sporadic permission denied on NFS export

      ‏2013-02-22T21:31:25Z  in response to DDNBryan
      There are so many layers between the user commands and GPFS. There is no evidence yet that this is a GPFS problem. If it is a GPFS problem, then a GPFS trace covering the failure event would probably show the error. This seems to be a hard-to-debug problem.
      • HajoEhlers

        Re: sporadic permission denied on NFS export

        ‏2013-02-25T13:39:10Z  in response to ezhong
        Just a guess:
        Could it be that the amount of token memory is not sufficient to handle all "stat" cache entries?

        Meaning that the total number of maxStatCache entries on all NFS servers overruns the memory defined by "tokenMemLimit" for handling stat tokens on the fs manager nodes?

        Hajo
        • DDNBryan

          Re: sporadic permission denied on NFS export

          ‏2013-02-26T23:38:46Z  in response to HajoEhlers
          Hi HajoEhlers,

          Thanks for your note. I checked the current memory usage on the cNFS nodes and they do not show "Token Memory" usage that is close to the default 512MB setting, nor do they show any allocation failures:
          root@gs4 ~# mmdsh -N nsdnodes "mmdiag --memory | grep -A 6 \"Token Manager\"" | grep "bytes in use" -A 1
          gs5-pvt2: 50921008 bytes in use
          gs5-pvt2: 510027355 hard limit on memory usage
          --
          gs12-pvt2: 2590016 bytes in use
          gs12-pvt2: 510027355 hard limit on memory usage
          --
          gs8-pvt2: 2590016 bytes in use
          gs8-pvt2: 510027355 hard limit on memory usage
          --
          gs1-pvt2: 49882256 bytes in use
          gs1-pvt2: 510027355 hard limit on memory usage
          --
          gs0-pvt2: 106492928 bytes in use
          gs0-pvt2: 510027355 hard limit on memory usage
          --
          gs14-pvt2: 109648944 bytes in use
          gs14-pvt2: 510027355 hard limit on memory usage
          --
          gs10-pvt2: 109641264 bytes in use
          gs10-pvt2: 510027355 hard limit on memory usage
          --
          gs4-pvt2: 109633104 bytes in use
          gs4-pvt2: 510027355 hard limit on memory usage

          root@gs4 ~#
          root@gs4 ~# mmdsh -N nsdnodes "mmdiag --memory | grep -A 6 \"Token Manager\"" | grep fail
          gs5-pvt2: 0 allocation failures
          gs12-pvt2: 0 allocation failures
          gs0-pvt2: 0 allocation failures
          gs1-pvt2: 0 allocation failures
          gs4-pvt2: 0 allocation failures
          gs8-pvt2: 0 allocation failures
          gs14-pvt2: 0 allocation failures
          gs10-pvt2: 0 allocation failures

          Thanks for your input! Our current guess is that there is an issue with the older NFSD in the RHEL/CentOS 5.5 release. Has anybody heard of issues like this in the NFSD code?

          Thanks again!!
          -Bryan
          • HajoEhlers

            Re: sporadic permission denied on NFS export

            ‏2013-03-07T14:12:47Z  in response to DDNBryan
            Just a remark:
            In our environment the token manager space is set up in such a way that a single token manager could handle all tokens of all nodes.

            In your environment the total amount of token space in use (you have shown only the NSD nodes) is already larger (530MB) than the configured token space on a given manager node.

            Maybe my approach is not right, but it should keep me out of trouble in case of NSD failures, so I do not have to worry about token manager space.

            Hajo
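
The comparison in the remark above can be redone from the `mmdiag --memory` figures posted earlier in this thread. The sketch below just sums the "bytes in use" values and compares the total with the per-node hard limit; the sample text is copied from that output, while the function name `token_memory_summary` is made up for illustration. Whether a single token manager really needs to hold every node's tokens is the poster's own sizing practice, not something the numbers can confirm.

```python
import re

# Figures copied from the mmdsh/mmdiag output posted earlier in the thread.
MMDIAG_SAMPLE = """\
gs5-pvt2: 50921008 bytes in use
gs5-pvt2: 510027355 hard limit on memory usage
gs12-pvt2: 2590016 bytes in use
gs12-pvt2: 510027355 hard limit on memory usage
gs8-pvt2: 2590016 bytes in use
gs8-pvt2: 510027355 hard limit on memory usage
gs1-pvt2: 49882256 bytes in use
gs1-pvt2: 510027355 hard limit on memory usage
gs0-pvt2: 106492928 bytes in use
gs0-pvt2: 510027355 hard limit on memory usage
gs14-pvt2: 109648944 bytes in use
gs14-pvt2: 510027355 hard limit on memory usage
gs10-pvt2: 109641264 bytes in use
gs10-pvt2: 510027355 hard limit on memory usage
gs4-pvt2: 109633104 bytes in use
gs4-pvt2: 510027355 hard limit on memory usage
"""

LINE = re.compile(r"\s*(\S+):\s+(\d+)\s+(bytes in use|hard limit on memory usage)")


def token_memory_summary(text):
    """Return (total bytes in use across nodes, smallest per-node hard limit)."""
    in_use = {}
    limits = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match is None:
            continue  # skip separators such as '--' and shell prompts
        node, value, kind = match.group(1), int(match.group(2)), match.group(3)
        if kind == "bytes in use":
            in_use[node] = value
        else:
            limits[node] = value
    return sum(in_use.values()), min(limits.values())


if __name__ == "__main__":
    total, limit = token_memory_summary(MMDIAG_SAMPLE)
    print(f"total in use: {total} bytes, per-node hard limit: {limit} bytes")
    print("total exceeds a single node's limit:", total > limit)
```

For the eight NSD nodes shown, the total comes to 541,399,536 bytes against a hard limit of 510,027,355 bytes per node, so the sum is indeed larger than what one node is configured to hold.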
          • fleers

            Re: sporadic permission denied on NFS export

            ‏2013-04-04T21:06:58Z  in response to DDNBryan
            Hi,
            Just wanted to post a comment here about this issue, and add a recent experience which might help anyone who lands on this thread while searching.
            We recently saw this set of conditions, and in fact I believe this to be the same filesystem that DDNBryan has posted about. Hence the reply to his last thread entry.

            The bottom line is that the combination of Linux kernel, capabilities settings, and GPFS portability layer configuration can enable nfsd to re-use UID/GID credentials for GPFS filesystem access instead of switching to the 'real' UID/GID of the actual requestor.

            Starting with kernel 2.6.27, capabilities are enabled by default, and are used instead of switching to the appropriate UID/GID.

            GPFS allows the use of capabilities on kernels 2.6.27 or later, but not before that.

            Our kernel was 2.6.18, but was indeed built with capabilities (CONFIG_SECURITY_FILE_CAPABILITIES=y). Note: this kernel config option has since been deprecated.

            Depending on currently installed versions, enabling capabilities in the GPFS portability layer on the cNFS server nodes or upgrading the kernel are potential workarounds.
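
To see whether a pre-2.6.27 kernel was built with the option named above, one can look it up in the kernel build configuration. The sketch below assumes the configuration is available as a text file at `/boot/config-<kernel release>`, which is a common distribution convention rather than anything this thread confirms; the function names are illustrative.

```python
import os


def config_option_enabled(config_text, option="CONFIG_SECURITY_FILE_CAPABILITIES"):
    """Return True if the kernel config text sets the given option to 'y'."""
    for line in config_text.splitlines():
        line = line.strip()
        # Lines such as '# CONFIG_FOO is not set' are comments.
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key == option:
            return value == "y"
    return False


def running_kernel_config_path():
    """Assumed location of the running kernel's build configuration."""
    return f"/boot/config-{os.uname().release}"


if __name__ == "__main__":
    path = running_kernel_config_path()
    try:
        with open(path) as handle:
            print(path, config_option_enabled(handle.read()))
    except OSError as exc:
        print(f"could not read {path}: {exc}")
```

On kernels where the option no longer exists the function simply returns False, so a negative result does not mean capabilities are unused.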

            Here's a good article on this very site about POSIX file capabilities
            http://www.ibm.com/developerworks/linux/library/l-posixcap/index.html

            -frank