13 replies Latest Post - ‏2013-03-18T17:42:41Z by HajoEhlers
HajoEhlers

Pinned topic mmdeldisk fails with no space left on device ( Files not referenced by any

‏2013-03-11T12:06:41Z |
Hi

Task: Removing disk from a storage pool

Given: GPFS v3.4

Problem:
mmdeldisk ends with "No space left on device" during the removal of the last disk of the pool.

Analysis:
A list policy does not show any files left on the given storage pool, so I assume that some files are no longer referenced by any directory.

I ran a trace, which shows some inodes. Example for inode 329763:

309  38666315    0.861233074       0.000299   MMFS KSVFS: lookupFileFast exit: blk 0 dirHashLevelHint 0 inodeNum 329763 snapId 0 generation 0x00000002 mode 0x00004000 err 0
309  38666315    0.861235292       0.000906   MMFS VNODE: gpfsNode_t::findOrCreate enter: privVfsP 0xF1000A04007B2680 sgLocalId 18 inodeNum 329763 generation 0x00000002 filesetId 0 snapId 0 vPP 0xF00000002FF469C8 gnPP 0xF00000002FF46768
309  38666315    0.861239027       0.000297   MMFS VNODE: gpfsNode_t::findOrCreate exit: err 0 code 0 gnP 0xF1000A0400763E00 vP 0xF1000A0400264D90 inodeNum 329763 filesetId 0
309  38666315    0.861242042       0.000156   MMFS VNOP: LOOKUP: vP 0xF1000A0400264D90 gnP 0xF1000A0400763E00 inode 329763 snap 0 dvP 0xF1000A0400264490 name 'hjehlers'
309  38666315    0.861254917       0.000218   MMFS KSVFS: kSFSGetattr enter: sgP 0xF100000693610148 genNum 0x00010002 inodeNum 329763 snapId 0 flags 0x00000004
309  38666315    0.861268386       0.000201   MMFS KSVFS: kSFSGetattr exit: va_size 8192 va_ino 0x0000000000050823 inodeNum 329763 err 0
309  38666315    0.861269105       0.000445   MMFS VNOP: GETVATTR: gnP 0xF1000A0400763E00 inode 329763 type 2 mode 0x000041ED uid 66676 gid 207 nlink 48 size 0x0000000000002000 blks 16 gen 0x00010002 rdev 0x8000000000000000 err 0
307  38666315    0.861274154       0.000291   MMFS VNODE: vnodeReleInternal: release vP 0xF1000A0400264D90 new vpcnt 3 inodeNum 329763 NFSP 0x0000000000000000
309  38666315    0.861295074       0.000293   MMFS VNOP: LOOKUP: vP 0xF1000A0400264D90 gnP 0xF1000A0400763E00 inode 329763 snap 0 dvP 0xF1000A0400264D90 name '.'

Question:
Is it possible to get the data referenced by a given inode number without unmounting the file system and running an fsck?
If so, I could determine whether the "orphaned" inode contains important data and, if not, force the removal of the disk.
That would mean no downtime is required at all.

tia
Hajo
Updated on 2013-03-18T17:42:41Z at 2013-03-18T17:42:41Z by HajoEhlers
  • YuanZhengcai
    ‏2013-03-11T15:37:37Z  in response to HajoEhlers
    To delete the last disk in a pool, all data must first be migrated off that disk. Did you run a migration before the mmdeldisk? Are there any snapshots in your file system? To list which files still have data on this disk, you can use mmfileid. For example "mmfileid -d gpfs1nsd".
    • HajoEhlers
      ‏2013-03-11T17:21:52Z  in response to YuanZhengcai
      I did a migration beforehand, and there are no snapshots. Also, the disk holds (or should hold) only data: it is a dataOnly disk.

      The current result from the mmfileid is:
      
      Address 15:0-4099 contains data in the system area of the disk
      
      • SystemAdmin
        ‏2013-03-11T17:39:21Z  in response to HajoEhlers
        It is odd that mmfileid shows no inodes with blocks on the problematic disk, while mmdeldisk apparently finds something -- both use the same basic approach, a full metadata scan, to find all references to a given disk. The metadata scan is done on the inode file, so the directory structure is not relevant, and an orphaned file would be processed just as a properly connected one would.

        This may need some targeted debug data collection (traces, etc.) to figure out, which would be best done via a PMR. One quick thing to try would be running the tsinode sample (from /usr/lpp/mmfs/samples/util) to see whether any inodes reference the offending data pool name.

        yuri
        • HajoEhlers
          ‏2013-03-12T09:17:39Z  in response to SystemAdmin
          I will open a PMR, since the GPFS level is already 3.4.0.19, and post the result.

          Note:
          When building tsinode I get error messages like:

          cc /usr/lpp/mmfs/samples/util/tsinode.c  -qinclude=/usr/lpp/mmfs/include/gpfs.h  -o tsinode

          "/usr/lpp/mmfs/samples/util/tsinode.c", line 250.36: 1506-022 (S) "ia_dev" is not a member of "const struct gpfs_iattr".
          "/usr/lpp/mmfs/samples/util/tsinode.c", line 251.36: 1506-022 (S) "ia_dev" is not a member of "const struct gpfs_iattr".
          "/usr/lpp/mmfs/samples/util/tsinode.c", line 255.17: 1506-022 (S) "ia_winflags" is not a member of "const struct gpfs_iattr".
          "/usr/lpp/mmfs/samples/util/tsinode.c", line 257.15: 1506-022 (S) "ia_winflags" is not a member of "const struct gpfs_iattr".
          ...
          
          • SystemAdmin
            ‏2013-03-12T15:41:21Z  in response to HajoEhlers
            > cc /usr/lpp/mmfs/samples/util/tsinode.c -qinclude=/usr/lpp/mmfs/include/gpfs.h -o tsinode

            That's not the right way to build tsinode. I'd suggest "cd /usr/lpp/mmfs/samples/util; make tsinode". The makefile-prescribed build command enables an extra define and links in libgpfs.

            yuri
        • HajoEhlers
          ‏2013-03-13T17:14:13Z  in response to SystemAdmin
          I ran tsinode and found that some data is left (FC10K is the storage pool with the last disk):

          
          $ cat  tsinode.out | grep FC10K
          inode       gen    uid    gid       size      space       mode     nlink          atime                mtime               ctime        flags
          120938     65540   1306    207          0          0 srwxrwxr-x         1 1226413249.838569981 1226413249.838569981 1226413249.838569981 replmeta datapool='FC10K' dev=0,154 archive crtime 1226413249.838569981
          215461     65577  66838    207          0          0 prw-r--r--         1 1271938463.048283783 1271938463.048283783 1271938463.048283783 replmeta datapool='FC10K' dev=0,154 archive crtime 1271938396.622567769
          376356     65603   1506    207          0          0 srwxr-xr-x         1 1241099898.736878773 1241099898.736878773 1241099898.736878773 replmeta datapool='FC10K' dev=0,154 archive crtime 1241099898.736878773
          1984288     65546  66720    207      18763      24576 -rw-r--r--         0 1361433954.018336113 1360933043.617892730 1361433954.072618865 extperms=0x10,rp replmeta exposed illReplicated datapool='FC10K' dev=0,154 archive crtime 1360933043.568422275
          2280462     65545  66838    207          0          0 prw-r--r--         1 1271938396.668755380 1271938463.088768152 1271938463.088768152 replmeta datapool='FC10K' dev=0,154 archive crtime 1271938396.622466982
          ...
          25987481     65544  66721    207          0          0 srw-------         1 1341476355.167915187 1341476355.167915187 1341476355.225149230 replmeta datapool='FC10K' dev=0,154 archive crtime 1341476355.167915187
          26171557     65543  66684    207   61076082   61079552 -rw-r--r--         1 1233565125.000000000 1233565134.000000000 1363145369.616640244 extperms=0x12,xa,rp replmeta datapool='FC10K' dev=0,154 archive crtime 1363145368.988344003
          26171632     65543  66684    207        907       8192 -rw-r--r--         1 1363161116.928992687 1231148061.000000000 1363145347.594310859 extperms=0x12,xa,rp replmeta datapool='FC10K' dev=0,154 archive crtime 1363145347.576494109
          ...
          


          As we can see, some data is left.
          • SystemAdmin
            ‏2013-03-13T20:46:02Z  in response to HajoEhlers
            Evidently the last mmapplypolicy scan hasn't migrated everything. I'd say run the migration again, monitor the output closely, and cross-reference it with the tsinode output. It'd be interesting to see whether inodes in 'FC10K' are somehow being skipped.

            As far as practical workarounds go, if the number of inodes still in 'FC10K' is manageable, you could use tsfindinode/mmchattr to move them to another pool. mmdeldisk and mmrestripefs wouldn't help until all inodes are migrated.

            yuri
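Cross-referencing the tsinode output with a migration run starts with a list of the inodes still assigned to the pool. The following is a minimal sketch, assuming the tsinode.out layout quoted earlier in this thread (inode in column 1, space in column 6, mode in column 7, and a datapool= token among the flags); the function name pool_inodes is hypothetical and not part of GPFS.

```shell
# Print "inode mode space" for every tsinode record assigned to the given pool.
# Assumed columns (from the header quoted in this thread):
#   $1 = inode, $6 = space, $7 = mode, flags from $8 onward.
pool_inodes() {
    pool=$1
    awk -v pool="$pool" -v q="'" '
        {
            for (i = 8; i <= NF; i++) {
                v = $i
                # Only the datapool=... flag is of interest.
                if (sub(/^datapool=/, "", v)) {
                    gsub(q, "", v)            # drop optional single quotes
                    if (v == pool) print $1, $7, $6
                }
            }
        }'
}
```

Used as `pool_inodes FC10K < tsinode.out`, it prints one line per inode. The mode column separates regular files (leading `-`) from sockets (`s`) and named pipes (`p`), and the space column shows whether any blocks are still allocated.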
            • HajoEhlers
              ‏2013-03-14T17:19:40Z  in response to SystemAdmin
              A policy run with -L 6 gives:

              $ cat policy.log | grep -w "[E]"

              <7> [E] Error on gpfs_igetstoragepool(4294967295): Invalid argument
              <3> [E] Error on gpfs_igetstoragepool(4294967295): Invalid argument
              <1> [E] Error on gpfs_igetstoragepool(4294967295): A system call received a parameter that is not valid.
              <1> [E] Error on gpfs_igetstoragepool(4294967295): A system call received a parameter that is not valid.
              


              Looking for the 4294967295 in the policy log.
              $ grep 4294967295 policy.log

              policy.log:<7> [E] Error on gpfs_igetstoragepool(4294967295): Invalid argument
              policy.log:<7> .../tmtcTrace.fifo  [2008-01-22@13:19:16 66733 207 0 4294967295 2008-04-07@11:13:03 0 root] NO RULE APPLIES
              policy.log:<3> [E] Error on gpfs_igetstoragepool(4294967295): Invalid argument
              policy.log:<3> .../TMTCinput.fifo    [2008-04-04@11:23:23 66813 207 0 4294967295 2008-04-07@12:15:33 0 root] NO RULE APPLIES
              ...
              ...
              


              The given files are named pipes created in 2008 (I think we ran GPFS 3.1 at that time).
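The manual grep above can be narrowed to just the affected path names. This is a rough sketch, assuming the "-L 6" log layout shown in this post (a node tag such as `<7>`, the path, then a bracketed attribute list); the function name invalid_pool_paths and the sample paths in the usage note are hypothetical.

```shell
# Print the path of every policy-log record whose bracketed attribute list
# contains 4294967295 (0xFFFFFFFF, i.e. -1 as an unsigned 32-bit value).
# "[E] Error ..." lines are skipped because they carry no path before a "[".
invalid_pool_paths() {
    sed -n 's/^<[0-9][0-9]*> \(.*[^ ]\) *\[[^]]* 4294967295 [^]]*\].*/\1/p'
}
```

Run as `invalid_pool_paths < policy.log`. A record such as `<7> /gpfs/demo/tmtcTrace.fifo  [... 4294967295 ...] NO RULE APPLIES` yields `/gpfs/demo/tmtcTrace.fifo`, while records with any other value in the list are dropped.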
              Conclusion:
              - The "tsinode" allowed us to find stuff which was still referenced by the wrong storage pool and which was also invalid.

              $ cat tsinode.out | grep -iw invalid
              
              7875174     65536  66733    207          0          0 prw-rw-rw-         1 1207566813.000000000 1202976520.000000000 1207566813.001154414 replmeta gpfs_igetstoragepool:Invalid argument dev=0,154 crtime         0.000000000
              7875275     65536  66733    207          0          0 prw-------         1 1207566814.000000000 1195489892.000000000 1207566814.383185208 replmeta gpfs_igetstoragepool:Invalid argument dev=0,154 crtime         0.000000000
              8346458     65536  66733    207          0          0 prw-------         1 1207566783.000000000 1201007956.000000000 1207566783.419308257 replmeta gpfs_igetstoragepool:Invalid argument dev=0,154 crtime         0.000000000
              10982546     65536  66813    207          0          0 prw-rw-rw-         1 1207570533.000000000 1207308203.000000000 1207570533.652202369 replmeta gpfs_igetstoragepool:Invalid argument dev=0,154 crtime         0.000000000
              

              - A "mmapplypolicy ... -L 6 > policy.out 2>&1" run, followed by a grep for "[E]", showed us ERROR messages whose gpfs_igetstoragepool number pointed us to some old fifo files.

              - Recreating (rm/mknod) the given named pipes resolved the problem: mmdeldisk then ran successfully.

              So the root cause seems to be named pipes created during the usage of an ancient version of GPFS.
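The rm/mknod step from the conclusion can be sketched as a small helper that keeps the pipe's mode and ownership. It is only an illustration: the function name recreate_fifo is hypothetical, it uses mkfifo instead of mknod, and it assumes GNU coreutils `stat -c`, which is not available on AIX, so the attribute lookup would need a different tool there.

```shell
# Recreate a named pipe in place, preserving permission bits and owner/group.
# Assumes GNU stat; timestamps are not preserved.
recreate_fifo() {
    path=$1
    # Refuse anything that is not a named pipe.
    [ -p "$path" ] || { echo "not a fifo: $path" >&2; return 1; }
    mode=$(stat -c '%a' "$path") || return 1      # e.g. 640
    owner=$(stat -c '%u:%g' "$path") || return 1  # numeric uid:gid
    rm -f -- "$path" || return 1
    mkfifo -m "$mode" -- "$path" || return 1
    chown "$owner" -- "$path"
}
```

The new pipe gets a fresh inode, which is what lets the file system assign it a pool again. Processes that still hold the old pipe open keep using the removed inode until they reopen the path.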

              Thanks to all.
              Hajo
              • SystemAdmin
                ‏2013-03-15T05:54:52Z  in response to HajoEhlers
                Interesting. It makes sense that recreating the named pipes helped: this effectively changes the storage pool assignment for their inodes (because the newer version of GPFS assigns non-regular files to the system pool, and even if that weren't the case, your placement policy wouldn't put them in 'FC10K' anymore). It sort of makes sense that something relatively rare like a named pipe would be problematic. As Dan pointed out, there was a recent bugfix related to the handling of special files in mmdeldisk, so similar trouble has been seen elsewhere. However, the tsinode results quoted in one of your previous posts showed some regular files, e.g. inodes 1984288 and 26171557, which somehow didn't get migrated the first time around. Were they properly migrated during the last mmapplypolicy run? It's important to know, because as of 3.4.0.16, non-regular files with pool ID set to the pool being deleted no longer make mmdeldisk fail with ENOSPC.

                yuri
                • HajoEhlers
                  ‏2013-03-15T08:47:15Z  in response to SystemAdmin
                  Since I did not run mmapplypolicy after recreating the fifo files AND mmdeldisk ran without problems, I assume that the policy run for our storage pool FC10K was able to migrate all data to the new storage pool but was not able to update the required information for FC10K when it encountered an error.

                  mmdeldisk was not able to check all inodes because it stops working as soon as it encounters an error.
                  That means it would make a difference whether I always used a single node or multiple nodes for mmdeldisk.
                  In other words: an mmdeldisk using a single node would probably stop at exactly the same inode, with exactly the same amount of data left on the disk, whereas an mmdeldisk using multiple nodes could "release" more data. (*)
                  (*) Assuming that the policy migration was successful in the sense that the data was also available on the new storage pool.

                  Hajo
                  • SystemAdmin
                    ‏2013-03-15T15:48:47Z  in response to HajoEhlers
                    No, the behavior of mmdeldisk is not the problem. This code actually differentiates between "severe" errors and "fatal" errors: the former will prevent the overall operation from being a success but don't stop processing immediately, while the latter mean there's no way to proceed. However, error handling in mmdeldisk is not the issue. mmdeldisk can't do anything about an inode that is still assigned to the storage pool being deleted. How would it know where to move the blocks? This requires having a pool migration policy, which of course is unavailable to mmdeldisk. In principle, mmdeldisk could be modified to have some kind of pool reassignment logic, and we've discussed implementing this, but have decided against it, at least for the time being. So all inodes must be reassigned to a different pool prior to mmdeldisk (it's OK to let mmdeldisk actually move blocks, as long as the pool ID is changed in the inode). This is normally done via mmapplypolicy or mmchattr. That's why I'm curious what the outcome of mmapplypolicy was for regular files. Evidently the original mmapplypolicy run has left some regular files in 'FC10K', which is not OK. It's possible that errors from processing named pipes have contributed to this. So it's important to understand what happened to those regular files during the second mmapplypolicy run. If you have the output from those mmapplypolicy runs saved, it may have some clues.

                    yuri
                    • HajoEhlers
                      ‏2013-03-18T17:42:41Z  in response to SystemAdmin
                      > If you have the output from those mmapplypolicy runs saved, it may have some clues
                      Sorry, they are not available. My focus was on the mmdeldisk command the whole time.

                      A trace of a failed mmdeldisk run is available via PMR 14647,075,724

                      Hajo
      • dlmcnabb
        ‏2013-03-11T18:52:35Z  in response to HajoEhlers
        There was a bug where special files (fifo, pipe, chr, blk) were marked as being in a pool, but they actually have no data blocks.

        mmdeldisk was fixed to ignore the special files in defect 843083 last year. This fix was included in 3.3.0.25, 3.4.0.16 and 3.5.0.3.
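The fix levels listed above can be turned into a quick check of whether a given GPFS level should already contain the fix. This is a sketch only: the function name has_special_file_fix is hypothetical, it relies on GNU `sort -V` for version ordering, and it knows nothing about release streams other than the three named in this post.

```shell
# Exit status 0: level includes the fix for defect 843083 per the post above.
# Exit status 1: level predates the fix.
# Exit status 2: release stream not covered by the post.
has_special_file_fix() {
    level=$1
    case $level in
        3.3.*) min=3.3.0.25 ;;
        3.4.*) min=3.4.0.16 ;;
        3.5.*) min=3.5.0.3 ;;
        *) return 2 ;;
    esac
    # With version sort, the fix is present when the minimum level sorts first.
    [ "$(printf '%s\n%s\n' "$min" "$level" | sort -V | head -n 1)" = "$min" ]
}
```

For the level mentioned in this thread, `has_special_file_fix 3.4.0.19` succeeds, which matches the observation that the special-file bug alone should not explain an ENOSPC failure at that level.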