1 reply Latest Post - ‏2013-09-13T21:19:02Z by dlmcnabb

Pinned topic Delayed Listing Issue

2013-09-13T19:19:11Z

We have an issue where an `ls` from a group of dedicated ingest servers performs poorly:



# time ls -latrh | wc -l
real    0m28.306s
user    0m0.012s
sys     0m0.873s


Restarting GPFS appears to fix it, and there are no hung waiters that the command could be waiting on.  An strace also shows nothing hanging; the command is just running slowly.

When looking at `top`, we see mmfsd at ~200% CPU and gpfsSwapdKproc at ~25%.


It appears the slowness is due to the number of files GPFS thinks are open.  I'm wondering if someone can help explain this part of `mmfsadm dump fs`:

InodePrefetchInstance dump
  active 2 paused 2 threads 1 (needed 1 wanted 8)
  0: inum 80900111, canceled 0, list 0x0, block prev -1 cur 1 next 524288, eof 0
     paused 1, work 0, pending blk 0 inode 0, fetched 325 used 55 (xw 0), fetchXW 0 fetchXattrs 0
  1: inum 80900111, canceled 0, list 0x0, block prev -1 cur 1 next 524288, eof 0
     paused 1, work 0, pending blk 0 inode 0, fetched 325 used 151 (xw 0), fetchXW 0 fetchXattrs 0
  2: (inactive)
  3: (inactive)
  4: (inactive)
  5: (inactive)
  6: (inactive)
  7: (inactive)
  nStarted 76903 (xw 0) nCompleted 64954 nCanceled 11947 nDirJump 0 nTooMany 0
  nPause 23013 nResume 18754 totalFetched 8055839 (xw 0) totalUsed 3718837 (xw 0) totalDirFetched 0
  threshold 5 Window 500 ms, fetchFirstDirblock 0,  xw-threshold -1/0%, MaxPauseTime 300 sec

Why would 2 instances be paused?  Also, I have been watching this and moved traffic away (without restarting GPFS), and it isn't changing or clearing out.



  • dlmcnabb

    Re: Delayed Listing Issue

    ‏2013-09-13T21:19:02Z  in response to Ponch

    That dump fs data is talking about prefetching of directory inodes. When GPFS detects the pattern of a readdir followed by many stat calls, it kicks off asynchronous threads that look through the directory block and prefetch the inodes of those directory entries. This is just keeping track of those prefetch thread work assignments.
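    One generic way to see whether the stat phase (the pattern that triggers this prefetch) is what dominates the listing time is to compare a readdir-only listing against one that stats every entry. This is a plain-shell sketch, not a GPFS-specific tool; `/path/to/dir` is a placeholder for the slow directory:

    ```shell
    # readdir only: -f lists entries unsorted and without stat'ing them,
    # so this measures pure directory-block reads
    time ls -f /path/to/dir | wc -l

    # readdir + a stat on every entry: this is the readdir-then-stat
    # pattern described above, and includes inode reads/prefetch
    time ls -l /path/to/dir | wc -l
    ```

    If the `-f` form is fast but the `-l` form is slow, the cost is in fetching the inodes, not in reading the directory blocks themselves.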

    Slow performance of an ls (readdir) is usually caused by other nodes updating the directory, so the directory blocks are thrashed through the disk between the reader nodes and the modifier nodes.  The readers' cache is constantly being discarded.

    The next slow case is a directory that once held an extremely large number of files, so that it was made up of hundreds or thousands of directory blocks. GPFS does not compress the blocks. So if most of the directory entries were deleted, an ls command still has to read all the blocks to find what few entries are left. If you do not expect the directory to get repopulated with many files, it would be best to recreate the directory and move over all the stuff left in the old directory.
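    The recreate-and-move step can be done with plain shell commands; the paths below are placeholders, and the glob will not pick up dot-files, which would need a separate move:

    ```shell
    # create a fresh directory alongside the bloated one
    mkdir /fs/parent/dir.new

    # move the remaining entries into it (dot-files, if any, need a separate mv)
    mv /fs/parent/dir/* /fs/parent/dir.new/

    # the old directory must now be empty; remove it and swap the new one in
    rmdir /fs/parent/dir
    mv /fs/parent/dir.new /fs/parent/dir
    ```

    The new directory is allocated only as many blocks as its current entries need, so a subsequent ls no longer scans the mostly-empty blocks left behind by the deletions.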