4 replies Latest Post - ‏2013-05-06T12:52:36Z by jneuenschwander
SystemAdmin
2092 Posts

Slow stat calls

‏2013-04-02T13:43:41Z |
I've got a three-node GPFS 3.4.0-15 cluster that clients connect to via NFS. The load is balanced across the nodes using a round-robin DNS name.
The issue I'm seeing is that stat calls take a long time to complete. It doesn't affect all nodes at the same time, and it moves between nodes in the cluster; every node has been affected at one point or another. It happens both locally on the GPFS node and over the NFS shares.

Below is the output of mmdiag --stats from each node in the cluster. I'm looking for guidance on whether tuning GPFS would help, or whether I should be looking at something else to resolve the problem.
Running mmdiag --stats on 192.168.16.101

=== mmdiag: stats ===
Global resources:
OpenFile counts: total created 51306 (in use 51232, free 74)
using 124477K memory
cached 51218, currently open 56+6, cache limit 51200 (min 10, max 51200), eff limit 51200
stats: steals 70772660 (clean 68486957, dirty 2285703)
StatCache counts: total created 5127 (in use 5118, free 9)
using 1559K memory
cache limit 5120
stats: inserts 74071938 steals 62362195 hits 73 expands 11348461 revokes 288116 uses 7636620
OpenInstance counts: total created 3368 (in use 1712, free 1656)
using 882K memory
BufferDesc counts: total created 1479 (in use 1467, free 12)
using 1123K memory
cached 1467 cache limit 117952 prefetch 4
indBlockDesc counts: total created 51384 (in use 51367, free 17)
using 735K memory
cached 51367 cache limit 51200
Running mmdiag --stats on 192.168.16.102

=== mmdiag: stats ===
Global resources:
OpenFile counts: total created 51268 (in use 51200, free 68)
using 124400K memory
cached 51200, currently open 73+6, cache limit 51200 (min 10, max 51200), eff limit 51200
stats: steals 95125944 (clean 91422145, dirty 3703799)
StatCache counts: total created 5128 (in use 5120, free 8)
using 1560K memory
cache limit 5120
stats: inserts 102858889 steals 74962503 hits 387 expands 27604720 revokes 229791 uses 22408503
OpenInstance counts: total created 4297 (in use 2186, free 2111)
using 1127K memory
BufferDesc counts: total created 31533 (in use 31503, free 30)
using 1434K memory
cached 31503 cache limit 117952 prefetch 0
indBlockDesc counts: total created 45154 (in use 45138, free 16)
using 939K memory
cached 45138 cache limit 51200
Running mmdiag --stats on 192.168.16.103

=== mmdiag: stats ===
Global resources:
OpenFile counts: total created 51303 (in use 51243, free 60)
using 124504K memory
cached 51232, currently open 58+6, cache limit 51200 (min 10, max 51200), eff limit 51200
stats: steals 361321889 (clean 360231902, dirty 1089987)
StatCache counts: total created 5128 (in use 5094, free 34)
using 1552K memory
cache limit 5120
stats: inserts 362955015 steals 359499961 hits 21 expands 3325910 revokes 62623 uses 2706985
OpenInstance counts: total created 4686 (in use 2372, free 2314)
using 1223K memory
BufferDesc counts: total created 8818 (in use 8813, free 5)
using 1556K memory
cached 8813 cache limit 117952 prefetch 0
indBlockDesc counts: total created 51387 (in use 51382, free 5)
using 1019K memory
cached 51382 cache limit 51200

Thanks,
Johnathon Neuenschwander
Updated on 2013-04-03T12:35:18Z by HajoEhlers
  • SystemAdmin
    2092 Posts

    Re: Slow stat calls

    ‏2013-04-02T17:59:43Z  in response to SystemAdmin
    How are you invoking the stat calls? Could you give an example of a slow stat call and a normal one?

    For performance problems like this, additional system configuration data is needed. If you have a GPFS service contract, it is faster to work through IBM service to collect the relevant data for analysis.
    • SystemAdmin
      2092 Posts

      Re: Slow stat calls

      ‏2013-04-03T12:27:35Z  in response to SystemAdmin
      Thanks for the response.

      Anything that generates stat calls runs slowly. I ran the command below on all nodes at the same time.
      The issue isn't related to memory or processor: the .103 node had the greatest load on it, yet it responded the quickest.

      time ls -l /fs0/shares/net_www/homes/

      192.168.16.101
      real 0m1.566s
      user 0m0.001s
      sys 0m0.012s

      192.168.16.102
      real 0m0.049s
      user 0m0.004s
      sys 0m0.003s

      192.168.16.103
      real 0m0.017s
      user 0m0.002s
      sys 0m0.005s
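
To narrow down whether the time goes into reading the directory or into the per-entry stat calls, the two phases can be timed separately. The following is only a rough sketch, not something from the original posts: the function name `time_stat_calls` is made up for illustration, and it assumes that `os.lstat` on each entry is a reasonable stand-in for the per-file work `ls -l` does.

```python
import os
import time


def time_stat_calls(directory):
    """Return (entry_count, listing_seconds, stat_seconds) for one directory."""
    # Phase 1: read the directory entries only (no per-entry stat).
    start = time.perf_counter()
    names = os.listdir(directory)
    listing_seconds = time.perf_counter() - start

    # Phase 2: one lstat per entry, roughly the per-file work of `ls -l`.
    start = time.perf_counter()
    for name in names:
        try:
            os.lstat(os.path.join(directory, name))
        except FileNotFoundError:
            pass  # entry was removed between the listing and the stat
    stat_seconds = time.perf_counter() - start

    return len(names), listing_seconds, stat_seconds
```

Calling `time_stat_calls("/fs0/shares/net_www/homes/")` on each node at the same time would give one number for the directory read and one for the stat calls, which is a little more specific than the overall `real` time.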
  • HajoEhlers
    251 Posts

    Re: Slow stat calls

    ‏2013-04-03T12:35:18Z  in response to SystemAdmin
    If I understand the mmdiag output correctly:

    Your maxFilesToCache is set to 51200
    Your maxStatCache is set to 5120

    Your StatCache hit count is very low compared to the high insert and steal counts:
    ...
    stats: inserts 74071938 steals 62362195 ---->>>> hits 73 <<<<---- expands 11348461 revokes 288116 uses 7636620
    ...

    So your maxStatCache is 10 times smaller than your maxFilesToCache, where IMHO it should be the other way around, meaning:

    maxFilesToCache 5120
    maxStatCache 51200

    You might also check your FS for large directories with a large number of entries ( > 30K ), since a lookup there could thrash your StatCache.
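
One way to look for such directories is a simple tree walk that counts entries per directory. This is a minimal sketch under assumptions of my own: the function name `large_directories` is hypothetical and the default threshold just mirrors the 30K figure above. Note that walking a large file system generates a lot of metadata traffic itself, so it is probably best run off-peak or on a subtree.

```python
import os


def large_directories(root, threshold=30_000):
    """Yield (path, entry_count) for directories under root with more than threshold entries."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Entries directly in this directory: subdirectories plus everything else.
        count = len(dirnames) + len(filenames)
        if count > threshold:
            yield dirpath, count
```

For example, `for path, count in large_directories("/fs0"): print(count, path)` would list the candidates, with `/fs0` taken from the mount point mentioned earlier in the thread.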

    Note:
    For our NFS server we use a maxStatCache of 500000 which gives:
    
    === mmdiag: stats ===
    Global resources:
    OpenFile counts: total created 5031 (in use 5000, free 31)
    using 12148K memory
    cached 5000, currently open 354+33, cache limit 5000 (min 10, max 5000), eff limit 5000
    stats: steals 1001848127 (clean 1000193464, dirty 1654663)
    StatCache counts: total created 500007 (in use 499935, free 72)
    using 152323K memory
    cache limit 500000
    stats: inserts 1193054213 steals 756040290 hits 833308072 expands 411521473 revokes 22470563 uses 346104567
    OpenInstance counts: total created 4097 (in use 2228, free 1869)
    using 1148K memory
    BufferDesc counts: total created 5340 (in use 5340, free 0)
    using 1479K memory
    cached 5340 cache limit 50000 prefetch 300
    indBlockDesc counts: total created 5205 (in use 5201, free 4)
    using 957K memory
    cached 5201 cache limit 5000
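
To compare the StatCache counters between nodes without eyeballing the numbers, the `stats:` line can be pulled out of the mmdiag output with a small script. This is a sketch only: the regular expression assumes the line layout exactly as pasted in this thread, the helper names are invented, and "hits per insert" is just an ad hoc ratio for comparison, not an official GPFS metric.

```python
import re

# Matches the StatCache "stats:" line as it appears in the output pasted in this thread.
# The OpenFile "stats: steals ..." line does not match because it lacks "inserts".
STAT_LINE = re.compile(
    r"stats: inserts (\d+) steals (\d+) hits (\d+) "
    r"expands (\d+) revokes (\d+) uses (\d+)"
)
FIELDS = ("inserts", "steals", "hits", "expands", "revokes", "uses")


def stat_cache_counters(mmdiag_text):
    """Return the StatCache counters as a dict, or None if the line is not found."""
    match = STAT_LINE.search(mmdiag_text)
    if match is None:
        return None
    return dict(zip(FIELDS, map(int, match.groups())))


def hits_per_insert(counters):
    """Ad hoc ratio for comparing nodes; 0.0 when there were no inserts."""
    if counters["inserts"] == 0:
        return 0.0
    return counters["hits"] / counters["inserts"]


# Lines copied from the outputs in this thread.
slow_node = ("stats: inserts 74071938 steals 62362195 hits 73 "
             "expands 11348461 revokes 288116 uses 7636620")
nfs_server = ("stats: inserts 1193054213 steals 756040290 hits 833308072 "
              "expands 411521473 revokes 22470563 uses 346104567")

for label, text in (("maxStatCache 5120", slow_node),
                    ("maxStatCache 500000", nfs_server)):
    counters = stat_cache_counters(text)
    print(f"{label}: {hits_per_insert(counters):.6f} hits per insert")
```

With the numbers above, the node with the small cache gets about one hit per million inserts, while the server with maxStatCache 500000 gets roughly 0.7 hits per insert.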
    


    hth
    Hajo
    • jneuenschwander
      1 Post

      Re: Slow stat calls

      ‏2013-05-06T12:52:36Z  in response to HajoEhlers

      Thanks all for your help.

      We were able to resolve the issue.  We increased the maxStatCache and pagepool.

      We also updated GPFS to 3.4.0-16.

       

      Thanks,

      Johnathon