• 2 replies
  • Latest Post - 2014-05-26T11:41:10Z by YannickBergeron

Pinned topic GPFS 3.5, pagepool and nsdBufSpace

‏2014-05-23T19:33:23Z |


We've had a cluster since 2011. It is a 5-node quorum, 4 of the nodes holding most of the NSDs. The fifth node is mainly a tiebreaker, with NSDs used only for descriptors.

The cluster started on GPFS 3.4 and is now at 3.5.11 (edit: 3.5.13, finally).

We also have 6 GPFS clients.

Over the past 2 weeks, we've moved a lot of data around our NSDs because we replaced our previous LUNs with new ones: around 40TB of data moved with mmadddisk and mmdeldisk. On Wednesday, we had a first occurrence of mmfsd crashing on one of the quorum/NSD nodes (which was also the FS manager of 1 of our 4 filesystems). The impact lasted about 1min20, but everything went back to normal without manual intervention.

On Thursday evening, while some mmrestripefs operations were running, the same thing happened on 3 of the quorum nodes (so we lost quorum), with an impact of about 5min30. Again, everything went back to normal without manual intervention.


In our investigation, we found this error in the mmfs log:

Thu May 22 21:15:53.870 2014: GPFS: 6027-611 Recovery: mfg2, delay 10 sec. for safe recovery.
Thu May 22 21:16:31.386 2014: The pagepool size may be too small.  Try increasing the pagepool size or adjusting pagepool usage with, for example, nsdBufSpace, nsdRAIDBufferPoolSizePct, or verbsSendBufferMemoryMB).
Thu May 22 21:16:31.396 2014: logAssertFailed: !"More than 22 minutes searching for a free buffer in the pagepool"
Thu May 22 21:16:31.408 2014: return code 0, reason code 0, log record tag 0
The assert subroutine failed: !"More than 22 minutes searching for a free buffer in the pagepool", file ../../../../../../../src/avs/fs/mmfs/ts/bufmgr/bufmgr.C, line 519



Our GPFS servers had a 128MB pagepool; 3 of our GPFS clients had a 1GB pagepool, and the others were also at 128MB.

So we thought of increasing it to 1GB, but further investigation led us to think that with nsdBufSpace at 30% (the default), we would need a 3GB pagepool on the GPFS servers to get a comparable amount of NSD buffer space. The remaining 70% of that 3GB would be pretty much useless: it would only be usable as GPFS cache, and there is not much client activity on those nodes. So we are instead thinking about keeping a 1GB pagepool but raising nsdBufSpace to 90%.
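To make the sizing trade-off concrete, here is a rough back-of-the-envelope calculation of the pagepool fraction reserved for NSD buffers under each configuration (a sketch only; it ignores GPFS internal overheads and any cap the daemon may place on nsdBufSpace):

```python
def nsd_buf_space_bytes(pagepool_bytes: int, nsd_buf_space_pct: int) -> int:
    """Approximate portion of the pagepool reserved for NSD server buffers."""
    return pagepool_bytes * nsd_buf_space_pct // 100

MiB = 1024 ** 2
GiB = 1024 ** 3

# Current: 128 MiB pagepool with the 30% default leaves very little.
print(nsd_buf_space_bytes(128 * MiB, 30) // MiB)   # -> 38 (MiB)

# Option A: 3 GiB pagepool at the 30% default.
print(nsd_buf_space_bytes(3 * GiB, 30) // MiB)     # -> 921 (MiB)

# Option B: 1 GiB pagepool with nsdBufSpace raised to 90%.
print(nsd_buf_space_bytes(1 * GiB, 90) // MiB)     # -> 921 (MiB)
```

By this arithmetic the two options yield the same NSD buffer space; option B just avoids dedicating 2GB of mostly idle cache on the servers.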

What are your thoughts?


Also, we had the surprise today of the same kind of crash on one of our GPFS clients. We thought the issue would only be on the NSD servers... so we are kinda confused.


Best regards,


Yannick Bergeron

Updated on 2014-05-26T11:41:30Z by YannickBergeron
  • yuri

    Re: GPFS 3.5, pagepool and nsdBufSpace


    This isn't necessarily something that can be entirely solved with tuning.  It's true that 128M isn't a big pagepool size by modern standards.  Increasing it to 1G would be a good idea regardless; I don't think you need to go as high as 3G in your case.  However, I would recommend opening a PMR and uploading a gpfs.snap package, so that the logs and the internal dumps can be analyzed.  There may be a bug in there. Your code level is a bit old at this point, so this may be a known issue.
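For reference, the change discussed above could be applied along these lines (a sketch, not a tested procedure: the `nsdNodes` node class is hypothetical, and note that the GPFS 3.5 documentation caps nsdBufSpace at 70%, so a value of 90% may be rejected; check `man mmchconfig` on your level):

```shell
# Raise the pagepool on the NSD servers; -i applies the change
# immediately and also persists it in the configuration.
mmchconfig pagepool=1G -i -N nsdNodes

# Raise the share of the pagepool reserved for NSD buffers
# (default 30%; documented upper bound is 70%).
mmchconfig nsdBufSpace=70 -N nsdNodes

# Confirm the configured and in-memory values.
mmlsconfig pagepool nsdBufSpace
mmdiag --config | grep -E 'pagepool|nsdBufSpace'
```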


  • YannickBergeron

    Re: GPFS 3.5, pagepool and nsdBufSpace


    you probably mean and not

    I've double checked and we're at, not