Pinned topic Measuring GPFS Pagepool Consumption
While validating mission-critical application benchmark improvements is perhaps the best way to determine whether the memory invested in the GPFS pagepool is a good investment, not all applications running on generic compute nodes within a GPFS cluster will be properly profiled, and most will likely exploit the pagepool differently. Does GPFS provide a way to extract actual pagepool consumption versus allocation (pinned memory) metrics?
Updated on 2008-06-27T15:03:57Z at 2008-06-27T15:03:57Z by dlmcnabb
Re: Measuring GPFS Pagepool Consumption - 2008-06-26T07:43:57Z - This is the accepted answer.
Since the pagepool is a cache, it only throws things out when something else needs space. Therefore, after filling up, it is always 100% "consumed".
If you want gory details, "mmfsadm dump pgalloc" will show more than anyone really understands, and there is no description of this dumped data. (WARNING: Be careful when using the mmfsadm dump command on an active system, since it may follow a pointer to deallocated space which may cause a SIGSEGV. So only use it when the system is mostly idle.)
- gcorneau 0100003X4E
Re: Measuring GPFS Pagepool Consumption - 2008-06-26T13:12:08Z - This is the accepted answer.
One other comment. You stated "...actual pagepool consumption versus allocation (pinned memory) metrics" and I thought I'd point out that the GPFS pagepool pinned memory is allocated when the GPFS daemon starts. I.e., GPFS doesn't allocate just some of the pagepool at the beginning and then allocate more as needs increase. You can see the overall memory utilization for GPFS on AIX via "svmon -P <pid_of_mmfsd>".
IBM Power Systems Advanced Technical Support
Re: Measuring GPFS Pagepool Consumption - 2008-06-27T00:05:49Z - This is the accepted answer.
Don't worry Glen, I do not currently have any plans to exploit LoadLeveler's llsubmit filter to custom-tailor GPFS's pagepool on compute nodes for particular batch jobs that could benefit from a larger-than-normal pagepool versus those that could not, but...
according to the GPFS documentation, the pagepool is advertised to be dynamically changeable these days via mmchconfig -i/-I. This leads me to believe that you can change the value per node without shutting down and restarting the GPFS daemon. Is this true, false, or misleading? Are there caveats for shrinking versus growing?
Over time, the pagepool on a given node is not going to be 100% "hot" for all applications despite being 100% allocated; hence the buffer descriptor labels "cold, done, free, hot, and inactive". I agree that mmfsadm probing is far from the preferred solution, but as long as carbon-based units are required to turn the GPFS pagepool/MFTC/MSC cache tuning knobs (as opposed to GPFS doing real-time cache self-optimization), the required information needs to be accessible in order to make tuning decisions and to monitor the results after those decisions are implemented.
If mmpmon is the preferred monitoring mechanism, are there plans to incorporate "pagepool, MFTC, and MSC" cache metrics?
Re: Measuring GPFS Pagepool Consumption - 2008-06-27T01:36:33Z - This is the accepted answer.
- SystemAdmin 110000D4XK
There is a fixed size table of the managed pagepool segments. When new territory is needed, one or more of the slots in this table will be used to describe the new territory. These segment memory areas are never returned to the system, but when you reduce the pagepool size GPFS unpins portions of the pagepool memory and doesn't use those addresses. If you then make the pagepool bigger, GPFS will repin areas it has not used, and once the existing segments are all pinned it will add new segments.
So increasing the pagepool 10M at a time will use up the slots fast, whereas increasing the pagepool by a 1G chunk will create one big segment. You can then relinquish that space by dropping the pagepool size back down (by 1G-10M) and afterwards freely bump it up and down without consuming more slots.
Look at the table at the end of "mmfsadm dump pgalloc" to see these segments.
- AIX 64-bit kernel: 1024 entries, each segment up to 256M, for a max pagepool of 256G.
- AIX 32-bit kernel: only 8 slots, for a 2G max.
- Linux 64-bit machines: 256 entries, each segment up to 1G, for a 256G max.
- Linux 32-bit machines: 8 slots, for an 8G max.
Once all the slots are filled (or pinnable memory all used up), increasing pagepool will just passively return the same answer it had before.
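To make the slot arithmetic concrete, here is a toy model (my own simplification, not GPFS code) using the Linux 64-bit figures from this thread: a fixed table of 256 slots, each segment capped at 1G, and each pagepool increase adding new segments sized to cover the shortfall.

```python
SEG_CAP = 1 << 30      # 1G max per segment (Linux 64-bit, per this thread)
NUM_SLOTS = 256        # entries in the fixed segment table

def grow(segments, new_total):
    """Add segments until total capacity covers new_total (simplified model)."""
    while sum(segments) < new_total and len(segments) < NUM_SLOTS:
        shortfall = new_total - sum(segments)
        segments.append(min(shortfall, SEG_CAP))
    return segments

M = 1 << 20

# Growing 10M at a time: every bump creates a tiny new segment.
small = []
total = 0
for _ in range(100):          # one hundred 10M bumps, ~1G total
    total += 10 * M
    grow(small, total)
print(len(small))             # 100 slots burned for ~1G of pagepool

# Growing in one 1G chunk: a single big segment, one slot.
big = grow([], 1 << 30)
print(len(big))               # 1
```

Under this model the slots-consumed difference between the two growth patterns is 100 to 1 for the same 1G of pagepool, which is the behavior the post above warns about.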
Re: Measuring GPFS Pagepool Consumption - 2008-06-27T03:23:05Z - This is the accepted answer.
- dlmcnabb 120000P4JT
- GPFS docs suggest that the pagepool be less than 50% of physical RAM on Linux. Is this a guideline or a hard limit originating in the GPFS code? My guess is that the limit ensures that applications running on the node are not starved of memory. Is this correct?
If I have nodes with, say, 64GB of physical memory, and these nodes do not run anything other than GPFS, can I allot 60GB to GPFS? The OS and daemons need < 500MB (actually measured), so I would still have 3+GB free.
PS: I did test a cluster of x3755s with 30GB pagepool/32GB physical and the nodes crashed badly (no data corruption though).
Re: Measuring GPFS Pagepool Consumption - 2008-06-27T12:30:55Z - This is the accepted answer.
- dlmcnabb 120000P4JT
Set the permanent configuration to specify the largest pagepool you would use ever.
This sets up the segments describing the pagepool during daemon startup. Then, in the user-configurable /var/mmfs/etc/mmfsup script, immediately drop the pagepool for the node down to its normal level using
mmchconfig pagepool=$normalpagepool -I -N $thisnode
The -I will change the daemon value but not change the permanent configuration setting.
Then when you want to change the pagepool for a specific set of nodes:
mmchconfig pagepool=$newpagepoolvalue -I -N $node1,$node2,...,$nodeN
The effect of changing the pagepool in this case will just be to pin/unpin parts of the pagepool as needed. Note that when unpinning, the contents of that memory are basically thrown out of the cache (flushed to disk if necessary), so they would have to be read back in if actively being used.
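Putting the startup sequence above together, the mmfsup hook might look like this. This is only a sketch: the 4G everyday value and the use of hostname to identify the node are my assumptions, and the permanent configuration is presumed to already carry the largest pagepool you would ever use.

```shell
#!/bin/sh
# /var/mmfs/etc/mmfsup -- runs on each node when the GPFS daemon comes up.
# The permanent config holds the LARGEST pagepool we will ever use, so all
# segments get described at daemon start; here we immediately drop back to
# the everyday value. Both values below are placeholders.

normalpagepool=4G                 # everyday working size (assumption)
thisnode=$(hostname)              # node name as known to GPFS (assumption)

# -I changes the in-memory daemon value only; the permanent setting
# (the maximum) is left untouched for the next daemon restart.
mmchconfig pagepool=$normalpagepool -I -N $thisnode
```

From there, raising the pagepool for a batch job and dropping it again only pins and unpins memory within the already-described segments, as explained above.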
Re: Measuring GPFS Pagepool Consumption - 2008-06-27T15:03:57Z - This is the accepted answer.
- SystemAdmin 110000D4XK
In release 3.1 GPFS will try to allocate/pin whatever you specify until the operating system refuses.
In either case you can completely use up the operating system's memory, leaving it no space when other applications need to pin memory. So the documentation is a guideline, but use your best judgment.