  • Latest Post - ‏2012-10-29T16:13:32Z by dlmcnabb
SystemAdmin
2092 Posts

GPFS pagepool eviction behavior

‏2012-10-25T15:22:04Z |
We have a file system with a 256K block size. Client nodes have 8GB pagepool caches. We have been questioning how the cache behaves because jobs run slower than expected. Using strace, we can tell from the longer read response times that recently referenced data is no longer in the cache. The IOs were typically 32k-64k in size.

I put together a program that performs a number of random IOs to a large file. Each IO is done 3 times and the response time is measured. Typically the first IO takes 5-10 milliseconds and the following IOs take under 100 microseconds, indicating that they were satisfied from the cache. The program then reads the data back with a delay between each read and reports the response time, so we can see when data falls out of the cache. On an active system, we have found that IOs between 128k and 256k tend to live in the cache for several minutes, whereas IOs smaller than 128k are often evicted from the cache within a few seconds. This behavior is not seen all the time, but it occurs quite frequently. The file is not used anywhere except on the single node running the test, and only reads are done to the file. Could someone describe how the cache behaves and how it manages the buffers in pagepool?
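
For anyone who wants to reproduce the measurement, here is a minimal sketch of the kind of test described above. It is not the original program: the function names, the default sizes, the IO-size alignment of the offsets, and the fixed delay are all assumptions made for illustration. It reads random offsets several times back to back, then re-reads each offset after a pause and records the timings.

```python
import os
import random
import time


def timed_pread(fd, size, offset):
    """Read `size` bytes at `offset` and return the elapsed time in seconds."""
    start = time.perf_counter()
    os.pread(fd, size, offset)
    return time.perf_counter() - start


def probe_cache(path, io_size=64 * 1024, n_ios=16, repeats=3, delay=1.0, seed=0):
    """Hypothetical helper: time repeated random reads, then re-read after a delay.

    Returns a list of (offset, initial_times, reread_time) tuples, where
    initial_times holds `repeats` back-to-back timings and reread_time is a
    later timing of the same offset.
    """
    rng = random.Random(seed)
    file_size = os.path.getsize(path)
    if file_size < io_size:
        raise ValueError("file is smaller than one IO")

    fd = os.open(path, os.O_RDONLY)
    try:
        # Assumption: offsets are aligned to the IO size. The original test
        # may have used unaligned requests.
        slots = file_size // io_size
        offsets = [rng.randrange(slots) * io_size for _ in range(n_ios)]

        # Phase 1: read each offset `repeats` times in a row.
        initial = [
            [timed_pread(fd, io_size, off) for _ in range(repeats)]
            for off in offsets
        ]

        # Phase 2: pause before each re-read to see whether the data
        # is still served quickly.
        rereads = []
        for off in offsets:
            time.sleep(delay)
            rereads.append(timed_pread(fd, io_size, off))
    finally:
        os.close(fd)

    return list(zip(offsets, initial, rereads))
```

Calling `probe_cache("/path/to/large/file", io_size=32 * 1024, delay=5.0)` and comparing each `reread_time` with the first entry of `initial_times` gives a rough indication of whether the data was still cached. On a file system that goes through the operating system page cache instead of pagepool, the numbers would reflect that cache instead.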

If we used a file system with a 64k block size, would IOs smaller than 32k potentially have a short life span in the cache?
Updated on 2012-10-29T16:13:32Z by dlmcnabb
  • HajoEhlers
    253 Posts

    Re: GPFS pagepool eviction behavior

    ‏2012-10-25T16:47:39Z  
    Have you read:

    http://www.ibm.com/developerworks/wikis/display/hpccentral/GPFS+Tuning+Parameters#GPFSTuningParameters-seqDiscardThreshold

    Extract:

    seqDiscardThreshold
    ...
    Increasing seqDiscardThreshold tells GPFS to attempt to keep as much data in cache as possible for the files below that threshold.

    Tuning Guidelines
    - Increase this value if you want to cache files, that are sequentially read or written, that are larger than 1MB in size.
    - Make sure there are enough buffer descriptors to cache the file data. (See maxBufferDescs )

    You might check writebehindThreshold as well
    Extract:
    ....
    The writebehindThreshold parameter determines at what point GPFS starts flushing newly written data out of the pagepool for a file....
    As a default, GPFS uses pagepool for buffering IO for best performance but once the data is written the buffers are cleaned

    Hm, this raises a question: even if seqDiscardThreshold is large enough to hold a complete file, will the file be purged from memory after it has been written, so that only an additional read gets the file back into the buffer?
    1) create file: Everything is buffered until writebehindThreshold is reached. Then the file is purged from the pagepool.
    2) read file: The file is read from disk, copied into the pagepool, and kept there as long as file size < seqDiscardThreshold.
    3) reread file: The read is served from the pagepool.

    And does this happen even if file size < writebehindThreshold, since the sync process then starts flushing?

    cheers
    Hajo
  • dlmcnabb
    1012 Posts

    Re: GPFS pagepool eviction behavior

    ‏2012-10-25T16:53:59Z  
    If you are running any of the PTFs 3.4.0.11-3.4.0.16, you can have various performance problems with sequential or random requests. Install 3.4.0.17 to see if the performance problems persist. If you are running 3.5.0.1-3.5.0.4, upgrade to 3.5.0.5, which is coming out next week.
  • SystemAdmin
    2092 Posts

    Re: GPFS pagepool eviction behavior

    ‏2012-10-25T20:40:26Z  
    Neither seqDiscardThreshold nor writebehindThreshold appears to apply, since the IOs are random and there is no write activity to the file.

    Our current version of GPFS is 3.4.0-15. I've not seen anything specific in the change log for 3.4.0-17 (or 3.4.0-16) that indicates changes to the caching behavior.
  • dlmcnabb
    1012 Posts

    Re: GPFS pagepool eviction behavior

    ‏2012-10-25T20:48:41Z  
    Defect 854889 in 3.4.0.17 is for random IO performance. It appears that nothing was added in the change log for this fix.
  • vladimir_cnaf
    60 Posts

    Re: GPFS pagepool eviction behavior

    ‏2012-10-29T13:37:41Z  
    So, has this random IO performance problem already been resolved (but not documented) in 3.4.0-17, or do we need to wait for 3.4.0-18?
  • dlmcnabb
    1012 Posts

    Re: GPFS pagepool eviction behavior

    ‏2012-10-29T16:13:32Z  
    It is resolved in 3.4.0.17.