Topic
  • 4 replies
  • Latest Post - 2010-11-22T08:19:28Z by sdesmet
sdesmet
7 Posts

Pinned topic nfsPrefetchStrategy and GPFS 3.4.0

2010-11-17T13:00:45Z
We have two clusters running, one with GPFS 3.3.0-6 and another with GPFS 3.4.0-2. I also have two NSD client nodes, again one with GPFS 3.3.0-6 and one with GPFS 3.4.0-2. If I export a GPFS filesystem over plain NFS to a Mac OS X client, GPFS doesn't prefetch at all on the 3.4.0-2 node (even if this node sits in the 3.3.0-6 cluster) when clients read video files (there is a little bit of jumping around in the file, which is why we set nfsPrefetchStrategy to 3), and I get very bad performance. The 3.3.0-6 NSD client node works fine. We saw the same issue with 3.3.0-4, and it disappeared in GPFS 3.3.0-6.
Is there another setting (besides nfsPrefetchStrategy) we need to change in GPFS 3.4.0 compared to GPFS 3.3.0?
Updated on 2010-11-22T08:19:28Z by sdesmet
  • dlmcnabb
    1012 Posts

    Re: nfsPrefetchStrategy and GPFS 3.4.0

    2010-11-17T18:30:12Z
    The "fuzzy sequential" heuristics (which uses the nfsPrefetchStrategy (NPS) setting) were substantially changed in 3.3.0.5 (3.2.1.20+), but a few unintended consequences were fixed in 3.3.0.6. The 3.4 release had all the same changes when it first shipped. (BTW, this applies to any fuzzy sequential file access style, not just NFS.)

    I have no logical explanation for how NPS worked before this, since it was really broken.

    NPS is now used to define a window of blocks balanced around the "current block". GPFS will prefetch blocks as if the access were truly sequential as long as accesses are to blocks within the window, or advance to the next block after the window. The "current block" always moves to the highest block accessed.

    If the setting is 3, the window is the current block plus one block on either side. If your filesystem blocksize is small relative to the NFS request size and the number of nfsd threads, you might make it bigger, say 5 or 7. (Even values are not symmetric around the current block and are biased towards previous blocks.)

    There are no other settings for this, and 3.4 should work the same as 3.3.0.6 or later.
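
    For reference, the setting is normally checked and changed with mmlsconfig/mmchconfig, roughly like this (a sketch only; the node name nfsserver1 is a placeholder for whichever node serves your NFS clients):

        # show the current value
        mmlsconfig nfsPrefetchStrategy
        # widen the window to 5 blocks on the NFS-serving node only, taking effect immediately
        mmchconfig nfsPrefetchStrategy=5 -i -N nfsserver1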
  • sdesmet
    7 Posts

    Re: nfsPrefetchStrategy and GPFS 3.4.0

    2010-11-18T09:20:11Z
    I understand the fuzzy sequential heuristics, but here is a demonstration of the problem:

    I have twelve 4 GB files in one directory, exported over NFS to a Mac OS X client; the rough dd commands are sketched at the end of this post.

    1. read one file with a single dd, with a blocksize equal to the GPFS blocksize (I know NFS will split this up anyway)
    -> performance: 560 MB/s
    2. read all 12 files at the same time with dd
    -> performance: ~400-500 MB/s
    3. stop the 12 reads in the middle of the files
    4. read all 12 files at the same time, again from the beginning
    -> performance: ~200 MB/s
    5. again read one file with a single dd
    -> performance: 140 MB/s
    6. on a GPFS cluster node, 'touch' the file (with the Linux touch tool) while it is being read (from step 5)
    -> performance rises back to the 560 MB/s seen at the beginning

    To me this looks like GPFS doesn't see the sequential pattern as long as I don't touch the file.

    If I do the same on the GPFS 3.3.0-6 node, the performance is always the same. Both NSD client nodes are connected to the same cluster and run SLES 11 SP1.
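
    For reference, the reads are roughly of this form (a sketch; the /Volumes/gpfsnfs and /gpfs/fs0 paths, the video file names, and bs=1m are placeholders, with bs chosen to match the GPFS blocksize):

        # step 1: single sequential read from the OS X client
        dd if=/Volumes/gpfsnfs/video01.mov of=/dev/null bs=1m
        # steps 2 and 4: read all 12 files at the same time
        for f in /Volumes/gpfsnfs/video*.mov; do dd if="$f" of=/dev/null bs=1m & done; wait
        # step 6: on a GPFS cluster node, touch the file while it is being read
        touch /gpfs/fs0/video01.mov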
  • dlmcnabb
    1012 Posts

    Re: nfsPrefetchStrategy and GPFS 3.4.0

    2010-11-18T16:42:11Z
    To diagnose this we will need traces on the GPFS nodes. Run "mmtrace trace=io" on the GPFS node serving the NFS clients. Then, in the middle of each of the tests, run "mmtrace noformat" to cut off the old trace and start a new one. Also capture the active prefetch info using:
    (mmfsadm dump instances; mmfsadm dump fs) > /tmp/mmfs/dump.inst.fs.$(date +"%y%m%d.%H.%M.%S")

    When the tests are all done, run "mmtrace stop formatall". Each trace is a wraparound trace buffer and will only capture the last few seconds of activity before the mmtrace command. Also get "mmfsadm dump all" when done so that I can see the file inode information and configuration settings.

    Please identify, for each trace, exactly what the test was doing and the inode numbers of the files being worked on, so that any other activity can be ignored.
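
    Put together, the capture sequence on the NFS-serving node would look roughly like this (the dump.all filename is just an example):

        # start a wraparound I/O trace
        mmtrace trace=io
        # ... run one test; in the middle of it, cut the old trace and dump the prefetch state ...
        mmtrace noformat
        (mmfsadm dump instances; mmfsadm dump fs) > /tmp/mmfs/dump.inst.fs.$(date +"%y%m%d.%H.%M.%S")
        # ... repeat for each remaining test ...
        # when all tests are done, stop tracing and format all trace files
        mmtrace stop formatall
        # capture inode and configuration information
        mmfsadm dump all > /tmp/mmfs/dump.all.$(date +"%y%m%d.%H.%M.%S")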
  • sdesmet
    7 Posts

    Re: nfsPrefetchStrategy and GPFS 3.4.0

    2010-11-22T08:19:28Z
    I have run the traces; they are located at:
    http://users.ugent.be/~smdesmet/mmfs.tar.bz2

    The included readme.txt explains the different steps.