Topic
  • 10 replies
  • Latest Post - ‏2013-12-05T21:59:24Z by yuri
3CKG_Ramalingam_Ayyamperumal
6 Posts

Pinned topic IO damn slow in GPFS

‏2013-12-03T16:16:52Z |

Hi GPFS Gurus,

I know I am facing a very basic I/O slowness bottleneck in my GPFS setup. I am a newbie to GPFS and have installed and configured it with Google's help.

Please find the I/O comparison between the local and GPFS filesystems below. I have increased the pagepool size to 1000M. At the moment, I am not even running any application on it. However, this GPFS filesystem is going to be hit with random I/O, and performance really matters for it.

 

LOCAL FILESYSTEM

================

bash-3.2# time sh -c "dd if=/dev/zero of=ddfile bs=8k count=1000000 && sync"
1000000+0 records in.
1000000+0 records out.
 
real    0m38.349s
user    0m1.409s
sys     0m22.405s
 
 
GPFS FILESYSTEM

================

bash-3.2# time sh -c "dd if=/dev/zero of=ddfile bs=8k count=1000000 && sync"
1000000+0 records in.
1000000+0 records out.
 
real    1m44.338s
user    0m1.245s
sys     0m9.200s

 

GPFS cluster information
========================
  GPFS cluster name:         PROD_EMS.FT
  GPFS cluster id:           771243877078702085
  GPFS UID domain:           PROD_EMS.FT
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp
 
GPFS cluster configuration servers:
-----------------------------------
  Primary server:    EMS_NODE_1
  Secondary server:  EMS_NODE_2
 
 Node  Daemon node name            IP address       Admin node name             Designation
-----------------------------------------------------------------------------------------------
   1   EMS_NODE_1                  10.X.X.1     EMS_NODE_1                  quorum-manager
   2   EMS_NODE_2                  10.X.Y.1    EMS_NODE_2                  quorum-manager
   3   EMS_NODE_3                  10.X.X.2      EMS_NODE_3                  quorum-manager
flag                value                    description
------------------- ------------------------ -----------------------------------
 -f                 16384                    Minimum fragment size in bytes
 -i                 512                      Inode size in bytes
 -I                 16384                    Indirect block size in bytes
 -m                 1                        Default number of metadata replicas
 -M                 2                        Maximum number of metadata replicas
 -r                 1                        Default number of data replicas
 -R                 2                        Maximum number of data replicas
 -j                 cluster                  Block allocation type
 -D                 nfs4                     File locking semantics in effect
 -k                 all                      ACL semantics in effect
 -n                 32                       Estimated number of nodes that will mount file system
 -B                 524288                   Block size
 -Q                 user;group;fileset       Quotas enforced
                    none                     Default quotas enabled
 --filesetdf        no                       Fileset df enabled?
 -V                 12.10 (3.4.0.7)          File system version
 --create-time      Fri Oct 19 18:17:33 2012 File system creation time
 -u                 yes                      Support for large LUNs?
 -z                 no                       Is DMAPI enabled?
 -L                 4194304                  Logfile size
 -E                 yes                      Exact mtime mount option
 -S                 no                       Suppress atime mount option
 -K                 whenpossible             Strict replica allocation option
 --fastea           yes                      Fast external attributes enabled?
 --inode-limit      133120                   Maximum number of inodes
 -P                 system                   Disk storage pools in file system
 -d                 GBSDISK_1;GBSDISK_2      Disks in file system
 -A                 yes                      Automatic mount option
 -o                 none                     Additional mount options
 -T                 /data/tibco              Default mount point
 --mount-priority   0                        Mount priority
 

Any guidance will be much appreciated

Thanks

Ram

 
  • dlmcnabb
    dlmcnabb
    1012 Posts

    Re: IO damn slow in GPFS

    ‏2013-12-03T16:28:05Z  

    What code release and maintenance level have you installed?

         "mmfsadm dump version | grep Buil"

    What have you set for configuration?

         mmlsconfig

  • 3CKG_Ramalingam_Ayyamperumal
    6 Posts

    Re: IO damn slow in GPFS

    ‏2013-12-03T16:33:41Z  
    • dlmcnabb
    • ‏2013-12-03T16:28:05Z

    What code release and maintenance level have you installed?

         "mmfsadm dump version | grep Buil"

    What have you set for configuration?

         mmlsconfig

    bash-3.2# mmfsadm dump version | grep Buil
    Build branch "3.4.0.11 ".
    Built on Jan 27 2012 at 12:05:31 by .
    bash-3.2# mmlsconfig
    Configuration data for cluster PROD_EMS.FT:
    -------------------------------------------
    myNodeConfigNumber 2
    clusterName PROD_EMS.FT
    clusterId 771243877078702085
    autoload no
    minReleaseLevel 3.4.0.7
    dmapiFileHandleSize 32
    minQuorumNodes 1
    pagepool 1000M
    maxMBpS 3200
    adminMode central
     
    File systems in cluster PROD_EMS.FT:
    ------------------------------------
    /dev/emsfs
    bash-3.2#
     
  • dlmcnabb
    dlmcnabb
    1012 Posts

    Re: IO damn slow in GPFS

    ‏2013-12-03T16:57:11Z  
    bash-3.2# mmfsadm dump version | grep Buil
    Build branch "3.4.0.11 ".
    Built on Jan 27 2012 at 12:05:31 by .
    bash-3.2# mmlsconfig
    Configuration data for cluster PROD_EMS.FT:
    -------------------------------------------
    myNodeConfigNumber 2
    clusterName PROD_EMS.FT
    clusterId 771243877078702085
    autoload no
    minReleaseLevel 3.4.0.7
    dmapiFileHandleSize 32
    minQuorumNodes 1
    pagepool 1000M
    maxMBpS 3200
    adminMode central
     
    File systems in cluster PROD_EMS.FT:
    ------------------------------------
    /dev/emsfs
    bash-3.2#
     

    3.4.0.11 is extremely old, with many performance changes since then. You should do a rolling upgrade to 3.4.0.25, and since this is Linux, don't forget to rebuild the portability layer.

  • 3CKG_Ramalingam_Ayyamperumal
    6 Posts

    Re: IO damn slow in GPFS

    ‏2013-12-03T16:58:09Z  
    • dlmcnabb
    • ‏2013-12-03T16:28:05Z

    What code release and maintenance level have you installed?

         "mmfsadm dump version | grep Buil"

    What have you set for configuration?

         mmlsconfig

    The storage behind this setup is SVC --> DS8800. I attached the SVC disk directly, and that is where the local-filesystem I/O numbers came from. As it is going to run a message bus application, I believe this will require random I/O. Though I see a lot of recommendations for random I/O tuning, I am not very confident about my setup.

    I have attached 2 disks from the same pool on a single storage unit.

    bash-3.2# mmlsnsd
     
     File system   Disk name    NSD servers
    ---------------------------------------------------------------------------
     emsfs         GBSDISK_1    (directly attached)
     emsfs         GBSDISK_2    (directly attached)
     (free disk)   NECDISK_1    (directly attached)
     
  • 3CKG_Ramalingam_Ayyamperumal
    6 Posts

    Re: IO damn slow in GPFS

    ‏2013-12-03T17:08:00Z  
    • dlmcnabb
    • ‏2013-12-03T16:57:11Z

    3.4.0.11 is extremely old, with many performance changes since then. You should do a rolling upgrade to 3.4.0.25, and since this is Linux, don't forget to rebuild the portability layer.

    All Servers are on AIX. And it is serving just a shared filesystem across 2 machines

  • 3CKG_Ramalingam_Ayyamperumal
    6 Posts

    Re: IO damn slow in GPFS

    ‏2013-12-03T17:49:28Z  
    • dlmcnabb
    • ‏2013-12-03T16:57:11Z

    3.4.0.11 is extremely old, with many performance changes since then. You should do a rolling upgrade to 3.4.0.25, and since this is Linux, don't forget to rebuild the portability layer.

    Sorry, the performance got even worse.

    bash-3.2# time sh -c "dd if=/dev/zero of=ddfile bs=8k count=1000000 && sync"
    1000000+0 records in.
    1000000+0 records out.
     
    real    2m4.787s
    user    0m1.257s
    sys     0m9.301s
    bash-3.2# cat /dev/null > ddfile
    bash-3.2# pwd
    /data/tibco/ram
    bash-3.2# time sh -c "dd if=/dev/zero of=ddfile bs=8k count=1000000 && sync"
    1000000+0 records in.
    1000000+0 records out.
     
    real    2m10.597s
    user    0m1.227s
    sys     0m9.110s
     
  • HajoEhlers
    HajoEhlers
    253 Posts

    Re: IO damn slow in GPFS

    ‏2013-12-04T17:50:12Z  

    1) Did you notice that your combined sys+user time compared to real time is about 1:10?

    real    1m44.338s
    user    0m1.245s
    sys     0m9.200s

     

    2) And that the average streaming speed is around 80 MB/s?

     

    From this I assume that you might be using only a single LUN, that the storage controller is overloaded, or ....
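    The ~80 MB/s estimate can be reproduced from the dd numbers earlier in the thread. A quick sketch in Python, using the "real" time of the GPFS run:

```python
# Back-of-envelope check of the GPFS dd run:
# 1,000,000 records x 8 KiB, written in a "real" time of 1m44.338s.
records = 1_000_000
record_kib = 8
elapsed_s = 60 + 44.338

total_mib = records * record_kib / 1024   # 7812.5 MiB written
throughput = total_mib / elapsed_s
print(round(throughput, 1))               # prints 74.9 (MiB/s)
```

    (About 75 MiB/s, i.e. roughly 80 MB/s in decimal units.)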

    So you should use "nmon" to get a quick view of your performance during the test, and use "mmdiag --iohist | sort -n -k6,6" to see what GPFS reports for I/O times .....

     

    BTW: Check the disk load as well, for example:

    $  iostat -D -d hdisk1 1 10

     

    happy trouble-shooting

    Hajo

     

     

    P.S.  A  "dd if=/dev/zero of=ddfile bs=8k count=1000000 && sync" does not create random I/O.

     

     

    Updated on 2013-12-04T17:53:46Z at 2013-12-04T17:53:46Z by HajoEhlers
  • yuri
    yuri
    282 Posts

    Re: IO damn slow in GPFS

    ‏2013-12-04T23:37:12Z  

    If you want to test random IO performance, "dd if=/dev/zero of=ddfile bs=8k count=1000000 && sync" isn't it.  This isn't random, and isn't small record (8k writes will be coalesced in buffer cache).  With a local fs, you'll have entire RAM available as buffer cache, while with GPFS it'll be limited to the configured pagepool space.  You're probably not interested in the caching efficiency though, but rather want to know the steady-state performance level.

    To exercise actual random 8k record I/O, you need a more advanced tool than 'dd'.  gpfsperf.c (available under /usr/lpp/mmfs/samples) can do this, but there are many other tools.  Regardless of what tool you use, be sure to use a dataset size that exceeds the RAM/disk controller cache size by at least an order of magnitude, to compensate for caching.  Note that it's critical to know how your app does its writes, in particular whether it uses O_SYNC or O_DIRECT -- those have very different performance profiles from regular buffered writes.
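    To illustrate what a small-random-record test looks like (illustration only -- the file name and sizes below are made up, and this is no substitute for gpfsperf or a proper benchmark), here is a minimal Python sketch that issues 8 KiB writes at random offsets with O_SYNC, so each write hits stable storage instead of being coalesced in the buffer cache:

```python
import os
import random
import time

# Illustrative random 8 KiB write test. PATH, FILE_SIZE and WRITES are
# arbitrary; a real test needs a file far larger than RAM and pagepool.
PATH = "ddfile.rand"
FILE_SIZE = 64 * 1024 * 1024   # 64 MiB -- far too small for a real test
REC = 8 * 1024                 # 8 KiB records
WRITES = 100

fd = os.open(PATH, os.O_CREAT | os.O_WRONLY | os.O_SYNC, 0o600)
os.ftruncate(fd, FILE_SIZE)    # preallocate (sparse) so offsets are valid
buf = os.urandom(REC)
start = time.monotonic()
for _ in range(WRITES):
    # pick a random record-aligned offset within the file
    off = random.randrange(FILE_SIZE // REC) * REC
    os.pwrite(fd, buf, off)
elapsed = time.monotonic() - start
os.close(fd)
os.remove(PATH)
print(f"{WRITES / elapsed:.0f} synchronous 8k write IOPS")
```

    Run it with PATH pointing at the filesystem under test; the IOPS figure, not MB/s, is what matters for this workload.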

    yuri

  • 3CKG_Ramalingam_Ayyamperumal
    6 Posts

    Re: IO damn slow in GPFS

    ‏2013-12-05T19:28:52Z  
    • yuri
    • ‏2013-12-04T23:37:12Z

    If you want to test random IO performance, "dd if=/dev/zero of=ddfile bs=8k count=1000000 && sync" isn't it.  This isn't random, and isn't small record (8k writes will be coalesced in buffer cache).  With a local fs, you'll have entire RAM available as buffer cache, while with GPFS it'll be limited to the configured pagepool space.  You're probably not interested in the caching efficiency though, but rather want to know the steady-state performance level.

    To exercise actual random 8k record I/O, you need a more advanced tool than 'dd'.  gpfsperf.c (available under /usr/lpp/mmfs/samples) can do this, but there are many other tools.  Regardless of what tool you use, be sure to use a dataset size that exceeds the RAM/disk controller cache size by at least an order of magnitude, to compensate for caching.  Note that it's critical to know how your app does its writes, in particular whether it uses O_SYNC or O_DIRECT -- those have very different performance profiles from regular buffered writes.

    yuri

    Thanks Yuri & Hajo,

    I used gpfsperf and received the following output.  This environment is going to be used for an Informatica GRID architecture.

    bash-3.2# /usr/lpp/mmfs/samples/perf/gpfsperf write seq /gpfs/gogo -r 256k -n 1024000000
    /usr/lpp/mmfs/samples/perf/gpfsperf write seq /gpfs/gogo
      recSize 256K nBytes 999936K fileSize 999936K
      nProcesses 1 nThreadsPerProcess 1
      file cache flushed before test
      not using data shipping
      not using direct I/O
      offsets accessed will cycle through the same file segment
      not using shared memory buffer
      not releasing byte-range token after open
      no fsync at end of test
        Data rate was 75339.10 Kbytes/sec, thread utilization 1.000
    bash-3.2# /usr/lpp/mmfs/samples/perf/gpfsperf write seq /gpfs/gogo -r 256k -n 1024000000
    /usr/lpp/mmfs/samples/perf/gpfsperf write seq /gpfs/gogo
      recSize 256K nBytes 999936K fileSize 999936K
      nProcesses 1 nThreadsPerProcess 1
      file cache flushed before test
      not using data shipping
      not using direct I/O
      offsets accessed will cycle through the same file segment
      not using shared memory buffer
      not releasing byte-range token after open
      no fsync at end of test
        Data rate was 68167.37 Kbytes/sec, thread utilization 1.000
     

    Sample output during such run

    ========================

    Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
    hdisk0          28.1     4958.8      19.4          0     10240
     
    tty:      tin         tout    avg-cpu: % user % sys % idle % iowait physc % entc
              0.0        147.7                4.0  52.5   36.9      6.5   0.4  417.1
     
    Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
    hdisk0          35.5     9740.2      38.0          0     19456
     
    tty:      tin         tout    avg-cpu: % user % sys % idle % iowait physc % entc
              0.0        147.5                3.5  54.1   35.9      6.5   0.5  463.9
     
    Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
    hdisk0          53.0     12288.0      48.0          0     24576
     

    Create seq

    bash-3.2# ./gpfsperf create seq /gpfs/gogo -r 256k -n 1024000000
    ./gpfsperf create seq /gpfs/gogo
      recSize 256K nBytes 999936K fileSize 1000000K
      nProcesses 1 nThreadsPerProcess 1
      file cache flushed before test
      not using data shipping
      not using direct I/O
      offsets accessed will cycle through the same file segment
      not using shared memory buffer
      not releasing byte-range token after open
      no fsync at end of test
        Data rate was 91784.40 Kbytes/sec, thread utilization 0.999
     

    Read seq

    bash-3.2# /usr/lpp/mmfs/samples/perf/gpfsperf read seq /gpfs/gogo -r 256k -n 1024000000
    /usr/lpp/mmfs/samples/perf/gpfsperf read seq /gpfs/gogo
      recSize 256K nBytes 999936K fileSize 999936K
      nProcesses 1 nThreadsPerProcess 1
      file cache flushed before test
      not using data shipping
      not using direct I/O
      offsets accessed will cycle through the same file segment
      not using shared memory buffer
      not releasing byte-range token after open
        Data rate was 75419.54 Kbytes/sec, thread utilization 1.000
     
    System configuration: lcpu=4 drives=5 ent=0.10 paths=20 vdisks=0
     
    tty:      tin         tout    avg-cpu: % user % sys % idle % iowait physc % entc
              0.0         34.0                4.9  56.9   26.7     11.4   0.5  457.2
     
    Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
    hdisk0          96.4     12784.0      49.9      25600         0
     
    tty:      tin         tout    avg-cpu: % user % sys % idle % iowait physc % entc
              0.0        148.0                5.0  56.5   28.2     10.2   0.5  487.5
     
    Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
    hdisk0          93.0     12544.0      49.0      25088         0
     

     

  • yuri
    yuri
    282 Posts

    Re: IO damn slow in GPFS

    ‏2013-12-05T21:59:24Z  

    Thanks Yuri & Hajo,

    I used gpfsperf and received the following output.  This environment is going to be used for an Informatica GRID architecture.

    bash-3.2# /usr/lpp/mmfs/samples/perf/gpfsperf write seq /gpfs/gogo -r 256k -n 1024000000
    /usr/lpp/mmfs/samples/perf/gpfsperf write seq /gpfs/gogo
      recSize 256K nBytes 999936K fileSize 999936K
      nProcesses 1 nThreadsPerProcess 1
      file cache flushed before test
      not using data shipping
      not using direct I/O
      offsets accessed will cycle through the same file segment
      not using shared memory buffer
      not releasing byte-range token after open
      no fsync at end of test
        Data rate was 75339.10 Kbytes/sec, thread utilization 1.000
    bash-3.2# /usr/lpp/mmfs/samples/perf/gpfsperf write seq /gpfs/gogo -r 256k -n 1024000000
    /usr/lpp/mmfs/samples/perf/gpfsperf write seq /gpfs/gogo
      recSize 256K nBytes 999936K fileSize 999936K
      nProcesses 1 nThreadsPerProcess 1
      file cache flushed before test
      not using data shipping
      not using direct I/O
      offsets accessed will cycle through the same file segment
      not using shared memory buffer
      not releasing byte-range token after open
      no fsync at end of test
        Data rate was 68167.37 Kbytes/sec, thread utilization 1.000
     

    Sample output during such run

    ========================

    Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
    hdisk0          28.1     4958.8      19.4          0     10240
     
    tty:      tin         tout    avg-cpu: % user % sys % idle % iowait physc % entc
              0.0        147.7                4.0  52.5   36.9      6.5   0.4  417.1
     
    Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
    hdisk0          35.5     9740.2      38.0          0     19456
     
    tty:      tin         tout    avg-cpu: % user % sys % idle % iowait physc % entc
              0.0        147.5                3.5  54.1   35.9      6.5   0.5  463.9
     
    Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
    hdisk0          53.0     12288.0      48.0          0     24576
     

    Create seq

    bash-3.2# ./gpfsperf create seq /gpfs/gogo -r 256k -n 1024000000
    ./gpfsperf create seq /gpfs/gogo
      recSize 256K nBytes 999936K fileSize 1000000K
      nProcesses 1 nThreadsPerProcess 1
      file cache flushed before test
      not using data shipping
      not using direct I/O
      offsets accessed will cycle through the same file segment
      not using shared memory buffer
      not releasing byte-range token after open
      no fsync at end of test
        Data rate was 91784.40 Kbytes/sec, thread utilization 0.999
     

    Read seq

    bash-3.2# /usr/lpp/mmfs/samples/perf/gpfsperf read seq /gpfs/gogo -r 256k -n 1024000000
    /usr/lpp/mmfs/samples/perf/gpfsperf read seq /gpfs/gogo
      recSize 256K nBytes 999936K fileSize 999936K
      nProcesses 1 nThreadsPerProcess 1
      file cache flushed before test
      not using data shipping
      not using direct I/O
      offsets accessed will cycle through the same file segment
      not using shared memory buffer
      not releasing byte-range token after open
        Data rate was 75419.54 Kbytes/sec, thread utilization 1.000
     
    System configuration: lcpu=4 drives=5 ent=0.10 paths=20 vdisks=0
     
    tty:      tin         tout    avg-cpu: % user % sys % idle % iowait physc % entc
              0.0         34.0                4.9  56.9   26.7     11.4   0.5  457.2
     
    Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
    hdisk0          96.4     12784.0      49.9      25600         0
     
    tty:      tin         tout    avg-cpu: % user % sys % idle % iowait physc % entc
              0.0        148.0                5.0  56.5   28.2     10.2   0.5  487.5
     
    Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
    hdisk0          93.0     12544.0      49.0      25088         0
     

     

    I thought you were interested in small random IO performance?  In that case you want something like "gpfsperf read rand -r 8k", not "read seq -r 256k".  For small random IO workloads, the metric that's usually considered to be the most important is not bandwidth, but rather IOPS (reads or writes per sec).

    In any event, once you eliminate caching artefacts, the random IO rate is ultimately controlled by the disk seek time (unless your dataset is so small that it fits comfortably in some caching layer).  There are only so many ways to improve on this.  Basically, you need more physical disks, and/or faster disks (SSD or higher-RPM conventional drives).
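    To illustrate how low that seek-limited ceiling can be, a back-of-envelope sketch (the service time and spindle count below are assumptions for illustration, not measurements -- LUNs carved from SVC/DS8800 arrays are typically backed by many physical drives):

```python
# Seek-limited ceiling for random 8 KiB I/O. service_time_ms and spindles
# are illustrative assumptions, not measured values.
service_time_ms = 7.0     # assumed average random service time per drive
spindles = 2              # assumed number of independent drives
rec_kib = 8

iops_per_disk = 1000 / service_time_ms          # ~143 IOPS per drive
total_iops = iops_per_disk * spindles
bandwidth_mib = total_iops * rec_kib / 1024
print(round(total_iops), "IOPS,", round(bandwidth_mib, 1), "MiB/s")
# prints: 286 IOPS, 2.2 MiB/s
```

    A few hundred IOPS, or a couple of MiB/s at 8k records -- which is why more or faster spindles (or SSD) is the only real fix.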

    yuri