VincenzoVagnoni

network congestion issues

2012-12-15T10:31:45Z
In one of our clusters we occasionally observe relatively long waiters such as this one:
0x2AAAAC10BE60 waiting 2.333560000 seconds, NSDThread: on ThCond 0x2AAAB038EB18 (0x2AAAB038EB18) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg'

When this happens, we observe a large number of TCP retransmissions from the servers to the clients. We have seen that it happens when the (1GE) NICs of the clients saturate because of GPFS traffic. There is, by contrast, no saturation on the server side (10GE) or in the intermediate network. So the problem appears to be simply saturation of the client NICs.
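
To track how often these events occur, one could scan saved waiter dumps for long waits with this particular reason. The following is a minimal Python sketch; the regular expression is modeled only on the single waiter line quoted above (other GPFS versions may format waiters differently), and the names parse_waiters, long_send_waiters and the 1-second default threshold are hypothetical choices for illustration.

```python
import re
from typing import List, NamedTuple

# Modeled on the one waiter line quoted in this post; the exact format
# may differ between GPFS versions (assumption).
WAITER_RE = re.compile(
    r"^(?P<addr>0x[0-9A-Fa-f]+)\s+waiting\s+(?P<seconds>\d+(?:\.\d+)?)\s+seconds,"
    r"\s+(?P<thread>[^:]+):.*?reason\s+'(?P<reason>[^']*)'"
)


class Waiter(NamedTuple):
    addr: str
    seconds: float
    thread: str
    reason: str


def parse_waiters(text: str) -> List[Waiter]:
    """Extract every line that matches the waiter pattern; skip the rest."""
    waiters = []
    for line in text.splitlines():
        m = WAITER_RE.match(line.strip())
        if m:
            waiters.append(Waiter(m["addr"], float(m["seconds"]),
                                  m["thread"], m["reason"]))
    return waiters


def long_send_waiters(text: str, threshold_s: float = 1.0) -> List[Waiter]:
    """Waiters blocked at least threshold_s seconds waiting for a busy connection."""
    return [w for w in parse_waiters(text)
            if w.seconds >= threshold_s
            and "exclusive use of connection" in w.reason]
```

Given a saved waiter dump, long_send_waiters() keeps only the entries blocked on a busy connection, which could then be lined up in time with retransmission counts on the servers.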

Is there anything we can do to avoid this kind of congestion?
  • SystemAdmin

    Re: network congestion issues

    2012-12-15T12:48:48Z
    What is your maxMBpS setting on those nodes?

    For a single 1G connection, I think it should be no higher than 100.

    Markus
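
    As a rough check of that figure: a 1 Gbit/s link carries at most 10^9 / 8 bytes per second, i.e. 125 MB/s before Ethernet/IP/TCP framing overhead, so a cap of 100 leaves about 20% headroom, while any value above 125 exceeds the raw line rate. A small Python sketch of the arithmetic, assuming maxMBpS is expressed in decimal megabytes per second; the helper names line_rate_mb_per_s and headroom are hypothetical:

```python
def line_rate_mb_per_s(gbit_per_s: float) -> float:
    """Raw line rate in decimal MB/s (1 MB = 10**6 bytes), before any
    Ethernet/IP/TCP framing overhead."""
    return gbit_per_s * 1e9 / 8 / 1e6


def headroom(cap_mb_per_s: float, gbit_per_s: float) -> float:
    """Fraction of the raw line rate left unused by a given cap.
    A negative value means the cap exceeds what the link can carry."""
    return 1.0 - cap_mb_per_s / line_rate_mb_per_s(gbit_per_s)


# Compare two candidate caps on a single 1 Gbit/s client link.
for cap in (100, 150):
    print(f"1 Gbit/s, cap {cap}: headroom {headroom(cap, 1):+.0%}")
```

    The same helpers apply to the 10GE servers by passing 10 instead of 1.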
  • VincenzoVagnoni

    2012-12-15T13:01:54Z
    It is at the default: maxMBpS 150.
  • VincenzoVagnoni

    2012-12-15T13:18:37Z
    I lowered maxMBpS to 100, 50, and even 10, with no effect. Other ideas?
  • VincenzoVagnoni

    2012-12-15T15:32:23Z
    I tried tweaking maxMBpS, but no value has any effect. One detail: the client node mounts the filesystem from a remote cluster. Does this parameter also take effect when the filesystem is mounted remotely?
  • SystemAdmin

    2012-12-15T17:22:09Z
    Until now I have only worried about maxMBpS in the other direction: it was set too low and prevented GPFS from exploiting the hardware capabilities (on 10G and IB networks), so our I/O performance was lower than we expected.

    My understanding is that it should reflect the available hardware bandwidth, and that GPFS uses it to tune its internals for that bandwidth.

    In your case you may also want to check whether the IP stack settings can be optimized.
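
    One way to judge whether IP stack tuning helps is to measure the TCP retransmission rate on the servers before and after a change. On Linux, /proc/net/snmp exposes cumulative "Tcp:" counters, including OutSegs and RetransSegs; the Python sketch below diffs two snapshots to get a rough retransmission ratio. The function names are hypothetical, and the ratio is only an indicator, not an exact loss rate.

```python
from pathlib import Path
from typing import Dict


def parse_tcp_counters(snmp_text: str) -> Dict[str, int]:
    """Map each 'Tcp:' field name to its integer value.

    /proc/net/snmp lists each protocol as two lines sharing a prefix:
    a header line of field names, then a line of values.
    """
    rows = [line.split()[1:] for line in snmp_text.splitlines()
            if line.startswith("Tcp:")]
    if len(rows) < 2:
        raise ValueError("no 'Tcp:' header/value pair found")
    names, values = rows[0], rows[1]
    return {name: int(value) for name, value in zip(names, values)}


def read_tcp_counters(path: str = "/proc/net/snmp") -> Dict[str, int]:
    """Take one snapshot of the kernel's cumulative TCP counters (Linux only)."""
    return parse_tcp_counters(Path(path).read_text())


def retrans_ratio(before: Dict[str, int], after: Dict[str, int]) -> float:
    """Retransmitted segments per sent segment between two snapshots."""
    sent = after["OutSegs"] - before["OutSegs"]
    retrans = after["RetransSegs"] - before["RetransSegs"]
    return retrans / sent if sent > 0 else 0.0
```

    Usage: call read_tcp_counters() twice on a server, some seconds apart while clients are busy, and pass both snapshots to retrans_ratio().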
  • FredStockatIBM

    2012-12-15T22:11:09Z
    What version of GPFS are you running?
  • VincenzoVagnoni

    2012-12-15T22:17:58Z
    GPFS 3.4.0-17, running on kernel 2.6.18-274.7.1.el5.