7 replies Latest Post - ‏2012-12-15T22:17:58Z by VincenzoVagnoni
VincenzoVagnoni
108 Posts

Network congestion issues

2012-12-15T10:31:45Z
In a cluster, from time to time we observe relatively long waiters of this kind:
0x2AAAAC10BE60 waiting 2.333560000 seconds, NSDThread: on ThCond 0x2AAAB038EB18 (0x2AAAB038EB18) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg'
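
To track how often such waiters appear, the line above could be parsed with a small script. This is only a sketch: the pattern is inferred from the single waiter line quoted here, so other GPFS versions may print waiters differently, and the names `parse_waiter` and `long_waiters` are made up for illustration.

```python
import re

# Pattern inferred from the one waiter line quoted in this post (assumption:
# other GPFS releases may use a different layout).
WAITER_RE = re.compile(
    r"^(?P<thread>0x[0-9A-Fa-f]+) waiting (?P<seconds>\d+\.\d+) seconds, "
    r"(?P<name>[^:]+): .*reason '(?P<reason>[^']*)'"
)


def parse_waiter(line):
    """Return (thread, seconds, name, reason), or None if the line does not match."""
    m = WAITER_RE.match(line.strip())
    if m is None:
        return None
    return (
        m.group("thread"),
        float(m.group("seconds")),
        m.group("name"),
        m.group("reason"),
    )


def long_waiters(lines, threshold=1.0):
    """Keep only parsed waiters that have waited at least `threshold` seconds."""
    parsed = (parse_waiter(line) for line in lines)
    return [p for p in parsed if p is not None and p[1] >= threshold]
```

Feeding it the saved waiter output line by line and counting the entries whose reason mentions the connection would show whether the long waits line up with the periods of NIC saturation.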

When this happens, we observe a large number of TCP retransmissions from the servers to the clients. We have seen that this happens when the clients' 1GE NICs saturate with GPFS traffic. There is no saturation on the server side (10GE) or in the intermediate network, so the problem appears to be caused simply by saturation of the client NICs.
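
One way to put a number on the retransmissions is to sample the kernel TCP counters twice on a server and compare. The sketch below assumes the servers run Linux and expose `/proc/net/snmp` (the post does not say which OS is used); the function names are hypothetical.

```python
def tcp_counters(snmp_text):
    """Parse the two 'Tcp:' lines of /proc/net/snmp into a dict of ints."""
    rows = [
        line.split()[1:]
        for line in snmp_text.splitlines()
        if line.startswith("Tcp:")
    ]
    if len(rows) != 2:
        raise ValueError("expected one Tcp header line and one Tcp value line")
    names, values = rows
    return dict(zip(names, (int(v) for v in values)))


def retrans_fraction(before, after):
    """Fraction of segments sent between two samples that were retransmissions."""
    sent = after["OutSegs"] - before["OutSegs"]
    retrans = after["RetransSegs"] - before["RetransSegs"]
    return retrans / sent if sent > 0 else 0.0
```

Reading the file once, waiting a few seconds, reading it again, and passing both parsed samples to `retrans_fraction` gives the retransmission share for that interval, which can then be compared between quiet periods and periods when the client NICs are saturated.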

Is there something that we can do to avoid this kind of congestion?
Updated on 2012-12-15T22:17:58Z by VincenzoVagnoni
  • SystemAdmin
    2092 Posts

    Re: network congestion issues

    ‏2012-12-15T12:48:48Z  in response to VincenzoVagnoni
    What is your maxMBpS setting on those nodes?

    For a single 1G connection I think it should be no bigger than 100.

    Markus
    • VincenzoVagnoni
      108 Posts

      Re: network congestion issues

      ‏2012-12-15T13:01:54Z  in response to SystemAdmin
      It is at the default: maxMBpS 150
    • VincenzoVagnoni
      108 Posts

      Re: network congestion issues

      ‏2012-12-15T13:18:37Z  in response to SystemAdmin
      I lowered maxMBpS to 100, 50, and even 10, without any effect. Any other ideas?
      • VincenzoVagnoni
        108 Posts

        Re: network congestion issues

        ‏2012-12-15T15:32:23Z  in response to VincenzoVagnoni
        I tried tweaking maxMBpS. Whatever the value, it is completely ineffective. One detail: the client node mounts the filesystem from a remote cluster. Does this parameter also take effect when the filesystem is mounted from a remote cluster?
        • SystemAdmin
          2092 Posts

          Re: network congestion issues

          ‏2012-12-15T17:22:09Z  in response to VincenzoVagnoni
          Until now I have only worried about maxMBpS in the other direction: it was set too low and prevented GPFS from exploiting the hardware's capabilities (on 10G and IB networks). As a result, my I/O performance was lower than we expected.

          As I understand it, the parameter should reflect the available hardware bandwidth, and GPFS uses it to tune its internals for that bandwidth.

          In your case you may also want to see whether you can optimize the settings of the IP stack.
  • FredStockatIBM
    43 Posts

    Re: network congestion issues

    ‏2012-12-15T22:11:09Z  in response to VincenzoVagnoni
    What version of GPFS are you running?