In a cluster, from time to time we observe relatively long waiters of this kind:
0x2AAAAC10BE60 waiting 2.333560000 seconds, NSDThread: on ThCond 0x2AAAB038EB18 (0x2AAAB038EB18) (InuseCondvar), reason 'waiting for exclusive use of connection for sending msg'
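For anyone who wants to pull such waiters out of saved output, here is a minimal Python sketch. It assumes waiter lines shaped exactly like the one quoted above; other GPFS releases may format them differently. The function name `long_waiters` and the one-second threshold are made up for illustration.

```python
import re

# Pattern for waiter lines shaped like the one quoted in this post.
# Assumption: other GPFS releases may format these lines differently.
WAITER_RE = re.compile(
    r"^(?P<thread>0x[0-9A-Fa-f]+) waiting (?P<seconds>\d+\.\d+) seconds, "
    r"(?P<name>[^:]+): .*reason '(?P<reason>[^']*)'"
)

def long_waiters(lines, min_seconds=1.0):
    """Return (seconds, thread name, reason) for waiters at or above min_seconds."""
    found = []
    for line in lines:
        match = WAITER_RE.search(line.strip())
        if match and float(match.group("seconds")) >= min_seconds:
            found.append((float(match.group("seconds")),
                          match.group("name"),
                          match.group("reason")))
    # Longest waits first.
    return sorted(found, reverse=True)

# The waiter line from this post, used as sample input.
sample = ("0x2AAAAC10BE60 waiting 2.333560000 seconds, NSDThread: on ThCond "
          "0x2AAAB038EB18 (0x2AAAB038EB18) (InuseCondvar), reason "
          "'waiting for exclusive use of connection for sending msg'")

for seconds, name, reason in long_waiters([sample]):
    print(f"{seconds:.3f}s {name}: {reason}")
```

Feeding it the lines of a saved waiters dump instead of `[sample]` gives a list sorted by wait time, which makes it easier to see whether the 'exclusive use of connection' reason dominates.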
When this happens, we observe a large number of TCP retransmissions from the servers to the clients. We have seen that this happens when the (1GE) NICs of the clients saturate (due to GPFS traffic). There is no saturation on the server side (10GE), nor in the intermediate network. So it looks like the problems are simply due to saturation of the client NICs.
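One way to put a number on the retransmissions is to compare the kernel's TCP counters at two points in time. The sketch below assumes Linux servers, where `/proc/net/snmp` exposes `OutSegs` and `RetransSegs`; the helper names and the sample counter values are invented for illustration and are not measurements from our cluster.

```python
def parse_tcp_counters(snmp_text):
    """Parse the 'Tcp:' header/value line pair of Linux /proc/net/snmp into a dict."""
    tcp_lines = [line.split()[1:] for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    if len(tcp_lines) < 2:
        raise ValueError("no Tcp: header/value pair found")
    names, values = tcp_lines[0], tcp_lines[1]
    return {name: int(value) for name, value in zip(names, values)}

def retransmit_ratio(before, after):
    """Fraction of segments sent between two snapshots that were retransmissions."""
    sent = after["OutSegs"] - before["OutSegs"]
    retrans = after["RetransSegs"] - before["RetransSegs"]
    return retrans / sent if sent > 0 else 0.0

def read_counters(path="/proc/net/snmp"):
    """Read the live counters on a Linux host (not called in this demo)."""
    with open(path) as handle:
        return parse_tcp_counters(handle.read())

# Made-up snapshots with only the two columns used here.
SAMPLE_T0 = "Tcp: OutSegs RetransSegs\nTcp: 1000 10\n"
SAMPLE_T1 = "Tcp: OutSegs RetransSegs\nTcp: 3000 110\n"

ratio = retransmit_ratio(parse_tcp_counters(SAMPLE_T0),
                         parse_tcp_counters(SAMPLE_T1))
print(f"retransmitted share of sent segments: {ratio:.1%}")
```

On a server, calling `read_counters()` twice a few seconds apart while the waiters are present, and passing both results to `retransmit_ratio`, would show how much of the outgoing traffic is retransmitted.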
Is there something that we can do to avoid this kind of congestion?
7 replies Latest Post - 2012-12-15T22:17:58Z by VincenzoVagnoni
Pinned topic network congestion issues
SystemAdmin 110000D4XK 2092 Posts
Re: network congestion issues
2012-12-15T15:32:23Z in response to VincenzoVagnoni
I tried tweaking maxMBpS. Whatever value I set, it has no effect at all. One detail: the client node mounts the filesystem from a remote cluster. Does this parameter also work when the filesystem is mounted from a remote cluster?
SystemAdmin 110000D4XK 2092 Posts ACCEPTED ANSWER
Re: network congestion issues
2012-12-15T17:22:09Z in response to VincenzoVagnoni
Until now I have only worried about maxMBpS in the other direction: it was set too low and prevented GPFS from exploiting the hardware capabilities (on 10G and IB networks). As a result, my I/O performance was lower than we expected.
As I understand it, the value should reflect the available hardware bandwidth, and GPFS uses it to tune its internals for that bandwidth.
In your case you may also want to see whether you can optimize the settings of the IP stack.
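To make the point about maxMBpS reflecting the available bandwidth concrete, here is a small sketch of the arithmetic for the two link speeds mentioned in this thread. It only converts the nominal link rate into megabytes per second and ignores protocol overhead, so the results are upper bounds and not recommended settings; the function name is made up for illustration.

```python
def line_rate_mb_per_s(link_gbit_per_s):
    """Raw line rate in decimal megabytes per second, ignoring protocol overhead."""
    # 1 Gbit/s = 1000 Mbit/s, and 8 bits make one byte.
    return link_gbit_per_s * 1000 / 8

for label, gbit in (("1GE client NIC", 1), ("10GE server NIC", 10)):
    print(f"{label}: about {line_rate_mb_per_s(gbit):.0f} MB/s")
```

So a 1GE client cannot move more than roughly 125 MB/s in either direction, a tenth of what a 10GE server can send. Whether a maxMBpS value near that figure actually reduces the congestion is a separate question, and the experience reported earlier in this thread suggests it may not.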
FredStockatIBM 120000FABX 43 Posts