Almost a year ago I wrote an article about congestion bottlenecks in Brocade switches. I said you should avoid them, because they usually mean that you either have no redundancy left due to too much workload or that you are not using your redundancy properly, and that you can use bottleneckmon to detect them. Back then I cared much more about latency bottlenecks, which are often caused by slow drain devices, and about their implications. And I still do today.
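For reference, enabling bottleneck detection with alerting looks roughly like this. This is only a sketch - the exact syntax and defaults vary by Fabric OS version, so verify every command and flag against the Fabric OS Command Reference for your release:

```shell
# Sketch only -- check the Fabric OS Command Reference for your FOS version.
bottleneckmon --enable -alert                         # enable detection with alerting on all ports
bottleneckmon --config -alert -time 300 -cthresh 0.8  # alert when >=80% of a 300 s window is congested
bottleneckmon --status                                # verify the current settings
```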
Well...stop! Didn't you talk about congestion bottlenecks?
Yes! Today I want to explain how a congestion bottleneck can cause exactly the same symptoms on the end devices as a latency bottleneck - and exactly the same performance degradation. This is how it happens. In the middle you see a SAN director with 2 portcards and 2 core cards. While the devices are connected to the portcards, the core cards provide the backend connections between them; they are internally connected via the backplane. So for example host 1's path over to storage array A would traverse its portcard, then one of the two core cards, and finally leave through the other portcard to reach storage array A. Even two devices connected to the same portcard may have to go over the core cards, because so-called local switching is only done within an ASIC, and a portcard can have more than one ASIC depending on its number of ports.
Now please meet host 2. Host 2 is a wonderful, modern server, one of the workhorses of the datacenter. It's fully packed with virtual machines, but its many cores, its memory and its state-of-the-art HBA provide enough horsepower to cope with the workload. This baby is more than capable of doing the work and it's in no way a slow drain device. It's zoned and mapped to the storage arrays A, B, C and D and it uses them heavily, mostly for read operations. The tiny green bars are read requests, and as you see in the next picture it sends them to all four arrays, all of the time.
Of course the other hosts send requests, too, but let's focus on our diligent host 2. Yes, the pictures are too simplistic, but I'm sure you'll get the point. On the next one you see the first responses flowing back to host 2. Because host 2 communicates with several storage arrays, the link towards it is heavily used, but host 2 processes the incoming frames quickly and returns buffer credits to the switch in time. So far so good.
But the higher the link utilization is and the longer it stays that high, the more likely it is that the following will happen - if you have enabled bottleneckmon with alerting:
2013/09/07-12:07:11, [AN-1004], 7002, SLOT 7 | FID 128, WARNING, FAB1DOM5, Slot 2, port 14 is a congestion bottleneck. 99.67 percent of last 300 seconds were affected by this condition.
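If you feed the RASlog into your monitoring, a small parser can pull the interesting fields out of such AN-1004 messages. A minimal sketch in Python, using the sample line from above (the exact field layout may vary between Fabric OS versions, so treat the regular expression as an assumption to adapt):

```python
import re

# Sample AN-1004 RASlog line from above; the field layout may differ per FOS version.
line = ("2013/09/07-12:07:11, [AN-1004], 7002, SLOT 7 | FID 128, WARNING, "
        "FAB1DOM5, Slot 2, port 14 is a congestion bottleneck. "
        "99.67 percent of last 300 seconds were affected by this condition.")

pattern = re.compile(
    r"\[AN-1004\].*Slot (\d+), port (\d+) is a congestion bottleneck\. "
    r"([\d.]+) percent of last (\d+) seconds"
)

m = pattern.search(line)
if m:
    slot, port = int(m.group(1)), int(m.group(2))
    percent, window = float(m.group(3)), int(m.group(4))
    print(f"congestion bottleneck on slot {slot} port {port}: "
          f"{percent}% of the last {window}s affected")
```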
If you didn't enable bottleneckmon, the congestion bottleneck would still be there... you just wouldn't know it.
The crux is: you will hardly ever find a congestion bottleneck that comes with nothing but high link utilization and no negative effects. The following scenario is much more likely:
Although there are enough buffer credits for this highly utilized link, frames are piling up towards it, because there is just too much workload and the link is busy sending frames. There is no slow drain device, and to stay with the bathing metaphor: the drain works very well and transports as much water as it is physically able to. But there is more water in the tub than can pass through the drain at once. In addition, imagine you have not just one water tap (in our case: storage array) but four of them. They fill the tub quicker than the drain can empty it. As a result the internal buffers for all the hops through the SAN director fill up (that's basically the tub), and finally the director needs to do something about it: it will slow down the return of buffer credits to the devices. Not only to devices that want to send frames directly to host 2, but - due to back pressure - also to the ones that send frames roughly in that direction (using the same internal connections, for example). And finally you'll end up in something like this:
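To put rough numbers on the bathtub - all figures below are made up for illustration, assuming four storage arrays bursting read data toward host 2 while host 2 hangs off a single link of the same nominal speed:

```python
# Hypothetical numbers: four storage arrays can each burst read data toward
# host 2 over 8 Gbit/s links, while host 2 is attached via one 8 Gbit/s link.
LINK_GBPS = 8          # nominal link speed in Gbit/s (ignoring encoding overhead)
N_SOURCES = 4          # storage arrays sending read responses to host 2

inflow = N_SOURCES * LINK_GBPS   # worst-case aggregate data heading to host 2
outflow = LINK_GBPS              # what the single host link can drain

surplus = inflow - outflow       # data piling up inside the director, Gbit/s
print(f"inflow {inflow} Gbit/s, drain {outflow} Gbit/s, surplus {surplus} Gbit/s")

# Assume (made-up figure) 2 MB of buffer space along the internal path:
# at full burst those buffers are full in well under a millisecond.
buffer_bits = 2 * 8 * 10**6
fill_time_ms = buffer_bits / (surplus * 10**9) * 1000
print(f"buffers full after ~{fill_time_ms:.3f} ms of sustained burst")
```

Four taps, one drain: the surplus has to go somewhere, and "somewhere" is the director's internal buffers - which, at these rates, are not big for long.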
The SAN director just behaves like a slow drain device itself!
Frames pile up inside the storage arrays and the other end devices impaired by the slow drain behavior. If their RAS package is good, they will yell about credit starvation and probably even drop frames within their FC adaptors. In extreme situations these frame drops could happen in the director, too. At least then you would see something that points you to a performance problem. Because otherwise - if the traffic suffered substantial delay, but all frames were finally transferred to the next internal or external hop within the 500 ms ASIC hold time - you would only see the congestion bottleneck. And without bottleneckmon you wouldn't see anything at all. The switch would look clean: nothing in porterrshow or portstatsshow, and both show only external port counters anyway. As a SAN administrator you would not suspect anything in the director to be the cause.
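A back-of-the-envelope check (again with made-up numbers) shows why the 500 ms hold time is rarely exceeded even though the added latency hurts every I/O:

```python
# Hypothetical: how long does a frame wait if the queue ahead of the host
# link holds 1000 full-size frames and the link drains at 8 Gbit/s nominal?
FRAME_BITS = 2148 * 8        # max FC frame, ~2148 bytes incl. headers and CRC
LINK_BPS = 8 * 10**9         # 8 Gbit/s host link (encoding overhead ignored)
QUEUE_FRAMES = 1000          # frames queued ahead of ours (made-up figure)

wait_ms = QUEUE_FRAMES * FRAME_BITS / LINK_BPS * 1000
print(f"queuing delay ~ {wait_ms:.3f} ms")
# A few milliseconds: far below the 500 ms ASIC hold time, so nothing is
# dropped and no error counter moves -- yet every I/O carries this delay.
```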
And still it would be there: a big performance problem, caused by a device communicating with too many other devices. It is not a slow drain device, but it still causes slow drain behavior in the SAN.
So how to solve it?
It's basically what I wrote a year ago, plus points 3 and 4 from How to deal with slow drain devices. You just have to ensure - from an architectural design point of view - that all components of the SAN are able to cope with the workload at any given time. It's both that easy and that complex. But the first step towards resolving such a situation is to detect it properly and to keep in mind what could happen.