First of all: the following blog post is about some SAN extension considerations related to Brocade SAN switches. The problems described here may affect other vendors as well, but those will not be discussed. It also does not cover all subtopics and considerations, but describes one specific problem.
There are a lot of different SAN extensions out there in the field, and Brocade supports a considerable proportion of them. You can see them in the Brocade Compatibility Matrix in the "Network Solutions" section. As offsite replication is one of the key items of a good DR solution, I see many environments spread over multiple locations. If the data centers are close enough to avoid slower WAN connections, multiplexer solutions like CWDM, TDM or DWDM are usually used to carry several connections over one long-distance link.
From a SAN perspective these multiplexers are either transparent or non-transparent. Transparent in this context means that:
- They don't appear as a device or switch in the fabric.
- Everything that enters the multiplexer at one site comes out of the (de-)multiplexer at the other site in exactly the same way.
While the first point is true for most of the solutions, the second point is the crux. With "everything" I mean all the traffic: not only the frames, but also the ordered sets. So it should be really the same. Bit by bit by bit exactly the same. If the multiplexing solution can only guarantee the transfer of the frames, it is non-transparent.
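To make the distinction concrete, here is a minimal Python sketch of the two behaviours - a toy model of the data stream, not of how any real CWDM/TDM/DWDM device works internally; the word values are placeholders invented for the illustration:

```python
# Toy model of a "transparent" vs. a "non-transparent" multiplexer.
# FRAME stands for an FC frame; IDLE and ARB(FF) stand for ordered sets
# (fillwords). The values are placeholders, not real FC encodings.

FRAME = "FRAME"
IDLE = "IDLE"
ARB_FF = "ARB(FF)"

def transparent_mux(stream):
    """Bit-for-bit passthrough: frames AND ordered sets survive unchanged."""
    return list(stream)

def non_transparent_mux(stream):
    """Only frames are guaranteed; all other words come out as IDLEs."""
    return [word if word == FRAME else IDLE for word in stream]

link = [ARB_FF, FRAME, ARB_FF, FRAME, IDLE]
print(transparent_mux(link))      # ['ARB(FF)', 'FRAME', 'ARB(FF)', 'FRAME', 'IDLE']
print(non_transparent_mux(link))  # ['IDLE', 'FRAME', 'IDLE', 'FRAME', 'IDLE']
```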
So how could that be a problem?
In most cases the long-distance connection is an ISL (Inter-Switch Link). An ISL transports not only "user frames" (SCSI-over-FC frames from actual I/O between an initiator and a target) but also a lot of control primitives (the ordered sets) and administrative communication to maintain the fabric and distribute configuration changes. In addition, there are techniques like Virtual Channels or QoS (Quality of Service) to minimize the influence of different I/O types on each other, and techniques to keep the link in good condition, like fillwords for synchronization or Credit Recovery. All these techniques rely on a transparent connection between the switches. If you don't have a transparent multiplexer, you have to ensure that these techniques are disabled, and of course you can't benefit from their advantages. Problems start when you try to use them but your multiplexer doesn't meet the prerequisites.
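As a memory aid, here is a hedged sketch of that dependency. The feature names are just the generic concepts from this paragraph (hypothetical identifiers, not FabOS parameters); the actual commands and port settings to disable them are in the FabOS Admin Guide:

```python
# Which of the techniques above must be off on a non-transparent link?
# Keys are generic concept names invented for this sketch, not FabOS
# settings; consult the FabOS Admin Guide for the real configuration.

ISL_FEATURES = {
    "virtual_channels": "needs VC_RDYs and ARB fillwords end to end",
    "qos": "builds on Virtual Channels",
    "credit_recovery": "needs ARB-based fillwords end to end",
}

def must_disable(mux_is_transparent: bool) -> list[str]:
    """Features that have to be disabled for the given link type."""
    if mux_is_transparent:
        return []  # everything may stay enabled (if configured consistently)
    return sorted(ISL_FEATURES)

print(must_disable(mux_is_transparent=False))
# ['credit_recovery', 'qos', 'virtual_channels']
```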
What can happen?
Credit Recovery - which allows the switches to exchange information about the used buffer-to-buffer credits and offers the possibility to react to credit loss - cannot work if IDLEs are used as fillwords. The switches use several different fillwords (ARB-based ones) to talk about their states. If the multiplexer cuts out all the fillwords and just inserts IDLEs on the other side (some TDMs do that), or if the link is configured to use IDLEs, the link will start toggling, with most likely disastrous impact on the I/O in the whole fabric.
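A grossly simplified sketch of that failure mode - the real mechanism (BB_SC) counts frames and R_RDYs between checkpoint primitives; this toy only shows that the expected ARB-based fillwords never arrive on a frames-only link:

```python
# Grossly simplified: Credit Recovery state travels in ARB-based
# fillwords. If the multiplexer replaces every fillword with an IDLE,
# that state never reaches the other side and the port re-initializes
# again and again - the link toggles.

def link_state(received_fillwords, credit_recovery_enabled=True):
    if not credit_recovery_enabled:
        return "link up - but lost credits stay lost"
    if "ARB" not in "".join(received_fillwords):
        # Expected ARB-based state never arrives -> port resets the link.
        return "link reset -> link starts toggling"
    return "link up - credit loss can be detected and repaired"

print(link_state(["ARB(FF)", "ARB(FF)"]))  # healthy, transparent path
print(link_state(["IDLE", "IDLE"]))        # mux swallowed the ARBs
```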
Another problem is less obvious. I mentioned Virtual Channels (VCs) before. The ISL is logically split. Of course not the fibre itself - the frames still pass it one by one. But the buffer management establishes several VCs, each of them with its own buffer-to-buffer credits. There are VCs solely used for administrative communication, like VC0 for Class_F (Fabric Class) traffic. Then there are several VCs dedicated to "user traffic". Which VC is used by a certain frame is determined by the destination address in its header: a modulo operation calculates the correct VC. The advantage: a slow-draining device - one that doesn't send credits back, so the switch cannot send the next frame over to the other side - does not block the complete ISL, but only its own VC. If you have VCs, the credits are sent back as "VC_RDY"s. If your multiplexer doesn't support that (along with ARB fillwords) because it's not transparent, you can't have VCs, and "R_RDY"s will be used to send credits back instead.

The result: As you have only one big channel then, Class_F and "user frames" (Class_3 & Class_2) will share the same credits, and the switches will prioritize Class_F. If you have much traffic anyway, or many fabric state changes, or even a slow-draining device, things will start to get ugly: both types of traffic will interfere, buffer credits drop to zero, traffic gets stalled, frames are delayed and then dropped (after the 500 ms ASIC hold time). Error recovery will generate even more traffic and will impact the applications, visible as timeouts. Multipath drivers will fail over paths, bringing more traffic onto other ISLs, most probably passing through the same multiplexer. => Huge performance degradation, lost paths, access losses, big trouble.
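To illustrate the difference between the two credit models, here is a small Python sketch; the sizes (8 data VCs, 4 credits each) and the destination address are invented for the example and differ per real platform and ASIC:

```python
# Toy comparison of the two credit models. The numbers are made up;
# real platforms have different VC layouts and credit counts.

NUM_DATA_VCS = 8
CREDITS_PER_VC = 4

def vc_for_frame(destination_id: int) -> int:
    """Simplified VC selection: a modulo over the destination address."""
    return destination_id % NUM_DATA_VCS

# With Virtual Channels: one credit pool per VC.
vc_credits = {vc: CREDITS_PER_VC for vc in range(NUM_DATA_VCS)}
slow_vc = vc_for_frame(0x021500)  # frames to the slow-draining device
vc_credits[slow_vc] = 0           # that VC is starved, the others are not
stalled = [vc for vc, credits in vc_credits.items() if credits == 0]
print(f"VC_RDY model: VC(s) {stalled} stall, the other VCs keep flowing")

# Without VCs (R_RDY): one shared pool for Class_F and all user frames.
shared_credits = 0  # the slow-draining device ate them all
if shared_credits == 0:
    print("R_RDY model: Class_F and user traffic stall together")
```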
You see, using the wrong (or at least "non-optimal") equipment can lead to severe problems. It's even more provoking when the multiplexer in use is in fact transparent but the wrong settings are used on the switches. So if you see such problems or other similar issues and you use a multiplexer on the affected paths, check whether your multiplexer is transparent (with the matrix linked above) and whether you use the correct configuration (refer to the FabOS Admin Guide). And if you have a non-transparent multiplexer and no possibility to get a transparent one, don't hesitate to contact your IBM sales rep and ask about consulting on how to deal with situations like this (e.g. with traffic shaping / tuning, etc.).