From the very beginning I have always had a handful of blades showing CRC errors on each of these switches (so both chassis have the same problem). I can tell when this happens because the O/S (XenServer 6.0) shows aborts in the kernel message log, and there is a 30-second O/S "hang" on all of the VMs on that blade until the controller aborts the I/O. It is very annoying, and users do notice the problem. Usually only one of the two switches for a particular blade experiences the CRC errors, so the workaround is to disable the port on the offending switch. That means the FC is no longer redundant for that blade, and a single hardware failure could knock out the FC on the blade completely.
I went through the process of updating the firmware and EDC images on the QMI2582 cards, fixed the connection speeds at 8Gb on the switch and HBA ports, and ensured that the EDC image selection was for the Brocade switch. I have also updated the firmware on the Brocade switches to the latest (Fabric O/S 6.4.2b) and have verified the FPGA versions on them. The combination of updates did settle down some of the CRC errors, but some still persist. I have also managed to eliminate some of the CRC-error-prone connections by playing shell games with the blades between slots. It is disturbing that this actually works. It tells me that there is still some marginal signal integrity issue in the overall scheme of things for which the EDC firmware on the QMI2582 cannot totally compensate.
Here is the "scli -i" output from XenServer for one port of the QMI2582 modules:
--------------------------------
Host Name : bceng2-xs8
HBA Instance : 0
HBA Model : QMI2582
HBA Description : QMI2582 QLogic 8Gb Fibre Channel Expansion Card (CIOv) for IBM BladeCenter
HBA ID : 0-QMI2582
HBA Alias :
HBA Port : 1
Port Alias :
Node Name : 20-00-00-24-FF-26-CD-92
Port Name : 21-00-00-24-FF-26-CD-92
Port ID : 03-03-00
Serial Number : LFD1111L54074
Driver Version : 8.03.07.03.55.6-k2
BIOS Version : 2.13
Driver Firmware Version : 5.06.05 (90d5)
Flash BIOS Version : 2.13
Flash FCode Version : 3.17
Flash EFI Version : 2.38
Flash Firmware Version : 5.06.03
Actual Connection Mode : Point to Point
Actual Data Rate : 8 Gbps
PortType (Topology) : NPort
Target Count : 6
PCI Bus Number : 36
PCI Device Number : 0
PCIe Max Bus Width : x8
PCIe Max Bus Speed : 5.0 Gbps
PCIe Negotiated Width : x4
PCIe Negotiated Speed : 5.0 Gbps
HBA Status : Online
--------------------------------
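For anyone who wants to compare this output across several blades, here is a rough Python sketch that turns the "Key : Value" lines into a dictionary and flags two things that stand out in the listing above: the negotiated PCIe width (x4) is below the card's maximum (x8), and the driver-loaded firmware (5.06.05) differs from the flash firmware (5.06.03). The function names `parse_scli_info` and `find_mismatches` are my own invention, the sample is a subset of the fields above, and neither difference is necessarily a fault (the x4 link may simply be what the blade's slot provides). It is only meant to make blade-to-blade differences easy to spot.

```python
# Subset of the "scli -i" fields shown above, one field per line.
SAMPLE = """\
--------------------------------
Host Name : bceng2-xs8
HBA Model : QMI2582
HBA Alias :
HBA Port : 1
Driver Firmware Version : 5.06.05 (90d5)
Flash Firmware Version : 5.06.03
Actual Data Rate : 8 Gbps
PCIe Max Bus Width : x8
PCIe Negotiated Width : x4
HBA Status : Online
--------------------------------
"""


def parse_scli_info(text):
    """Parse 'Key : Value' lines into a dict; lines without a colon are skipped."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # separator rows of dashes, blank lines
        key, _, value = line.partition(":")
        info[key.strip()] = value.strip()  # empty values (e.g. aliases) become ""
    return info


def find_mismatches(info):
    """Return human-readable notes on fields worth comparing between blades."""
    notes = []

    max_width = info.get("PCIe Max Bus Width")
    neg_width = info.get("PCIe Negotiated Width")
    if max_width and neg_width and max_width != neg_width:
        notes.append(f"PCIe width: negotiated {neg_width}, card supports {max_width}")

    # Compare only the version number, ignoring suffixes such as "(90d5)".
    driver_fw = info.get("Driver Firmware Version", "").split()
    flash_fw = info.get("Flash Firmware Version", "").split()
    if driver_fw and flash_fw and driver_fw[0] != flash_fw[0]:
        notes.append(f"Firmware: driver loaded {driver_fw[0]}, flash holds {flash_fw[0]}")

    return notes


if __name__ == "__main__":
    for note in find_mismatches(parse_scli_info(SAMPLE)):
        print(note)
```

Running the same two functions over the output from a blade that shows CRC errors and one that does not would at least show whether the problem ports differ in any of these fields.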
I have run out of ideas regarding this and am looking for help to further diagnose and ultimately solve the issue. The support folks are quite happy just changing out QMI2582 modules until something works, but that is not an end solution; it is just a band-aid. I am sure that I am not the only one experiencing this issue, and I want to help come up with a real end solution. So folks, please weigh in.