Comments (4)

1 9GWD_Shannon_Moore commented

Sebastian,

I must respectfully disagree with your advice to _never_ change the port fillword setting. Each storage vendor, whether IBM, EMC, Hitachi, etc., has a recommended port fillword setting for Brocade switches for their respective storage subsystems. Customers should always follow the advice of their storage vendor in relation to the port fillword settings.

There are still fairly current examples proving that properly setting the port fillword can either
1) completely eliminate "false" SAN/communication/connectivity issues exhibited in error logs and/or switch logs
or
2) eliminate "noise" in error logs and in switch logs which would otherwise lead support personnel down incorrect paths for problem determination.

Reference:
PMR 24122,227,000 from August 2013, page 116:
"Since correcting the port fillword settings on the Brocade switches, there have been NO path failures on the VIOS or the clients."

PMR 16452,227,000 from February 2013 update, starting on page 373, which says:
"Notably _absent_ in the Feb 5 data are fscsi or fcs-reported errors. There are no PLOGI timeouts, target reset timeouts, adapter reset ring entries, etc. THAT means that the port fillword correction has _definitely_ improved communication and connectivity on these paths. Furthermore, the paths ALL recovered within 1 minute, 25 seconds."

Regards,
Shannon Moore,
AIX Development Support Specialist
IBM Storage and Technology Group,
AIX I/O Device Driver Technical Team Lead
Austin, Texas

2 seb_ commented

Hello Shannon,

Thank you for your feedback. Unfortunately, I only found one of the cases in the records. But it's interesting, because I often hear such quotes offered as "proof" for the theory that a wrong fillword mode was the cause of a problem. That's exactly why I wrote that blog post.

As written in PMR 24122,227,000, the reason for the problems there was a slow drain device. At first it was said to be solved by changing the fillword mode. Later the analyzing engineer himself was not so sure anymore. A slow drain device is one that doesn't give buffer credits back to the switch in a timely manner. While this could have dozens of reasons, I have never come across a case where it was provably caused by a wrong fillword setting. (See for some ideas)

This is a classic example of how a coincidence can be mistaken for a cause. If you change the fillword, you basically bounce the port. Bouncing the port is usually one of the first actions you take to resolve slow drain behavior. And from the technical side I don't see any reason why the choice of one fillword over the other would have any impact on the detection of R_RDYs.

So I stand by what I wrote in the blog post.

Cheers seb
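
The buffer-credit behavior of a slow drain device mentioned in this comment can be illustrated with a small simulation. This is only a toy sketch, not taken from any switch firmware or from the PMRs discussed: the function name `simulate_link`, its parameters, and the tick-based timing model are all assumptions made for illustration. The sender spends one credit per frame and regains it when the receiver returns an R_RDY.

```python
from collections import deque


def simulate_link(bb_credits, frames_to_send, rrdy_delay_ticks):
    """Toy model of buffer-to-buffer credit flow control on one link.

    Hypothetical model: the sender may transmit at most one frame per
    tick, each frame costs one credit, and the receiver hands the credit
    back (R_RDY) rrdy_delay_ticks after the frame was sent.
    Returns (total_ticks, stalled_ticks).
    """
    credits = bb_credits
    pending_rrdy = deque()  # arrival ticks of outstanding R_RDYs
    sent = 0
    tick = 0
    stalled = 0
    while sent < frames_to_send:
        # Credits returned by the receiver become usable again.
        while pending_rrdy and pending_rrdy[0] <= tick:
            pending_rrdy.popleft()
            credits += 1
        if credits > 0:
            credits -= 1
            sent += 1
            pending_rrdy.append(tick + rrdy_delay_ticks)
        else:
            # No credit left: the sender has to wait for an R_RDY.
            stalled += 1
        tick += 1
    return tick, stalled


if __name__ == "__main__":
    # Receiver returns credits quickly versus slowly (slow drain).
    print("healthy:   ", simulate_link(4, 100, 2))
    print("slow drain:", simulate_link(4, 100, 20))
```

In this model a receiver that returns credits within the credit window never stalls the sender, while one that delays its R_RDYs leaves the sender idle for most ticks. Note that nothing in the sketch depends on a fillword; it only models credit return timing.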

3 9GWD_Shannon_Moore commented

Sebastian, Thank you for your comments. Regarding your reply to my previous comment, I suppose I should have clarified one thing a bit better... PMR 24122,227,000 actually involved a customer with multiple issues. The customer perceived them all to be the same, so they were all handled under the same PMR. The slow drain device was actually on a completely separate fabric from the one where the port fillwords were corrected. I was the AIX Development Support Specialist assigned to that PMR, and correcting the port fillwords did, in fact, resolve spurious path failures on the VIO servers and VIO clients.

4 seb_ commented

Hello Shannon, I just saw that I had looked at a different PMR pointing to yours, so I might not see the full picture. Being unable to read the original one, I don't see much sense in discussing it further in public without facts. However, I guess we could discuss it via Sametime. I would be very surprised to see a reproducible problem due to fillwords other than a port not coming online. Tracing something like that would really clarify it. I did a lot of FC traces in the lab around Christmas 2011 with a lot of different fillword (and other configuration) settings, and I never saw the slightest problem as long as the actual link initialization worked. If changing the fillword changed something, the probability is much higher that the improvement came from the link initialization alone. If it solved your problem and didn't impact the customer's production, I'm happy for you. But still seeing questions and cases coming in where a rash "Change the fillwords of all ports!" caused major issues certainly does not make me happy. Cheers seb