Comments (8)

1 niyazi commented Permalink

Hello Seb,

Thank you for the article. Last week I upgraded my two SAN directors to 6.4.2b and am now getting these warning messages. I have been told to configure bottleneckmon as you described above. I have a question:

If I haven't enabled or configured bottleneckmon before, is this command enough to recover from the errors?

bottleneckmon --cfgcredittools -intport -recover onLrOnly

Or should I also enable bottleneckmon with a command like "bottleneckmon --enable"?

Cheers.

2 seb_ commented Permalink

Hi Niyazi,
I think it's independent of the original bottleneckmon function. I believe Brocade just needed a vehicle for the credittools without implementing a whole new command, because it was introduced in a maintenance level. (These are the releases with a letter suffix, like v6.4.2a; they usually contain only bug fixes.) But I could be wrong. Anyway, I think it works without bottleneckmon --enable.
However, I still recommend enabling bottleneckmon (bottleneckmon --enable -alert) to detect bottlenecks.
Cheers seb

3 niyazi commented Permalink

Well, today I got an e-mail from IBM support saying I should run the following commands together:

bottleneckmon --enable -alert
bottleneckmon --cfgcredittools -intport -recover onLrOnly

I will let you know if this helps.

Greets!

4 seb_ commented Permalink

Hi Niyazi,

I found your case and checked your data. It worked without the "bottleneckmon --enable". You can see that from the [C2-1014] messages. But you had more than one stuck VC.
I recommend running statsclear and slotstatsclear now (to clear the counters) and then monitoring the switch. If you still encounter [C2-1012] messages, gather a new supportsave and inform the support team.

Cheers seb
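
To make the monitoring step above a little more concrete, here is a minimal Python sketch that counts the two message IDs mentioned in this comment in log text that has been saved to a file or string. It assumes only that the IDs appear in square brackets, as quoted above; the function names are made up for this illustration, and the sample text is a placeholder, not real switch output.

```python
import re
from collections import Counter

# Matches bracketed C2 message IDs such as [C2-1012] or [C2-1014].
# Assumption: the IDs appear in the saved log text in this bracketed form.
MESSAGE_ID = re.compile(r"\[(C2-\d{4})\]")


def count_c2_messages(log_text):
    """Return a Counter mapping each C2 message ID to its number of occurrences."""
    return Counter(MESSAGE_ID.findall(log_text))


def needs_new_supportsave(log_text):
    """True if any [C2-1012] message is present, following the advice above."""
    return count_c2_messages(log_text)["C2-1012"] > 0


if __name__ == "__main__":
    # Placeholder lines only; a real log has a different layout.
    sample = (
        "placeholder line one [C2-1014] ...\n"
        "placeholder line two [C2-1012] ...\n"
        "placeholder line three [C2-1014] ...\n"
    )
    print(count_c2_messages(sample))
    print(needs_new_supportsave(sample))
```

The idea is to run this on log text collected after the counters were cleared, so that only new messages are counted.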

5 glush commented Permalink

Hello Seb,

May I ask whether a regular (not backlink) ISL can lose all its credits in both directions, and under which circumstances? I saw a problem that looked like an ISL with no buffers at all (tx c3 drop errors were increasing on the ISL ports of both switches, but no other ports were registering c3 tx discards). And the switches were not attempting to reset the links to recover the BB credits. FOS v6.4.1b

6 seb_ commented Permalink

Hi Dmitry,

If a link has lost all its buffer credits, it will be reset after 2 s, per the FC standard. I guess that applies even to an internal one. The article described the situation where one or more (but not all) of the VCs of a backlink lost their buffers.

Now to your situation. You will see c3 discards due to timeouts as soon as a frame cannot be transmitted and has been held in the buffer for 500 ms because no buffer credit was available to send it to the next receiving port. It does not necessarily mean that the link has lost all its buffers permanently.
Reasons for that could be: a bottleneck / slow drain device on the other side, OR far too few buffers assigned to a link, OR bit errors that ate up buffer credits (by corrupting VC_RDYs/R_RDYs).
From your description (and given that the target switch is capable of reporting tx c3 discards due to timeout) it looks more like the ISL itself is the bottleneck. That does not mean that all the buffers are lost all the time (because that would eventually reset the link), but rather one of the following:
1) All buffers for one or more (but not all) VCs of the ISL are lost.
This can be checked by disabling and re-enabling the ISL (if you expect that to have an impact in your environment, do it in a maintenance window).
2) The ISL is not optimally configured. Maybe the distance is longer than expected, LD mode is used, QoS is enabled, or the distance setting of your long-distance configuration does not take into account that the average frame size is usually smaller than the maximum frame size.

Some of these articles could give you an idea about that:
https://www.ibm.com/developerworks/mydeveloperworks/blogs/sanblog/entry/san_myths_uncovered_2_the_ld_mode_brocade3?lang=en_us
https://www.ibm.com/developerworks/mydeveloperworks/blogs/sanblog/entry/how_to_determine_the_average_frame_size2?lang=en_us
https://www.ibm.com/developerworks/mydeveloperworks/blogs/sanblog/entry/san_myths_uncovered_quality_of_service_in_brocade_fabricos2?lang=en_us

Cheers seb
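
As a rough illustration of the average frame size point in item 2 above, the following Python sketch estimates how many buffer credits are needed to keep a long-distance link busy. It is a back-of-the-envelope model, not how Fabric OS calculates its settings. The assumptions are: about 5 µs of one-way propagation delay per km of fibre, 8b/10b encoding (as used by 1/2/4/8 Gbit/s FC), a full-size frame of 2148 bytes on the wire, and a hypothetical function name `credits_needed`.

```python
import math

FIBER_DELAY_US_PER_KM = 5.0   # assumed one-way propagation delay in fibre
BITS_PER_BYTE_ON_WIRE = 10    # 8b/10b encoding: 10 line bits per data byte
FULL_FRAME_BYTES = 2148       # assumed full-size FC frame incl. SOF/header/CRC/EOF


def credits_needed(distance_km, line_rate_gbaud, frame_bytes):
    """Estimate the credits needed so the sender never waits for an R_RDY.

    A credit is only returned after the frame has travelled to the far end
    and the R_RDY has travelled back, so the sender needs enough credits to
    cover one round trip's worth of frames.
    """
    round_trip_us = 2 * distance_km * FIBER_DELAY_US_PER_KM
    # Gbaud * 1000 = line bits per microsecond.
    frame_time_us = frame_bytes * BITS_PER_BYTE_ON_WIRE / (line_rate_gbaud * 1000)
    return math.ceil(round_trip_us / frame_time_us)


if __name__ == "__main__":
    # 50 km at 8 Gbit/s FC (8.5 Gbaud line rate)
    print(credits_needed(50, 8.5, FULL_FRAME_BYTES))  # 198
    print(credits_needed(50, 8.5, 1024))              # 416
```

With these assumptions, a link sized for full-size frames has fewer than half the credits it needs once the average frame is around 1 KB, which is one way an ISL can become the bottleneck without having lost any credits.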

7 Andre Novelli commented Permalink

Hello seb!

Is there any log or command that shows the usage of the internal ports? We are facing this error message on one of our directors, but before enabling bottleneckmon, as I'm curious, I would like to look for some more info on this.

8 seb_ commented Permalink

Hi Andre, yes, it's possible, but it doesn't make much sense, because the switch has already detected it. I have never seen a false positive. For more info please ping me on Sametime. Cheers seb