Comments (3)

1 6X87_Graham_Harris commented

Martin, quite a while back I asked the support centre for clarity on how delay samples in RMF3 are derived, particularly for heavily multi-threaded tasks such as WebSphere & CICS. I was told that every work element in a multi-threaded address space is examined, and if ANY ONE of those threads is experiencing a delay, that counts as a delay for that address space for that sample interval (split by processor type, of course). While I guess that may seem blindingly obvious, it wasn't to us at the time!

It strikes me that because delay samples are reported at the address space level (in RMF3), but an address space can have many hundreds of in-flight threads, a multi-threaded task being reported as delayed becomes almost inevitable when handled this way. Compared with a single-threaded task (where a delay value can perhaps actually indicate something meaningful?), that gives a rather imprecise indication these days. It would be nice to have a bit of 'inside information' on whether this is expected to be addressed in the future, especially with z/OS 1.12 seeming to start looking at Work Elements (for the in-ready reporting, anyway!).

BTW, I'm not sure if the RMF1 fields get around this by counting the delay samples at a thread level, against a proper total sample count encompassing all threads in each task, so the above may only be an RMF3 "feature"...

2 MartinPacker commented

Hello Graham. I think we have to look at this at the Performance Block (PB) level - for Mon I. In that sense the "ANY ONE of those threads" comment is true, in that it upticks the count (against that particular PB). I don't think, though, that this can be taken to mean the whole address space is delayed. So the denominator in this calculation will be samples across ALL the PBs, not just one every 1/4 second. (I'm not sure "thread" is quite synonymous with "PB", but for most purposes they're pretty close.)

So it seems Mon III is out of line with Mon I. (Perhaps I'm less attuned to Mon III than you are, as I don't actually use it.) :-)

I'm not sure what you're referring to in R.12, by the way.

One of the things I want to do Real Soon Now :-) is start looking at PB populations - after all, MaxTasks / MXT / whatever in CICS is the same number. (So a neat trick would be to present a customer / client with a calculated MXT.)
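To make the difference between the two counting approaches concrete, here is a minimal sketch. It is not RMF's actual sampling logic. It assumes, purely for illustration, that each thread (or PB) is independently delayed with the same probability at every sample, and the function names `p_any_delayed` and `simulate` are hypothetical. It compares an "any one thread delayed marks the whole address space" rule with a per-thread rule whose denominator covers all threads.

```python
import random


def p_any_delayed(num_threads, p_delay):
    """Chance that at least one of num_threads independent threads is delayed."""
    return 1.0 - (1.0 - p_delay) ** num_threads


def simulate(num_threads, p_delay, num_samples, seed=0):
    """Return (address_space_pct, per_thread_pct) for two hypothetical rules.

    address_space_pct: a sample counts as delayed if ANY thread is delayed.
    per_thread_pct:    delayed thread observations over ALL thread observations.
    """
    rng = random.Random(seed)
    as_delayed = 0      # samples in which at least one thread was delayed
    thread_delayed = 0  # total delayed thread observations
    for _ in range(num_samples):
        delayed = sum(rng.random() < p_delay for _ in range(num_threads))
        if delayed:
            as_delayed += 1
        thread_delayed += delayed
    as_pct = 100.0 * as_delayed / num_samples
    per_thread_pct = 100.0 * thread_delayed / (num_samples * num_threads)
    return as_pct, per_thread_pct


if __name__ == "__main__":
    for n in (1, 10, 200):
        as_pct, per_thread_pct = simulate(n, 0.02, 2000)
        print(n, round(as_pct, 1), round(per_thread_pct, 1))
```

Under these assumptions, a single-threaded task that is delayed 2% of the time shows about 2% delay either way, whereas with 200 such threads the "any one thread" rule reports the address space as delayed in roughly 98% of samples (1 - 0.98^200), while the per-thread figure stays near 2%.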

3 6X87_Graham_Harris commented

Yes, I take the point about PBs sort of equating to threads. Although, does that mean that every thread in a WebSphere task (which can add up to hundreds) has a PB associated with it? Or is it just the 'worker threads'/enclaves that have PBs?

Likewise for other multi-threaded tasks, does each thread creation inherently create a PB under the covers, or do these have to be intentionally created in some way? (A quick search actually seems to indicate a PB does indeed have to be specifically created with a WLM service - IWMX2CRE?)

If PBs therefore relate to just a subset of the threads in a task, then I can't see how that would really encompass all the possible 'dispatchable threads', all of which are "eligible to suffer" CPU delay!

I had the impression that the 'work unit' related to a dispatchable unit of work, which to me kinda means 'thread'.....but I've not seen a good description of this....yet!

BTW, I am a very big user of D OMVS,PID= for WebSphere, as you get such a good 'internal view' of the WebSphere thread layout, and it's pretty useful to get to know what's "normal", so you can more easily recognise 'abnormal' conditions, especially when new applications are being performance tested. That's how I know there can be hundreds of individual dispatchable threads in WebSphere.

(And as an aside, it would just be sooooooo goooood to have something similar to D OMVS,PID [MVS command] to display a traditional TCB tree for something like CICS - I know various monitor tools can do this, but it would be so cool to have it as an MVS command.)

But....getting back to the original discussion.... it does seem there is a bit of a different approach in reporting delays between Mon3 & Mon1. Sadly.

The 1.12 thing I was on about was the in-ready queue distribution being based on 'work unit' (not element - sorry!) rather than address spaces, in the 1.12 RMF CPU report:

(so, given the above diatribe, is a "work unit" a PB?; a thread?; a ?)!!!!!

http://publib.boulder.ibm.com/infocenter/zos/v1r12/index.jsp?topic=%2Fcom.ibm.zos.r12.erbb500%2Ferbzraa0236.htm