RFS commented

A couple of questions:

1) Does the run queue count include the currently executing threads or just those waiting for CPU resources?

2) In this example, should I only be concerned if the run queue is > 300 for a sustained period?

nagger commented

On AIX, as reported by nmon, vmstat, topas and others, the run queue is the number of runnable processes that could use CPU cycles right now. Some of these will be on the CPUs and some will be waiting to get CPU time (assuming there are enough of them).

For part 2): for maximum throughput you want the number to be greater than 300 (or the total number of SMT threads, i.e. CPU cores times 4 for SMT=4, also called the logical CPUs), so that if some of the processes on the CPUs hit something they have to wait for, like locks, disk I/O, network packet responses or memory to be freed, then there are other process threads that can use the CPU cycles. More of a worry for me (as a performance speed junkie) is a smaller run queue, like less than half the SMT threads. That suggests a badly designed application, not enough user activity, or an LPAR that is over-configured in CPU resources that could be used better elsewhere.

If you use the "ps -el" command and look at the WCHAN column, then the processes with an item in this column are the ones that are not runnable and so are not on the run queue, as they are waiting for an event like terminal, disk or network I/O, locks or inter-process communication. WCHAN is called the Wait or Sleep Channel, and it is actually the AIX kernel address of the data structure on which the process is waiting for the resource-available event. When the kernel puts data in the resource, it puts the waiting processes on the run queue so they can act on the newly arrived data.

Hope that helps, Nigel Griffiths
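
The rule of thumb above can be written down as a small check. This is only a sketch of the arithmetic: the function names, the labels it returns and the 16-core example are made up for illustration, and the thresholds are simply the ones mentioned in the comment (logical CPUs = cores times SMT, and "less than half" as the worrying case).

```python
def logical_cpus(cores: int, smt: int = 4) -> int:
    """Logical CPUs (SMT threads) = CPU cores times the SMT mode."""
    return cores * smt


def assess_run_queue(run_queue: float, cores: int, smt: int = 4) -> str:
    """Classify a sustained run queue value against the logical CPU count.

    The labels are hypothetical; they only restate the rule of thumb:
    - above the logical CPU count: enough runnable work to keep CPUs busy
    - below half the logical CPU count: the case described as a worry
    - anything else: somewhere in between
    """
    lcpu = logical_cpus(cores, smt)
    if run_queue > lcpu:
        return "enough runnable work"
    if run_queue < lcpu / 2:
        return "low run queue"
    return "in between"


# Hypothetical 16-core LPAR at SMT=4, i.e. 64 logical CPUs.
print(logical_cpus(16))           # 64
print(assess_run_queue(80, 16))   # enough runnable work
print(assess_run_queue(20, 16))   # low run queue
```

Feed it an average over a sustained period rather than a single snapshot, since the run queue moves around a lot from one interval to the next.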

WollyJeeWilikers commented

We are currently running topas_nmon on some AIX 6.1 P5, P6, and P7 servers. Would you be able to tell us which API call is being made behind the scenes of topas_nmon that gets rolled up and creates the CPU_ALL view in the NMON Analyzer?

You mention looking at the Physical CPU cores consumed. Our system currently calls the perfstat_cpu_total subroutine, but when we graph this data over time (User, System, Idle), the graph is about half of the total % that the CPU_ALL tab provides. Should we be using the perfstat_cpu subroutine instead? We would ideally like to mimic the CPU_ALL tab stacked CPU graph.

Could you also elaborate a bit more on the difference between the CPU_ALL and PCPU_ALL tabs, and why the PCPU tabs don't always get generated when the NMON Analyzer is used on some AIX nodes (currently using 4.2)?

nagger commented

Dear WollyJeeWilikers, it is perfstat_cpu_total() and the perfstat_cpu_total_t data structure, using the puser, psys, pwait and pidle members. Perhaps you forgot to divide the results by the elapsed time. You also don't state whether this is a shared or dedicated CPU LPAR. The maths is different for shared CPU: there you have to factor in the time the LPAR was not actually running on the CPU. In this case the user and system times are OK, but you have to boost the wait and idle times to allow for the missing PURR counter increments. The boost is done in the ratio of the wait and idle, of course.

The PCPU and SCPU stats were (in my humble opinion) a confusing mistake and are only useful if you have the CPUs in power saving mode, i.e. it is changing the CPU GHz to save electrical power. I hope to have them removed or made an optional feature in the next AIX release. The developer that added them did not realise the volume of data they cause on larger machines. The maximum would be the new E880 with 192 CPUs, which would generate 3000+ lines of pointless stats every snapshot.
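
For anyone trying to reproduce the CPU percentages, here is one possible reading of the calculation described above, written as plain arithmetic rather than as real perfstat calls. It is a sketch, not nmon's actual code. The assumptions: two snapshots of the puser, psys, pwait and pidle counters are held in dictionaries, `elapsed_ticks` is a hypothetical value for the interval length expressed in the same units as the counters, and on a shared CPU LPAR the ticks that never got counted are shared out between wait and idle in their existing ratio.

```python
FIELDS = ("puser", "psys", "pwait", "pidle")


def cpu_percentages(prev, curr, elapsed_ticks, shared_lpar=False):
    """Turn two counter snapshots into percentages of the elapsed interval.

    prev, curr    -- dicts holding the four counters (hypothetical layout)
    elapsed_ticks -- interval length in counter units (an assumption here)
    shared_lpar   -- if True, boost wait and idle to cover uncounted ticks
    """
    if elapsed_ticks <= 0:
        raise ValueError("elapsed_ticks must be positive")

    # The counters only ever grow, so work on the difference between snapshots.
    delta = {name: float(curr[name] - prev[name]) for name in FIELDS}

    if shared_lpar:
        # Ticks that passed while the LPAR was not running on a CPU.
        missing = elapsed_ticks - sum(delta.values())
        if missing > 0:
            wait_idle = delta["pwait"] + delta["pidle"]
            if wait_idle > 0:
                # Share the missing ticks in the ratio of wait to idle.
                wait_share = missing * delta["pwait"] / wait_idle
                delta["pwait"] += wait_share
                delta["pidle"] += missing - wait_share
            else:
                # No wait or idle recorded at all: count the gap as idle.
                delta["pidle"] += missing

    # Divide by the elapsed time to get percentages.
    return {name: 100.0 * delta[name] / elapsed_ticks for name in FIELDS}


before = {"puser": 0, "psys": 0, "pwait": 0, "pidle": 0}
after = {"puser": 30, "psys": 10, "pwait": 5, "pidle": 15}

print(cpu_percentages(before, after, 100))
print(cpu_percentages(before, after, 100, shared_lpar=True))
```

In the made-up example the four deltas add up to only 60 of the 100 elapsed ticks. Without the boost the stacked graph tops out at 60%, which is the kind of shortfall described in the question; with the boost, user and system stay at 30% and 10% while wait and idle grow to 15% and 45%.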