nmon12 Line Items
Added by nagger, last edited by nagger on Mar 30, 2008

Number Title Status Type Details
1 Remove CPU00 DONE Bug A few annoying bugs fixed, for example the CPU00 sheets are removed.
2 SPLPAR wait% scale up DONE in beta 11 Fix Fine tuning the system% & wait% on SMT and/or shared CPUs to match the fudged numbers from AIX tools - when NOT busy
3 Disk Service Times DONE New Service and Queue times are saved to file. These will be analysed further in the next Analyser version. Online there are many more numbers, but these are not saved to file. Online this is hidden in the Disk Stats: hit "D" three times. For file capture use -d and then find the DISKSERV and DISKWAIT data lines.
4 Time Drift DONE New Corrects time drift caused by nmon taking time to collect the stats on large machines. This only works in file capture mode. Extreme delays, when nmon can't get any CPU time, are impossible to handle. If nmon's run time is less than the sleep time then the sleep time is shortened to keep the next capture time correct. If nmon's run time is less than a second it is added up until it reaches a whole second, which is then used to shorten the sleep time.
5 Multiple Page Size Stats DONE New Alternative page size stats. Online use "M" - warning: this gets you 40 lines of output, as there are (currently) four possible page sizes: 4KB, 64KB, 16MB and 16GB. For file capture use -M and you get the MEMPAGES data lines, one per active page size (MEMPAGES4K, MEMPAGES64K, MEMPAGES16M, MEMPAGES16G).
6 HDLM from Hitachi Rejected - use User Defined Disk Groups - Display Disk stats by Logical device (dlmfdrv) - this is unlikely to happen for two reasons: a) I don't have an HDS disk system to test against and b) this requires some details of how the paths match the actual LUNs and, as far as I know, this is not available as an API. However, you can always use User Defined Disk Groups to achieve the same thing manually or via a simple script.
7 Netpacket size to file DONE New netpacket size saved to the capture file. These can be derived from read and write KB/s divided by the transfers/s. Unfortunately this is a hard calculation given the nmon data. See the NETSIZE data lines.
8 Timestamps in UTC DONE New Record all timestamps in UTC (rather than local time), making daylight saving time shifts and multiple time zones easier, i.e. less ambiguous recording time. Use -G option and you get a line AAA,gmt to highlight this.
9 Record Filesystem mounts Rejected - Use external Data Collector script - Optionally record filesystem mount points as they change - this will cause mayhem in graphing tools. I suggest an external data collector add-on to monitor the filesystems at each collection point, plus a post-processing shell script to clean up the issues caused by the number and order of filesystems changing during the collection, with the result then merged back into the nmon data. See the nmon docs on how to get nmon to kick off the script to make sure it is synchronised.
10 Process control Rejected - too dangerous - Pause the auto update and allow cursor movement up/down through the process list. Permit the management commands listed: process kill, nice or renice. This will never happen. It is far too dangerous. If you kill the wrong process you will blame nmon and nmon gets a bad reputation.
11 Longer command line data Rejected - Use external Data Collector script - Command details (would give you the complete "ps auxw" command line and arguments). This is already available, or most of it is - hit u or -U for user arguments. I can't help it if your applications (particularly Java) need dozens of lines of options to run. You could use an external data collector to dump the full command names into a file and then at the end sort and uniq the list to reduce the size.
12 Process Tree Rejected - use ptree for AIX - Process tree to trace back and down to show who created a task and what tasks it created. Nice idea but it is already available with the ptree command. There are various versions around including shell versions.
13 Process watch list Rejected - Currently Implemented - (w)atch to keep on list regardless of activity (ignore . directive). Not interactive but check out the -C command-list option.
14 More Kernel and Hypervisor Stats DONE New Include ALL kernel stats in the file output (including interrupts); there is more in online mode including phantom interrupts & "number involuntary and voluntary virtual CPU context switches" for SPLPAR. These are saved in the RAWLPAR (the libperfstat partitions_t structure) and RAWCPUTOTAL (the libperfstat cpu_total_t) file output sheets/sections. These are dumped "as is". You will have to see the AIX documentation for their meaning - /usr/include/libperfstat.h is a start. New option -K to get these.
15 Very high process numbers (like 40,000) problems Keep as separate version - Special nmon for 40,000+ processes trying to reduce CPU cycles to a) extract the data from AIX, b) sort the processes and c) runs out of memory at 25MB. Do other people run at these levels of processes? I already have a special version for this but the 256MB limit is a good sanity check. Decided to keep this as a special version as it removes some useful data from nmon to reduce the per-process data structures fetched from the kernel.
16 High Priority nmon CPU DONE New nmon can raise its priority to run on overworked systems. CPU priority is implemented using the -Z nice_value option. Raising priority only works for root. Not included is nmon locking itself in memory to ensure it is not paged out - due to potentially bad side effects.
17 AIX 5.2 ML3&4 work around Rejected - keep two AIX 5.2 versions - Libperfstat problems on AIX 5.2 ML3 and ML4 with backward compatibility being broken. But for disk service times we will need an nmon version especially for early and later ML levels. I have a code fix for this but it will slow down nmon. As ML9 is out, no one should be using ML3 and 4 anyway (unless you know differently!!)
18 Number of digits in Timestamps DONE New T0000 to T9999 is not enough for some users. More digits for the (IMHO) "barking mad" long term second by second data capture - 300 to 600 snapshots is a sensible limit for good graphs. Now the new -w N allows you to set this yourself. Allowed is 4 to 16. Eight digits give you three years at once a second. Be careful that this does not affect the nmon data analysers/filters etc. Also outputs, for example, "AAA,timestampsize,8" for post-processing filter tools to work this out. The default is still 4 digits.
19 Adapter "not available" DONE New Work around the Adapter "not available" feature/bug in AIX. This libperfstat bug can cause nmon to stop on some AIX 5.4 ML 4 or 5 - libperfstat says there are three disk adapters and then, when asked for the three data structures, returns invalid argument. nmon works around these bugs by then assuming the disk adapters present are zero; you get no stats but at least it continues to run. Sorry about these AIX bugs.
20 VIOS SEA Stats DONE in beta 24 New Directly capture Virtual I/O Server Shared Ethernet Adapter Stats - this can currently only be done by running entstat so it's fairly expensive in CPU terms. Already have an external data feed for this. See the -O option or hit O online (capital letter o).
21 Warnings from nmon DONE New nmon to produce better warnings in the data file for bad data and working around bugs etc. For example, network overflow errors produce lines in the nmon output file like: "ERROR,T089, Network byte count overflow 5"
22 TOP Online - sizes in KB, MB, GB DONE in beta 13 New This is to limit the width so column alignment still works. Default is KB but M or G is added if the number is too large to fit: the field is 5 digits if the xterm is under 120 characters wide and 8 digits if it is 120 or wider. Needs some testing - please!
23 Partition Mobility (POWER6) & Application (WPAR) Mobility (AIX6) DONE New nmon handles Dynamic Resource (DR) changes - displayed on the online Shared Partition details. This can be useful for demonstrations: DR phases and reason, old and new serial number and when (the Partition name was already handled). PM means a change of serial number and possibly LPAR name. AM means a possible change of serial number and LPAR name. DR events are also saved in the BBBR Sheet - note these are not synchronised with nmon output as there can be many DR events between nmon captures.
24 POWER6 Dedicated Donating DONE in beta 27 New Added Donating stats to the LPAR screen (hit p) and a new DONATE sheet to the output file - if the LPAR is Donating this is automatically output. Not too complex once I found the libperfstat documentation sample code errors!! The CPUs are mostly donated by default and pulled back only when the Donating LPAR wants them - a good learning point.
25 POWER6 Multiple CPU pools Postponed - Not currently available - will have to wait for later release.
26 nmon 12 beta flaky online refresh DONE in beta 12 Bug Sorry about that, now fixed with an improved algorithm.
27 Kernel online stats global read/write counters not MBs DONE in beta 12 Bug Dumb coding error - code fixed.
28 Docs for Disk Service data You are joking right! - Volunteers welcome. The header files have a few clues but the AIX documentation adds very little more about the precise meaning of the variables. Someone will have to contact AIX Support for better answers.
29 Remove trailing commas in network sections DONE in beta 13 Bug Simple change, well done for spotting and reporting it.
30 Accurate CPU stats DONE in beta 14 Bug CPU and physical CPU stats are more accurate with a better algorithm.
31 CPU online histogram DONE in beta 14 Bug Fixed stray characters at the end of line
32 Online memory stats DONE in beta 14 Bug On 100GB+ machines the memory stats are now reported in GBs.
33 Show VG stats DONE in beta 14 Bug Fixed a bug where the wrong stats are displayed.
34 Large Time Drift DONE in beta 14 Bug Added a solution for the case where nmon takes longer than the sleep seconds to run. Hopefully, this would be temporary during a peak but it should recover from this now and stop time drift once over the peak.
35 Multiple Page Sizes in MB DONE in beta 14 New Online M shows the multiple memory page size counters - now hit M again and you get them in Megabytes - this saves you having to do the maths, i.e. is 4K x 10000 more or less than 64K x 2560, etc.
36 AAA,time format DONE in beta 14 Fix Now in hours:minutes:seconds (no longer hours:minutes.seconds). This fixes a batch mode Analyser problem BUT post processing filters MAKE A NOTE AND TEST THIS AND REPORT BACK. I know this will cause problems with nmon2rrd!
37 Disk online 'o' option DONE in beta 14 Bug Fixed missing key and disk numbering is better
38 Fast abort in capture mode DONE in beta 14 New If you want to kill nmon with SIGUSR1 or 2 during the config documentation phase at nmon start up then these signals will now stop nmon immediately (previously they were ignored until the first capture was completed). This can help benchmarkers clean up after a false start.
39 Return an error on invalid options DONE in beta 18 New To help scripting, nmon returns an error code for invalid options. This can be checked simply in ksh scripts with $?
40 Service times for Disk groups - renamed to Selecting Particular Disks DONE in beta 20 New Some use disk groups online to monitor specific single disks and would like the service times. Do others use disk groups like this? Added the feature, but to the regular online disk functions. Option: -k to limit the disks to particular names. This was a user request: with -k hdisk3:hdisk2:hdisk5 only these appear on the disk list - this is interactive only and up to 64 disks.
41 CPU_ALL cpu count change Done in beta 20 Fix The CPU count is either the physical CPUs or for Shared CPU LPARs the Physical CPU used. This will change to Physical CPU, which for shared CPU LPARs is the Virtual Processor count.
42 Folded CPU count Done in beta 28 New To show CPU folding for SPLPARs. The simplest way is counting the CPUs with zero system calls but you have to count pairs of logical CPUs for SMT enabled LPARs. Folding is where virtual CPUs are not scheduled any processes, for efficiency, thus releasing the whole CPU for other uses. Online the CPUs will show Folded=NN on the bottom line. Capture to file has Folded added as the last column of the LPAR stats.
43 hdiskpower -> power DONE in beta 25 New Online the EMC hdiskpowerNNN disks are shown as powerNNN. Otherwise you don't see the numbers as the field is only 8 characters.
44 JFS output causes analyser problems DONE in beta 33 New On systems with very many filesystems (in the hundreds) and with very long mount point directory names, we find Excel has problems with the very large lines output by nmon. Now you can remove the JFS section with the -J option.
45 NFS problem on AIX5.3ML6+ DONE in beta 38 New nmon works around bugs in libperfstat now. Also don't capture nfso and nfsstat data to file unless -N because if NFS is off this can hang until timed out.
46 nmon within a WPAR DONE in beta 38 New For AIX6 and Workload partitions - nmon runs inside a WPAR but had to learn a few tricks, such as a WPAR having no disks, adapters or paging spaces.
47 NFS v4 DONE in beta 52 New NFS v4 has a completely new set of stats and these are included now. Online hit N twice. To capture use -NN (yes two N's).
48 Disk Stats Resets DONE in beta 52 New Some AIX commands and third party tools reset the disk stats. nmon detects this, reports the errors and removes the problem of massive disk numbers that were reported.
49 WPAR Stats DONE in beta 54 New Workload Partitions are new for AIX6; nmon gives you WPAR CPU and Memory Use. Online gives you more data like numperm, Run queue etc. but only the basic two are saved to file. See -@ or online hit '@'.
50 Fibre Channel Stats DONE in beta 54 New The fcstat command gives us more details than any API. This tool can be used to gather real FC stats, not just faked numbers made by adding up the disk stats. For example, Fibre Channel tape use. I changed my mind and added it to nmon12 - this is absolutely the last functional addition for nmon12. See -~ or online hit '~'.
51 CPU Long Term graphs DONE in 12b New The online Long Term graphs were based on average Logical CPU numbers. With SMT=on and only half the CPUs in use due to not enough processes this can be misleading (it reports 50% used but you could be using nearly all the CPU power). Changed now to average Physical CPU Numbers including the AIX command style "adjustments to wait & idle". This can still be misleading for Uncapped SPLPARs as it gets to near 100% as you get to Entitlement but that is the way AIX commands report it. You should monitor Physical CPU use with SPLPAR and not Utilisation.
52 Multiple CPU pools DONE in 12d New Stats include the pool of the LPAR and the default pool. In the data file look for the POOLS sheet and online see the bottom of the LPAR section (hit "p") - assumes you have the Pool Monitoring Authorisation.
53 Online KB to MB DONE in 12d New Network online more digits + network & disk "=" KB<-->MB
54 AIX fcstat hang work around DONE in 12e New Time out if the AIX fcstat hangs, then disable fcstats - this hang can happen if the cables are not plugged in!
55 libperfstat bug work around DONE in 12e New real_process/real_user are approximations but when they go wrong they go VERY wrong - this fix outputs 0.0 instead
56 Add LPAR/WPAR id DONE in 12e New Add LPAR/WPAR identification hints to help work out the machine the LPAR or WPAR is running on/in. See the Welcome or Resource "r" online for details or for file capture find AAA,SerialNumber,XXXXX and AAA,LPARNumberName,N,LLLLLLL and AAA,MachineType,IBM,TTTTTTTT and AAA,NodeName,NNNNNN
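
The sleep adjustment described in item 4 (Time Drift) can be sketched as follows. This is an illustration only, not nmon's actual code (nmon is a C program); the function name `next_sleep` and its parameters are hypothetical, and it assumes sleeps are made in whole seconds.

```python
def next_sleep(interval, run_time, carry):
    """Return (sleep_seconds, new_carry) for one capture cycle.

    interval -- requested whole seconds between snapshots
    run_time -- seconds the last collection took (may be fractional)
    carry    -- run time accumulated so far that has not yet been deducted
    """
    carry += run_time
    # Only whole seconds are deducted; never deduct more than the interval.
    deduct = min(int(carry), interval)
    sleep = interval - deduct
    # The undeducted remainder is kept for later cycles.
    return sleep, carry - deduct


if __name__ == "__main__":
    carry = 0.0
    for run_time in (0.5, 0.5, 2.5):
        sleep, carry = next_sleep(10, run_time, carry)
        print(f"ran {run_time}s -> sleep {sleep}s, carry {carry}s")
```

Sub-second run times are carried forward until they add up to a whole second, which matches the description in the table. The case where the run time exceeds the interval (item 34) simply yields a sleep of zero here.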
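
Item 7 says packet sizes can be derived from KB/s divided by transfers/s. A minimal sketch of that division, assuming 1 KB is 1024 bytes and that both rates cover the same interval; the function name is hypothetical and this is not how nmon itself produces the NETSIZE lines.

```python
def avg_packet_size_bytes(kb_per_sec, packets_per_sec):
    """Average bytes per packet from a KB/s rate and a packets/s rate."""
    if packets_per_sec <= 0:
        # No traffic in the interval: report 0.0 rather than dividing by zero.
        return 0.0
    return kb_per_sec * 1024.0 / packets_per_sec


if __name__ == "__main__":
    print(avg_packet_size_bytes(1500.0, 1024.0))
```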
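
The "three years at once a second" figure in item 18 can be checked with a little arithmetic, and a post-processing filter could pick up the width from the "AAA,timestampsize,N" line. The sketch below assumes that line layout exactly as quoted in the table; the function names are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def capacity_years(digits, interval_seconds=1):
    """Years of capture before a T-number of this many digits wraps."""
    return (10 ** digits) * interval_seconds / SECONDS_PER_YEAR


def timestamp_width(lines, default=4):
    """Return N from an 'AAA,timestampsize,N' line, else the default of 4."""
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) >= 3 and fields[0] == "AAA" and fields[1] == "timestampsize":
            return int(fields[2])
    return default


if __name__ == "__main__":
    print(round(capacity_years(8), 2))   # roughly three years at one snapshot a second
    print(timestamp_width(["AAA,timestampsize,8"]))
```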
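
A filter tool could pull the warnings from item 21 out of a capture file along these lines. This is a sketch that assumes every warning follows the single example quoted in the table (ERROR, then the T-number, then free text); the function name is hypothetical.

```python
def parse_error_line(line):
    """Return (timestamp_label, message) for an ERROR line, else None."""
    # Split on the first two commas only, so commas in the message survive.
    fields = line.rstrip("\n").split(",", 2)
    if len(fields) != 3 or fields[0] != "ERROR":
        return None
    return fields[1].strip(), fields[2].strip()


if __name__ == "__main__":
    print(parse_error_line("ERROR,T089, Network byte count overflow 5"))
```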
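
The maths that item 35 saves you from doing looks like this. The page sizes are the four listed in item 5; the table name and function name are hypothetical.

```python
# Page sizes from item 5, expressed in KB.
PAGE_SIZE_KB = {
    "4KB": 4,
    "64KB": 64,
    "16MB": 16 * 1024,
    "16GB": 16 * 1024 * 1024,
}


def pages_to_mb(page_size, pages):
    """Convert a count of pages of the given size into megabytes."""
    return PAGE_SIZE_KB[page_size] * pages / 1024


if __name__ == "__main__":
    print(pages_to_mb("4KB", 10000))   # 39.0625
    print(pages_to_mb("64KB", 2560))   # 160.0
```

So 64K x 2560 is about four times as much memory as 4K x 10000.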
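
For the AAA,time change in item 36, a post-processing filter that has to read both old and new capture files might accept either separator. A minimal sketch, assuming the value is exactly hours:minutes:seconds or hours:minutes.seconds as described; the function name is hypothetical.

```python
import re

# Accept ':' (new format) or '.' (old format) before the seconds.
_AAA_TIME = re.compile(r"^(\d{1,2}):(\d{2})[:.](\d{2})$")


def parse_aaa_time(value):
    """Return (hours, minutes, seconds) from an AAA,time value."""
    match = _AAA_TIME.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised AAA,time value: {value!r}")
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours, minutes, seconds


if __name__ == "__main__":
    print(parse_aaa_time("11:14:07"))
    print(parse_aaa_time("11:14.07"))
```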
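
The counting rule in item 42 (zero system calls, taken in pairs of logical CPUs when SMT is on) can be illustrated as below. This is a sketch, not nmon's code: it assumes the logical CPUs of one virtual CPU are adjacent in the list, and the function name is hypothetical.

```python
def folded_count(syscalls, smt_threads=2):
    """Count virtual CPUs whose logical CPUs all made zero system calls.

    syscalls    -- system calls per logical CPU during the interval
    smt_threads -- logical CPUs per virtual CPU (2 for the SMT pairs in item 42)
    """
    folded = 0
    for start in range(0, len(syscalls), smt_threads):
        group = syscalls[start:start + smt_threads]
        if all(count == 0 for count in group):
            folded += 1
    return folded


if __name__ == "__main__":
    # Four virtual CPUs with SMT on: the second and fourth look folded.
    print(folded_count([120, 3, 0, 0, 0, 7, 0, 0]))
```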
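
The renaming in item 43 is simple enough to show directly. A sketch with a hypothetical function name, showing why dropping the "hdisk" prefix lets the number fit in an 8 character field:

```python
def short_disk_name(name):
    """Show EMC hdiskpowerNNN as powerNNN; leave other names alone."""
    prefix = "hdiskpower"
    if name.startswith(prefix):
        return "power" + name[len(prefix):]
    return name


if __name__ == "__main__":
    for disk in ("hdiskpower123", "hdisk3"):
        print(disk, "->", short_disk_name(disk))
```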
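
The identification lines added by item 56 could be collected from a capture file like this. The key names come from the table; the sample values in the usage example are invented placeholders, and the function name is hypothetical.

```python
ID_KEYS = {"SerialNumber", "LPARNumberName", "MachineType", "NodeName"}


def lpar_identity(lines):
    """Map each identification key to the list of values that follow it."""
    info = {}
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) >= 3 and fields[0] == "AAA" and fields[1] in ID_KEYS:
            info[fields[1]] = fields[2:]
    return info


if __name__ == "__main__":
    # Placeholder values, not taken from a real machine.
    sample = [
        "AAA,SerialNumber,XXXXX",
        "AAA,LPARNumberName,3,mylpar",
        "AAA,MachineType,IBM,TTTTTTTT",
        "AAA,NodeName,mynode",
    ]
    print(lpar_identity(sample))
```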


 