IBM Support

nmon for Linux v16 - New Stats, On-screen Facelift & more

How To


Summary

Never ever use an nmon before version 16f - here are some of the new features you will want from nmon 16+.

Objective

Nigel's Banner

 

Environment

Linux on Power, AMD64, x86_64, ARM (Raspberry Pi), Mainframe (Z) and any other HW running a modern Linux.

Steps

Over the 2016 Xmas break I have been busy improving nmon and this is a summary with pictures:

1 New Performance Statistics


GPU

First, for the OpenPOWER people using the Nvidia GPUs within the S822LC (Firestone) machines, we have Graphics Processing Unit (GPU) stats. A previous AIXpert Blog covered the details but below is a little reminder:
 

GPUs

These are the stats on-screen

  1. GPU MHz
  2. CPU and Memory Utilisation
  3. Temperature in C
  4. Electrical power in Watts

This is also saved to the nmon output file.

 


MHz

We noticed that the CPU cores in a single machine or VM can be running at different MHz ratings, so we now capture them on-screen and in the nmon file. It was tricky to decide how to display this on-screen, so there are 3 on-screen modes - repeatedly hitting M (capital m for MHz) steps you through them in turn.  Below is the CPU core threads list:

MHz for logical CPUs

Note this machine is very busy and most CPU core threads are at 3491 MHz (roughly 3.5 GHz) but threads 145 to 152 are idling (this makes up one SMT=8 CPU core) at 2061 MHz (2 GHz).

Below there are just the CPU cores, and now there are more cores at the slower 2061 MHz.


MHz by Cores

Then a final "M" and we get the same CPU core numbers but in a graph format - so it is easy to spot fast and slow CPU cores.
 

Char Graph highlighting the high and low MHz

Above, it is obvious that only 2 CPU cores are at the lower MHz.

This POWER machine is sold at 2.92 GHz.
When in Power Saving mode that drops to 2.0 GHz. Given that electrical power use goes up with the square of the frequency, that means the CPUs are using roughly half the electricity = a significant saving in electricity costs and cooling costs. When busy the CPU cores over-clock to 3.5 GHz, roughly 20% faster, at no extra cost.
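
The arithmetic behind those two figures can be checked in a few lines of Python. This is only a sketch of the rule of thumb quoted above (power taken as proportional to frequency squared). The function and constant names are made up for illustration, and real power draw also depends on voltage and workload.

```python
NOMINAL_GHZ = 2.92     # frequency the machine is sold at
POWER_SAVE_GHZ = 2.0   # frequency in Power Saving mode
BOOST_GHZ = 3.5        # frequency when busy


def relative_power(freq_ghz, nominal_ghz=NOMINAL_GHZ):
    """Power relative to nominal, ASSUMING power ~ frequency squared."""
    return (freq_ghz / nominal_ghz) ** 2


def relative_speed(freq_ghz, nominal_ghz=NOMINAL_GHZ):
    """Clock speed relative to nominal."""
    return freq_ghz / nominal_ghz


print(f"Power Saving mode: {relative_power(POWER_SAVE_GHZ):.0%} of nominal power")  # 47%
print(f"Busy: {relative_speed(BOOST_GHZ) - 1:.0%} faster than nominal")             # 20%
```

So "half the electricity" is a fair rounding of about 47%, and 3.5 GHz is very nearly 20% above 2.92 GHz.
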
 This is what the CPU Core Threads graph via nmonchart looks like:
 

MHz stacked

OK, that does look a little confusing as it is a stacked chart of the MHz for all CPU core threads. The main problem is that it is reported for all 80 threads and that it only ever has two values: 2061 MHz and 3491 MHz. This is the way it is reported by Linux in the /proc/cpuinfo file.  An unstacked graph would not work as all the lines would be on top of each other. I may decide to only save the CPU core values, as currently all the threads on a particular core have the same frequency. In this case the 80 threads would drop to just the 10 POWER8 cores, i.e. more manageable. The problem then is that on Intel / AMD CPUs the hyperthreading is very difficult to calculate and many of the stats are reported clearly wrong.  Let me know in the comments what you think.
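
To make the thread versus core idea concrete, here is a minimal Python sketch of pulling the MHz values out of /proc/cpuinfo text and collapsing them to one value per core. It is not how nmon itself does it (nmon is written in C). The function names are hypothetical, and it assumes the frequency appears on a "clock" line (POWER style) or a "cpu MHz" line (x86 style) and that the threads of a core are listed one after another.

```python
import re

# Matches "cpu MHz : 2061.000" (x86 style) and "clock : 3491.000000MHz" (POWER style).
_MHZ_LINE = re.compile(r"^(?:cpu MHz|clock)\s*:\s*([0-9.]+)", re.IGNORECASE)


def mhz_per_thread(cpuinfo_text):
    """Return one MHz reading per logical CPU, in file order."""
    readings = []
    for line in cpuinfo_text.splitlines():
        match = _MHZ_LINE.match(line.strip())
        if match:
            readings.append(float(match.group(1)))
    return readings


def mhz_per_core(readings, threads_per_core):
    """Keep the first thread of each core, ASSUMING the threads of a core
    are consecutive and share one frequency (as on the POWER8 above)."""
    return readings[::threads_per_core]


# Made-up sample; on a real system pass in the contents of /proc/cpuinfo.
SAMPLE = "processor\t: 0\nclock\t\t: 3491.000000MHz\nprocessor\t: 1\nclock\t\t: 2061.000000MHz\n"
print(mhz_per_thread(SAMPLE))  # [3491.0, 2061.0]
```

With SMT=8, `mhz_per_core(readings, 8)` would turn 80 thread readings into 10 core readings. On Intel / AMD the "consecutive threads" assumption does not generally hold, which is the difficulty mentioned above.
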

  


Linux Utilisation categories

Classic UNIX has three categories plus Idle for CPU Utilisation:

  1. User = running programs / applications /  commands
  2. System = the UNIX Kernel running system calls or Interrupts
  3. Wait for I/O = This is the same as Idle but there is disk I/O outstanding (it shows a disk bottleneck but the CPUs are actually Idle waiting for data)
  4. Idle = the CPU has nothing to do

Linux over the years has added some more and currently has ten categories for CPU Utilisation as follows:
 

  1. User = running programs / applications /  commands (like UNIX above)
  2. User Nice = As User but the processes have had their priority dropped using the nice command or system call
  3. System = the Linux Kernel running system calls but not Interrupts (see below Irq and Softirq)
  4. Idle  = the CPU has nothing to do
  5. IO Wait = This is the same as Idle but there is disk I/O outstanding (it shows a disk bottleneck but the CPUs are actually Idle waiting for data)
  6. Irq = Kernel system time handling interrupts. I have seen embedded CPU Linux stats where the CPU is very largely interrupt driven so this is useful to highlight that.
  7. Softirq = Software Interrupt - these are like Interrupts but driven not by a hardware event but by a special CPU instruction that gets the CPU interrupted so the Kernel can take action.
  8. Steal = Only applies to a Virtual Machine and highlights that the VM has runnable processes that can't run because the CPU is currently running a different VM. This VM effectively thinks someone else stole the CPU cycles. Over-committed CPUs in a highly virtualised environment can get a lot of this. This stat highlights the issue.
  9. Guest = Only applies to a VM hosting operating system, as in KVM or PowerKVM hypervisor environments - this CPU time is used running one or more of the hosted guest virtual machines.  To the hosting OS this is like User time - but within the VM it could be User, System or the other categories.
  10. Guest Nice = The same as the Guest category above but the hosted guest is running at low (nice) priority.
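
The ten categories above map onto the ten numeric columns of each "cpu" line in /proc/stat, in the same order. As a rough illustration (not nmon's own code, and with hypothetical function names), this Python sketch turns two samples of such a line into percentages per category. The sample numbers are made up.

```python
CATEGORIES = ("user", "nice", "system", "idle", "iowait",
              "irq", "softirq", "steal", "guest", "guest_nice")


def parse_cpu_line(line):
    """Split one 'cpu...' line from /proc/stat into (name, {category: ticks})."""
    name, *fields = line.split()
    ticks = [int(value) for value in fields[:len(CATEGORIES)]]
    # Older kernels report fewer columns; treat the missing ones as zero.
    ticks += [0] * (len(CATEGORIES) - len(ticks))
    return name, dict(zip(CATEGORIES, ticks))


def utilisation(before, after):
    """Percentage of elapsed ticks spent in each category between two samples."""
    deltas = {cat: after[cat] - before[cat] for cat in CATEGORIES}
    # Linux already counts guest time inside user (and guest_nice inside nice),
    # so leave those two out of the total to avoid counting them twice.
    total = sum(v for cat, v in deltas.items() if cat not in ("guest", "guest_nice"))
    if total == 0:
        return {cat: 0.0 for cat in CATEGORIES}
    return {cat: 100.0 * v / total for cat, v in deltas.items()}


_, first = parse_cpu_line("cpu  100 0 50 800 50 0 0 0 0 0")
_, second = parse_cpu_line("cpu  200 0 100 1550 100 10 10 30 0 0")
for category, percent in utilisation(first, second).items():
    print(f"{category:10s} {percent:5.1f}%")
```

The counters only ever grow, so a single reading is meaningless on its own. It is the difference between two samples that gives the utilisation for that interval.
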
As far as I can see you will never get all of these stats on one single Linux OS - you may get non-zero Steal or non-zero Guest utilisation numbers but never both on the same machine - the OS is either a guest or a host.  For years nmon has hidden the growing number of categories and only displayed the original 4 but some of these are now genuinely useful!  So they are now shown on-screen and saved to the nmon file in addition to the UNIX view.  On-screen you get:
 

Util all 10 of them

Above, the top line is the whole machine and below it are the many CPU core thread stats.
It is done this way because my POWER machine has 80 CPU core threads, which are too many to show on the screen, so whole machine numbers placed at the bottom would disappear off the screen even though they are probably the most interesting.  Note that there is 100% per CPU core thread - this could be removed by dividing by the number of CPU core threads but that results in lots of information going missing:
 
  1. a 100% tells you a CPU core thread is busy - if we used 100 / 80 = 1.25% it looks like noise
  2. having a number like 3400% informs you there is roughly 34 CPU core threads' worth of CPU time in use. If reported as 3400 / 80 = 42.5% then you have to do the backward maths to get to 34 CPU core threads.
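
The two points above are simple arithmetic, shown here as a small Python sketch. The function names and the 34-busy-threads scenario are made up for illustration.

```python
def summed_busy(per_thread_busy):
    """Total where each fully busy thread contributes 100%."""
    return sum(per_thread_busy)


def averaged_busy(per_thread_busy):
    """The alternative: scale the total so the whole machine is 100%."""
    return sum(per_thread_busy) / len(per_thread_busy)


threads = [100.0] * 34 + [0.0] * 46   # 80 threads, 34 of them flat out

print(summed_busy(threads))     # 3400.0 -> read directly as 34 busy threads
print(averaged_busy(threads))   # 42.5   -> needs 42.5 * 80 / 100 to get back to 34
print(100.0 / len(threads))     # 1.25   -> one busy thread as a share of the machine
```
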

The initial nmonchart graphs of this look like this:

server Util graph

I hope to capture other machines with busy steal and guest numbers.

2 On-screen Facelift

nmon for AIX has some challenges that we don't have with nmon for Linux when it comes to curses colour handling so I have been slowly adding colour to the on-screen panels to aid readability as we can group information by colour and highlight columns or rows.
The Welcome on-screen panel has the new logo:
 

splash panel on nmon start up

The Help panel looks like this in green:

help in green

Note in the above the colour combinations we have to play with.
The CPU graphs have been coloured for a few years as below:
 

colour CPU graphs

Now other data is coloured too.
Resources:
 

resources in colour

Memory:

memory in colour

Kernel stats:

Kernel in colour

There are loads more - see the bottom link to all the current screen shots.

NEW Wide CPU View

When a 20 CPU core POWER8 Scale-out "low end" machine with SMT=8 presents 160 CPU core threads we have to rethink how data is displayed on the screen because not many people can show 160 lines of readable text. So we have a new wide view: a layout of mini graphs across the screen for up to 192 CPU core threads.  This is displayed using C (uppercase c):
 

Big machine with lots of CPU Core threads

Above you can quickly determine your spare capacity and how Linux is spreading work across threads and cores.
3 Other Improvements

Improvements in nmon:
  1. "lscpu" output is captured to the nmon file and also displays on screen
  2. internal code changes to organise the command line options and the keys that change screen content
  3. nmon -h output rewritten - still rather cryptic but much better and assumes less.
  4. Kernel stats on screen new format
  5. Still working on formatting the boot time from seconds since the epoch to date and time.  The Uptime value overflows far too quickly now.
  6. Less time drift by allowing for nmon compute time in long running capture-to-file sessions.
  7. New Top Process functions to handle order by I/O of the processes (type: 4) (you have to be root for this to work due to a /proc quirk)
  8. Reduced screen artefacts with better curses pad use.
  9. Automatic generation of User Defined Disk Groups (using the lsblk command) to remove reporting of duplicated disk stats (/proc has both disk and disk partition stats mixed together).  Use the option "-g auto" to enable this for on-screen and capture to file.  For the on-screen panel type: g
  10. Capture to file: If you use -g auto and -D you also get more disk stats: disk wait time & disk service times, merges, in-flight I/O count & backlog
  11. JFS stats are more accurate and they know about the hidden blocks reserved for the root user - if your disks are large this can waste a lot of space!
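
To illustrate the idea behind item 9, here is a minimal Python sketch that picks the whole disks out of lsblk output so that partitions are not counted a second time. It is not nmon's implementation, which this article does not show. The function name and the sample device list are made up, and it assumes the text came from `lsblk -r -n -o NAME,TYPE` (raw format, no header, name and type columns).

```python
def whole_disks(lsblk_text):
    """Return the names of whole disks, skipping partitions and other types."""
    disks = []
    for line in lsblk_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == "disk":
            disks.append(parts[0])
    return disks


# Made-up sample of `lsblk -r -n -o NAME,TYPE` output.
SAMPLE = """\
sda disk
sda1 part
sda2 part
sdb disk
sdb1 part
sr0 rom
"""

print(whole_disks(SAMPLE))  # ['sda', 'sdb']
```

Reporting only on the names in that list avoids adding the sda1 and sda2 numbers on top of the sda numbers that already include them.
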

 

More information:

  1. nmon for Linux Download binaries
    http://nmon.sourceforge.net/pmwiki.php?n=Site.Download
  2. nmon for Linux Download source code and compiling info
    http://nmon.sourceforge.net/pmwiki.php?n=Site.CompilingNmon
  3. nmon for Linux Screen shots and comments
    http://nmon.sourceforge.net/pmwiki.php?n=Site.ScreenShots
  4. nmon for Linux Help
    http://nmon.sourceforge.net/pmwiki.php?n=Site.NmonHelp
  5. nmonchart
    http://nmon.sourceforge.net/pmwiki.php?n=Site.Nmonchart

    Additional Information


    If you find errors or have questions, email me:

    • Subject: nmon16
    • E-mail: n a g @ u k . i b m . c o m  

    Also find me on

    • Twitter @mr_nmon
    • LinkedIn www.linkedin.com/in/nigelargriffiths
    • YouTube https://www.youtube.com/nigelargriffiths
       

    Document Location

    Worldwide


    Document Information

    Modified date:
    26 November 2019

    UID

    ibm11115841