Just got sent 1000's of nmon files! Help!
nagger
A couple of times a month I get a question like this:
My first reaction is: that is not my problem. I wrote nmon for AIX and nmon for Linux, not the nmon analyser.
If you stop and think, it is obvious that you are never going to load 35 GB into Excel on a laptop with 8 to 16 GB of memory.
Excel is likely to grind the laptop to a halt and then explode at a few 100 MB of data.
First lesson: Don't try to do a capacity planning exercise using performance tuning data.
Note: I will not comment on the guys taking 1000's of nmon data snapshots a day or running nmon for days at a time - they can explode Excel with a single file. They deserve what they asked for :-) While writing this, someone wanted to have lots more than 10,000 data points on a graph displayed on a screen 1920 pixels across!
Second Lesson: Change to nmonchart.
This ksh script graphs the nmon files much faster than Excel (typically a second or two) and can tackle much larger files. As it is a script, you can run it on 100's of files in a directory on AIX or Linux. If it's on a POWER Server, do loads of them in parallel. So depending on the size of the nmon files, it can process a few thousand files an hour (single stream shell script looping through files). The viewing is by your browser - one nmon graph set per browser tab. When viewing the graphs, you can have loads open at one time and flick between them. This scales up to, say, two dozen LPARs.
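As a rough sketch of the "loads of them in parallel" idea, the function below loops through a directory and runs a few nmonchart jobs at a time. It assumes nmonchart is called with the nmon file name followed by the output .html file name, so check that against your copy. The chart_all name, the NMONCHART variable and the default of 4 jobs at once are made up for this illustration.

```shell
# Chart every .nmon file in a directory, a few at a time in parallel.
# Assumption: nmonchart is on the PATH and is called as
#   nmonchart <input.nmon> <output.html>
# Set NMONCHART to point at your copy of the script if it lives elsewhere.
chart_all() {
    dir=$1
    jobs_max=${2:-4}                 # hypothetical default: 4 jobs at once
    running=0
    for f in "$dir"/*.nmon
    do
        [ -f "$f" ] || continue      # no .nmon files: skip the unexpanded pattern
        "${NMONCHART:-nmonchart}" "$f" "${f%.nmon}.html" &
        running=$((running + 1))
        if [ "$running" -ge "$jobs_max" ]
        then
            wait                     # let the current batch finish
            running=0
        fi
    done
    wait                             # wait for the last partial batch
}
```

Something like chart_all ./server1 8 would then leave one .html file next to each nmon file in that (hypothetical) directory.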
Third Lesson: Building a data repository for capacity planning is non-trivial
You could gear up with various tools that take nmon data and then let you graph LPARs over time:
If you are going to do long term performance and capacity planning this is a good idea. If it is a once only exercise then you are not going to want to find the hardware and spend a couple of weeks setting these tools up.
There are alternatives like LPAR2RRD and Ganglia too, which don't use nmon files.
Fourth Lesson: Newer tools avoid all this data management
We also have new wave performance tools with a new data collector, njmon. It outputs JSON format and a lot more stats than nmon, which you can then inject live into new wave time-aware databases and live dynamic graphing tools like InfluxDB + Grafana, ELK and Splunk. For more info
Assuming it's a once only project
So let's assume you are doing a "once only" audit, capacity planning or server consolidation exercise. There is no time to get organised and tooled up.
What you need to do is extract a summary from the 100's of nmon files, then have those summaries in CSV format so you can load the resulting data into an Excel spreadsheet.
I am going to assume you are a real technical person - no Windows tools here! You have access to AIX or Linux and are OK with simple shell scripts using grep and awk.
Let us get organised.
If you have nmon files from different servers then place them in different directories.
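If the files arrived in one big heap, a small script can do the sorting into directories. This is only a sketch: it assumes each AIX nmon file has a line starting AAA,SerialNumber, with the machine serial number in the third field, and the group_by_serial name and the "unknown" directory are my inventions for the example.

```shell
# Move each nmon file into a sub-directory named after the machine serial number.
# Assumption: AIX nmon files contain a line of the form
#   AAA,SerialNumber,<serial>
# Files without that line end up in a directory called "unknown".
group_by_serial() {
    dir=$1
    for f in "$dir"/*.nmon
    do
        [ -f "$f" ] || continue
        # first matching line only, third comma-separated field, no stray CR
        serial=$(grep '^AAA,SerialNumber,' "$f" | head -1 | cut -d, -f3 | tr -d '\r')
        [ -n "$serial" ] || serial=unknown
        mkdir -p "$dir/$serial" && mv "$f" "$dir/$serial/"
    done
}
```

Try it on a copy of the files first, as it moves them rather than copying them.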
You may have noticed nmon file names are very carefully designed: <hos
This means they sort very nicely with the ls -l command, by hostname and then time.
Note: I will not comment on the guys who decide on their own rubbish file names via a shell script, which is often buggy, and then blame nmon!!
You can then pick out a set of nmon files for a particular day like today's (23rd November 2018): ls -1 *_20181123_*.nmon
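The same pattern can be wrapped up so the day is a parameter. A minimal sketch, relying only on the date sitting between underscores in the file name as in the ls example; the files_for_day name is made up here.

```shell
# List the nmon files in a directory for one day (date given as YYYYMMDD).
# With no date argument it defaults to today's date.
files_for_day() {
    dir=$1
    day=${2:-$(date +%Y%m%d)}
    for f in "$dir"/*_"$day"_*.nmon
    do
        [ -f "$f" ] && echo "$f"     # skip the unexpanded pattern if nothing matches
    done
    return 0
}
```

For example, files_for_day . 20181123 prints the same set of files as the ls command, one per line, and prints nothing at all if there are none for that day.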
So here are three scripts to extract General information, CPU stats and Memory usage.
In our example we should quiz the IT staff to decide on a busy day and then focus on that - later we might explore other days for comparison.
So we have 6 directories for the 6 servers, with roughly 40 LPARs each, for 30-ish days.
We want a summary of the LPARs for a specific day. Instead of time based graphs like nmon we need to step back and get basic config then minimum, average, maximum and 95% type stats on the CPU and Memory.
With 240 LPARs in our example we need to cut down on the stats per LPAR to a basic few.
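To show the sort of grep and awk work involved, here is a minimal sketch that boils one nmon file down to a single CSV line of minimum, average, maximum and 95th percentile physical CPU use. It is not the nsum_cpu script, just an illustration of the idea. It assumes the AIX data lines start with LPAR,T and carry the physical CPU value in the third field, so check the LPAR header line in your own files. The cpu_summary name is made up, and the percentile is the simple nearest-rank kind.

```shell
# Summarise physical CPU use from one nmon file as a CSV line:
#   nmonfile, snapshots, min, avg, max, 95percentile
# Assumption: AIX nmon data lines look like  LPAR,T0001,<PhysicalCPU>,...
# so the value of interest is field 3.
cpu_summary() {
    file=$1
    grep '^LPAR,T' "$file" | cut -d, -f3 | LC_ALL=C sort -n |
    awk -v name="$file" '
        { v[NR] = $1; sum += $1 }            # values arrive sorted, lowest first
        END {
            if (NR == 0) { print name ", 0"; exit }
            # nearest-rank 95th percentile: ceiling of 0.95 * NR
            idx = int((NR * 95 + 99) / 100)
            printf "%s, %d, %.2f, %.2f, %.2f, %.2f\n",
                   name, NR, v[1], sum / NR, v[NR], v[idx]
        }'
}
```

Run over a directory with a for loop and redirected to a file, this gives one line per LPAR that a spreadsheet can load directly.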
Here is what I recommend
I have selected a wild random set of nmon files and changed the Serial Numbers and hostnames - I hope your nmon files are MUCH more consistent.
The scripts are for AIX only (at the moment) and complain if it is Linux data or if the file is missing the LPAR stats used. Below is the raw output:
Sample nsum_gen output - don't try and read this, see below
Sample nsum_cpu output - don't try and read this, see below
nmonfile, snapshots, VP, E, VP:E, poolCPU, poolIdle, Weight, Capped, total, min, avg, max, 95percentile
Sample nsum_ram output - don't try and read this, see below
nmonfile, count, total_used, min_used, avg_used, max_used, 95percentileMB
The scripts are sub-second - with very large files they may take a second or two.
So I run :
echo General Info >nsum.csv
If I had a directory full of many days I would select just one day
echo General Info >nsum.csv
Let us use a spreadsheet to sum() columns up and make it easier to read
Next we save these to a .csv file and open that file in Excel - or your favourite spreadsheet.
You may have to tell it to open CSV files.
Note: these are a random set of LPARs from different machines - if this was all one server then we could check that with the Serial Number.
Well, I hope this lets you quickly analyse vast numbers of nmon files.
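A quick way to make that Serial Number check without the spreadsheet is to count the files per serial number. Again a sketch only, assuming the AAA,SerialNumber, line is present in your AIX nmon files; the serial_counts name is made up.

```shell
# Count how many nmon files in a directory carry each machine serial number.
# Assumption: AIX nmon files contain a line of the form
#   AAA,SerialNumber,<serial>
# Files without that line are simply not counted.
serial_counts() {
    for f in "$1"/*.nmon
    do
        [ -f "$f" ] || continue
        grep '^AAA,SerialNumber,' "$f" | head -1 | cut -d, -f3 | tr -d '\r'
    done | sort | uniq -c
}
```

If the output is a single line, all the files in that directory came from one server; more than one line means a stray LPAR has crept in.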
Comments are welcome below - especially if I have not explained something well.
All of the above will take about 4 minutes per server, once you have your nmon files grouped sensibly into directories.
Download the scripts and samples:
Download by clicking on this link
- - - The End - - -