I get this question about twice a month, so I thought I would answer it here and then point people to this blog entry.
This is a non-trivial question and the honest answer is complicated.
If you are a "just give me the basics on the back of this envelope" sort of person then do one of the three below:
- If you want the really easy answer, hire me for 10,000 €$£ (Euro Dollar Pounds) a day plus expenses and I will do it for you and teach you at the same time via "1 to 1" skills transfer.
- Or phone your local IBM Representative and get our local BP or IBM Lab services involved.
- If you have a critical production server/LPAR/VM with serious performance issues and a paid up support contract, raise a Problem Management Report (PMR) - now called a Support Case. Then prepare a snap and perfPMR (for AIX) ready to upload (when asked) and then confess to support what you just changed and messed up :-) Oh, and if you are not at a supported level of HMC, System Firmware, AIX or Linux OS then prepare to work the weekend upgrading.
First check if you have something broken = don't tune the engine if a wheel has fallen off! On AIX check the error log (errpt) and on Linux use dmesg to see if the OS is reporting problems.
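To make the "is a wheel missing?" check a little more concrete, here is a minimal Python sketch that filters log text for worrying words. The function name `suspicious_lines`, the keyword list and the sample lines are all my own assumptions for illustration, not an official list of anything; on AIX every entry that errpt reports deserves a look anyway.

```python
import re

# Illustrative keyword guess only (an assumption, not an official list).
SUSPECT = re.compile(r"error|fail|fault|panic|timeout", re.IGNORECASE)


def suspicious_lines(log_text):
    """Return the lines of log_text that mention a suspect keyword."""
    return [line for line in log_text.splitlines() if SUSPECT.search(line)]


# Invented sample text standing in for real dmesg output.
SAMPLE_LOG = """eth0: link up
I/O error on dev sda
systemd: Started a service
Task FAILED to start"""

for line in suspicious_lines(SAMPLE_LOG):
    print(line)
```

In real use you would feed it the saved output of dmesg (or errpt) instead of the invented sample. It is a crude first pass: it tells you where to start reading, not what is wrong.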
If you want to learn, then you first have to understand how UNIX computers actually work: CPUs, CPU cores, CPU core threads, logical + physical + virtual CPUs, memory, memory caches, virtual memory, paging space, disks, adapters, device drivers, networks, C code at a detailed level (best to have 5 years of C coding experience), kernel knowledge etc.
After all, you can't start tuning a car engine unless you know how all the car parts work together.
Take a course on AIX Performance Tuning and another on Linux Performance Tuning, plus read 2 or 3 large books on each, including one on UNIX/Linux Kernel internals.
Read all the manual pages for all the performance tools - nmon/njmon report in a clear way the performance stats that you can get from these tools: ps, vmstat, sar, iostat, lparstat, mpstat, lvmstat, filemon, svmon etc. Plus on AIX the "o" commands (aso, ioo, lvmo, nfso, no, raso, schedo, vmo, ohno - one of these is a joke!).
Run nmon -h and actually read every line - three times - you need to understand and remember it all.
Read the older IBM Redbooks on Tuning, Benchmarking, Databases and Performance and the new ones on the internal components of POWER8/POWER9 servers.
Watch all my YouTube videos - look for my channel nigelargriffiths - roughly 160 videos!
Read all my AIXpert Blogs for the last 4 years - actually all of them.
Get to a few of the IBM Technical University conferences and take the performance tuning sessions from the worldwide experts.
If you get to the European or USA ones you have a good chance of meeting me and joining one of my own sessions!
Then ... work in a Benchmarking Centre for a couple of years or actually watch many large busy servers running for many weeks:
- Can you explain ALL the numbers?
- Which are "out of whack"?
- Which are outside of the Best Practice settings?
- Ask yourself: What is holding back the server?
All this should take you about 5 to 10 years, if you focus on it.
If I was starting now?
I would look at implementing njmon (note the "j") for stats collection and add the stats to
- InfluxDB + Grafana,
- ELK (the Elasticsearch tools),
- or other similar live stats databases and browser based graphing engines.
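To give a feel for what "adding the stats to" a time-series database involves, here is a minimal Python sketch that turns one nested JSON-style sample into InfluxDB line protocol text. This is not how njmon or its injector tools are implemented; the function name `to_line_protocol`, the stat names and the host name are all hypothetical, and only numeric values are kept.

```python
def _escape(text, chars):
    """Backslash-escape the characters line protocol treats as separators."""
    out = str(text)
    for ch in chars:
        out = out.replace(ch, "\\" + ch)
    return out


def to_line_protocol(sample, host, timestamp_ns):
    """Build one line per top-level section: measurement,host=... f1=v1,f2=v2 ts"""
    lines = []
    for measurement, stats in sample.items():
        if not isinstance(stats, dict):
            continue  # this sketch only handles section -> {stat: value}
        fields = ",".join(
            f"{_escape(name, ', =')}={float(value)}"
            for name, value in sorted(stats.items())
            if isinstance(value, (int, float)) and not isinstance(value, bool)
        )
        if fields:
            lines.append(
                f"{_escape(measurement, ', ')},host={_escape(host, ', =')} "
                f"{fields} {timestamp_ns}"
            )
    return lines


# Hypothetical sample: the section and stat names are invented for illustration.
sample = {
    "cpu_total": {"user": 12.5, "sys": 3},
    "memory": {"free_mb": 2048, "note": "text is skipped"},
}
for line in to_line_protocol(sample, "lpar 1", 1700000000000000000):
    print(line)
```

Each printed line is one measurement with a host tag, a set of numeric fields and a nanosecond timestamp. Once the data is in that shape, Grafana can graph any stat against any other without you touching a spreadsheet.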
If you have lots of nmon files from lots of LPARs/VMs you have a data management issue and need to avoid Excel. So investigate using nmonchart (really fast at making the graphs and 100% automated) or even nmon2JSON and the tools above for data management and graphing. Plus you can quickly merge in other JSON data sources like my own "nextract" HMC data.
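To show why nmon data is easier to manage once it is JSON, here is a minimal Python sketch that reads nmon-style text into a list of records. It is not the nmon2JSON tool and makes simplifying assumptions: a header row names the columns of each section, ZZZZ rows carry the timestamp for a snapshot id such as T0001, and data rows start with the section name and snapshot id. The sample is invented, and sections with a different layout (TOP, for example) are simply ignored.

```python
import csv
import io
import json

# Invented sample in the assumed nmon layout.
SAMPLE = """
AAA,host,demo
CPU_ALL,CPU Total demo,User%,Sys%,Wait%,Idle%
ZZZZ,T0001,10:00:01,01-JAN-2020
CPU_ALL,T0001,5.0,2.0,0.5,92.5
"""


def nmon_to_records(nmon_text):
    """Turn nmon-style CSV text into a list of dicts, one per data row."""
    headers = {}     # section name -> list of column names
    timestamps = {}  # snapshot id (T0001, ...) -> "time date"
    records = []
    for row in csv.reader(io.StringIO(nmon_text)):
        if len(row) < 2:
            continue
        section, key = row[0], row[1]
        if section.startswith(("AAA", "BBB")):
            continue  # configuration details, skipped in this sketch
        if section == "ZZZZ":
            timestamps[key] = " ".join(row[2:4])
        elif key[:1] == "T" and key[1:].isdigit():
            values = {}
            for name, raw in zip(headers.get(section, []), row[2:]):
                try:
                    values[name] = float(raw)
                except ValueError:
                    pass  # blank or non-numeric column
            records.append({"section": section, "snapshot": key,
                            "time": timestamps.get(key), "values": values})
        else:
            headers.setdefault(section, row[2:])  # first non-data row is the header
    return records


print(json.dumps(nmon_to_records(SAMPLE), indent=2))
```

With every snapshot as a self-describing record, merging in other JSON sources or loading it all into a stats database becomes a loop rather than a spreadsheet exercise.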
There are other tools that can accept nmon data, but njmon captures far more stats and can be live streamed to the above tools so you can explore the data live.
If you can't find links to the above - then you just failed the Performance Tuning IQ test :-)
- You will also have to develop your sense of humour!
ps: Below is an example of real-time graphics from njmon -> InfluxDB -> Grafana to give you an idea of what you may be missing!