Virtual I/O Server Performance Monitoring Tools
Introduction
The availability of monitoring tools for the VIOS is improving steadily but there is no single repository that explains all the options. This page is designed to let you know what is possible and, being a Wiki, can be kept up to date very easily. Hopefully, this will save you much wasted effort. If it does, please provide feedback in the comments section. If you find an error please use the feedback to tell us and others.
This Wiki page assumes you are running the latest Virtual I/O Server, which at the time of writing (2006) is 1.3. If you want Performance Monitoring you are going to have to update the Virtual I/O Server to the latest release. This is the only way features and functions are added. It is also assumed you have root user access.
Warning: As VIOS 2.1 is now available, some of this page is out of date. For example, the topas command has many new options.
Below there are three groups of tools:
- Pre-installed Tools - on the VIO Server
- External Monitoring - via daemons
- Stealth Tools - workable but not supported by UNIX Support
Pre-installed Tools - on the VIO Server
AIX and Virtual I/O Server command - topas
This is the standard AIX simple screen performance monitor that has been available for a number of years. You have to log on to the Virtual I/O Server via telnet (so no managers then) and it is curses based. You only get to see the statistics of the Virtual I/O Server LPAR and there is no global view of the machine.
Potential issues:
- Only dumb screen curses based, so you have to log in to the Virtual I/O Server
- No data recording (but see the topasout section below)
- Zero Linux on POWER support - OK, not that important for the Virtual I/O Server, but it stops this being the generally used tool and can catch you out.
- Fine on small machines/LPARs but annoying on large configurations with lots of disks, adapters, networks, processes etc. Again, only an issue on very large Virtual I/O Servers.
| Sample Screen Captures |
|
AIX command topas -C and Virtual I/O Server command topas -cecdisp
This is an extension to the standard topas tool described above to provide a Cross LPAR view.
You can run this on any LPAR in the machine and see the Virtual I/O Server's performance stats or run it on the Virtual I/O Server itself.
If run from the:
- Virtual I/O Server the command is: topas -cecdisp
- An LPAR running AIX the command is: topas -C
As shown below you get a global view of the machine (one line per LPAR). Because topas -C has to get the data from each LPAR, you must provide a network connection to each LPAR you wish to monitor. These can, of course, be virtual LANs, and this makes the VIOS an ideal place in which to run topas -C.
Potential issues:
- No disk or network stats
- Curses based so you have to login to the LPAR or the Virtual I/O Server
- Requires the use of a separate tool for data recording (see topasout section below)
- No support for Linux on POWER LPARs
| Sample Screen Captures |
|
| Key |
- Shr=Shared CPU Section
- Ded=Dedicated CPU Section
- Mon=Monitored CPUs in LPARs that have been communicated with
- InUse=memory not on free lists
- PhysB=Shared or Dedicated Physical Busy (User%+System%)
- Lp=LogicalProcessors
- Me=Memory in GB
- Us/Sy/Wa/Id = User/System/Wait/Idle CPU Utilisation percentages
- PhysB=Physical CPU Busy
- Ent=Entitlement
- %EntC=Percentage of Entitlement Capacity Used
- Vcsw/PhI=Virtual context switches and Phantom Interrupts (largely pointless)
- M = Mode
- c=capped C=capped with SMT on
- u=uncapped U=uncapped with SMT on
|
topasout
Introduced in 5.3 TL 4 and Virtual I/O Server 1.3
- Uses the xmwlm daemon, which is automatically started from inittab
- Initially kept 2 days worth of data, but changing to 7 days in 5.3 TL5
- recordings include most of topas data - except process and WLM data
New (5.3 TL5) topas -R option records topas -C metrics (CEC-wide data)
- Works independently and in parallel from topas real-time monitors
- Must be turned on manually in one of the partitions in CEC
- via configuration script which adds line in inittab, use the command: /usr/lpp/perfagent/config_topas.sh add
topasout
- Post-processing tool for recordings, producing a number of different reports:
- WLE reports
- text reports (5.3 TL5)
- include both local data and CEC-wide data
- options include detailed and summary
- spreadsheet and csv formats
- nmon_analyzer format (5.3 TL5)
The diagram below shows how this all hangs together.
- On the left-hand side is the local reporting of the individual AIX LPARs, including Virtual I/O Servers. The xmwlm daemon collects the data to files in /etc/perf/daily. The topasout command can be used to selectively report on this data on the local LPAR.
- On the right-hand side is the CEC-wide view, also known as Cross-Partition monitoring. For real-time online viewing of the data use the command topas -C, or on the Virtual I/O Server the command topas -cecdisp. To capture this data the topas -R option is used, which saves data to files in /etc/perf. Then later, but not too much later (as this data is only kept for a limited time), the topasout command can be used to selectively report at the Cross-Partition level.

On my very latest AIX release the topasout command has new options. The command "oslevel -s" returns "5300-05-04", which I think means AIX 5.3 Maintenance Level (or Technology Level) 05 and Service Pack 04. It is worth reading the README file:
view /usr/lpp/perfagent/README.perfagent.tools
The topasout command has the following options (topasout -?):
topasout [-c|-s|-a|-R daily|-R weekly] [-R detailed|-R summary|-R disk |-R lan [ -i MM -b HHMM -e HHMM]] [xmwlm_recording|topas_recording]
flags: -c comma separated output format
-s spreadsheet import format
-R daily | weekly WLE output report
-R detailed | summary | disk | lan (local recordings)
-R detailed | summary (topas recordings)
-i MM split the recording reports into equal size time periods.Allowed Values (in minutes) are 5, 10, 15, 30,60
-b HHMM begin time in hours (HH) and minutes (MM).Range is between 0000 and 2400]
-e HHMM end time in hours (HH) and minutes (MM).Range is between 0000 and 2400 and is greater than the begin time]
-a nmon analyzer format style;
For using nmon analyzer with topasout then please refer
/usr/lpp/perfagent/README.perfagent.tools for more help
I then ran as root the following two commands to ensure we are capturing the Cross-Partition and local LPAR performance data.
- /usr/lpp/perfagent/config_topas.sh add
- /usr/lpp/perfagent/config_aixwle.sh add
These shell scripts do a number of things including adding entries in the /etc/inittab file for running the daemons and adding entries into the crontab for user "adm" - see /var/spool/cron/crontabs/adm for daily and weekly reports or read the script for details.
Wait a few minutes to let the data collectors start up and then go looking for the data files:
- look in /etc/perf/daily and you should find a fresh (recently updated) file like xmwlm.061219 (I did this in 2006 (06), on December (12) the 19th (19)).
- look in /etc/perf to find a file like topas_cec.061219 - again the same date format as above.
These files should grow a little every five minutes. Of course, there is no point in creating the reports immediately as there will only be a few data points. So wait at least an hour before trying the below, or better yet a whole day.
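The file naming above can be sketched in a few lines of Python. This is only an illustration: the function names recording_paths and is_fresh are made up for this page, the ten minute freshness window is my own assumption based on the five minute update interval, and Python itself may not be installed on your Virtual I/O Server, so you may prefer to run it on another machine against copied files.

```python
import os
import time
from datetime import date


def recording_paths(day):
    """Return the expected recording file names for a given day."""
    stamp = day.strftime("%y%m%d")  # YYMMDD, e.g. 061219 for 19 December 2006
    return {
        "local": "/etc/perf/daily/xmwlm." + stamp,  # written by xmwlm
        "cec": "/etc/perf/topas_cec." + stamp,      # written by topas -R
    }


def is_fresh(path, now=None, max_age=600):
    """True if path exists and was modified within max_age seconds.

    max_age=600 is an assumption: two of the five minute intervals.
    """
    if not os.path.exists(path):
        return False
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) <= max_age


if __name__ == "__main__":
    for kind, path in recording_paths(date.today()).items():
        print(kind, path, "fresh" if is_fresh(path) else "missing or stale")
```

If both files are reported as fresh a few minutes after running the two config scripts, the collectors are probably working.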
Assuming you have collected a few hours of data, the topasout command can be used. The command itself is confusing because it can take as input a number of different files, report on different stats and detail levels, and produce different output formats. Also note that, at the time of writing (19th December 2006), the AIX documentation does not cover the current version of topasout. Equally confusing, the documentation does not say where the output is saved or under what filename.
Workload Estimator (WLE) reports:
- The running of the commands and reports is a complete mystery and as far as I can tell not documented at all and thus makes them completely unusable
- A colleague tells me that these reports are created daily and weekly (so you have to wait at least a week for one) and are created automatically from the crontab entries for adm user.
- If you check the crontab the commands are:
- /usr/bin/topasout -R daily
- /usr/bin/topasout -R weekly
- Which is very confusing as the topasout command is extremely ambiguous
- From this we deduce the -R daily or -R weekly takes no file name arguments and you are not really meant to run these yourself.
- This is completely undocumented !!
- The -R weekly seems to create the /etc/perf/hostname_aixwle_weekly.xml file
- This is completely undocumented !!
- Also note the filename of the output is reused - if you don't save the report it is overwritten - not clever!!
- /etc/perf/p5900_aixwle_weekly.xml - my machine hostname is "p5900". I found this file as the result of the -R weekly run but the file is tiny, we have to wait for the end of the first week. As the report is run on Sunday morning you may have to wait for the second Sunday.
- This is completely undocumented !!
- There is no /etc/perf/hostname_aixwle_daily.xml or perhaps /etc/perf/daily/hostname_aixwle_daily.xml or similar file, so we have to guess that there is no daily WLE file.
This .xml file (assuming we get one with more than the bare bones after a week) should be imported into the Workload Estimator tool (WLE) which you can find at http://www-912.ibm.com/supporthome.nsf/document/16533356
where you can find the download, but note you will also need Apache Tomcat Server v4.1.31 and the Java Development Kit (JDK) v1.4+
- Until we get some real captured data this can't be tested further.
- The -R daily seems to create the CEC raw data file, for example, topas_cec.061228
- This is completely undocumented !!
- Remember these WLE reports are just for the local LPAR - but I suspect the WLE tool will let you add all the LPAR data files together to find the whole use of the machine, although there are no Linux on POWER LPAR stats.
Also note that /usr/lpp/perfagent/daily.cf and /usr/lpp/perfagent/wlm.cf files seem to include the number of days worth of data that is retained - see the "retain 2 1" line. This may let us save more days worth of data but it is completely undocumented !!
Local reports about this LPAR or the Virtual I/O Server:
- First note that the bulk of the topasout options do not apply, as the syntax for the topasout command in the documentation is extremely confusing and ambiguous!
- You have only the following options: topasout [-c] xmwlm_recording
- I ran the following command: topasout -c /etc/perf/daily/xmwlm.061219
- Nothing at all was reported and there was no new file in the current directory.
- Amazingly silly as it seems, the report is then found in the file /etc/perf/daily/xmwlm.061219_01
- This is not documented.
- If you rerun the command it silently overwrites the same file, even if you use a different format option like -s or -a!
- Next you have to guess that -R disk and -R lan are also LPAR or Virtual I/O Server stats, which is completely the opposite of what you might expect!! The syntax groups the options the wrong way.
- As below, the report is output directly to standard out.
- Example, output from: topasout -R lan xmwlm.061227
#Report: System LAN Summary --- hostname: p5900 version:1.1
Start:12/27/06 00:03:34 Stop:12/27/06 23:57:55 Int: 5 Min Range: 1434 Min
Mem: 1.0 GB Dedicated SMT: ON Logical CPUs: 4
Time InU PhysB MBPS MB-I MB-O Rcvdrp Xmtdrp
00:08:34 0.4 0.2 0.0 0.0 0.0 0 0
00:13:34 0.4 0.2 0.0 0.0 0.0 0 0
00:18:35 0.4 0.1 0.0 0.0 0.0 0 0
00:23:35 0.4 0.2 0.0 0.0 0.0 0 0
00:28:35 0.4 0.2 0.0 0.0 0.0 0 0
00:33:35 0.4 0.1 0.0 0.0 0.0 0 0
00:38:36 0.4 0.1 0.0 0.0 0.0 0 0
00:43:36 0.4 0.1 0.0 0.0 0.0 0 0
00:48:36 0.4 0.2 0.0 0.0 0.0 0 0
00:53:36 0.4 0.2 0.0 0.0 0.0 0 0
00:58:37 0.4 0.1 0.0 0.0 0.0 0 0
01:03:37 0.4 0.1 0.0 0.0 0.0 0 0
01:08:37 0.4 0.2 0.0 0.0 0.0 0 0
01:13:37 0.4 0.2 0.0 0.0 0.0 0 0
01:18:38 0.4 0.1 0.0 0.0 0.0 0 0
01:23:38 0.4 0.1 0.0 0.0 0.0 0 0
...
- Example, output from: topasout -R disk xmwlm.061227
Report: Total Disk I/O Summary --- hostname: p5900 version:1.1
Start:12/27/06 00:03:34 Stop:12/27/06 23:57:55 Int: 5 Min Range:1434 Min
Mem: 1.0 GB Dedicated SMT: ON Logical CPUs: 4
Time InU PhysB MBPS TPS MB-R MB-W
00:08:33 0.4 0.2 0.0 0.3 0.0 0.0
00:13:33 0.4 0.2 0.0 0.3 0.0 0.0
00:18:33 0.4 0.1 0.0 0.2 0.0 0.0
00:23:34 0.4 0.2 0.0 0.2 0.0 0.0
00:28:34 0.4 0.2 0.0 0.2 0.0 0.0
00:33:34 0.4 0.1 0.0 0.3 0.0 0.0
00:38:34 0.4 0.1 0.0 0.3 0.0 0.0
00:43:35 0.4 0.1 0.0 0.2 0.0 0.0
00:48:35 0.4 0.2 0.0 0.3 0.0 0.0
00:53:35 0.4 0.2 0.0 0.3 0.0 0.0
00:58:35 0.4 0.1 0.0 0.2 0.0 0.0
01:03:36 0.4 0.1 0.0 0.2 0.0 0.0
01:08:36 0.4 0.2 0.0 0.3 0.0 0.0
01:13:36 0.4 0.2 0.0 0.2 0.0 0.0
01:18:36 0.4 0.1 0.0 0.3 0.0 0.0
...
- Example, output from: topasout -a xmwlm.070111
CPU_ALL,CPU Total ,User%,Sys%,Wait%,Idle%,CPUs,
CPU03,CPU Total ,User%,Sys%,Wait%,Idle%,
CPU02,CPU Total ,User%,Sys%,Wait%,Idle%,
CPU01,CPU Total ,User%,Sys%,Wait%,Idle%,
CPU00,CPU Total ,User%,Sys%,Wait%,Idle%,
DISKBUSY,Disk %Busy ,hdisk1,hdisk0,
DISKREAD,Disk Read kb/s ,hdisk1,hdisk0,
DISKWRITE,Disk Write kb/s ,hdisk1,hdisk0,
DISKXFER,Disk transfers per second ,hdisk1,hdisk0,
DISKSERV,Disk Avg service time/transfer ,hdisk1,hdisk0,
DISKWAIT,Average wait queue time for read/write transfers,hdisk1,hdisk0,
...
CPU_ALL,T0002,0.21,0.90,0.00,98.88,4.00,
CPU03,T0002,0.00,32.94,0.00,67.06,
CPU02,T0002,0.00,25.63,0.00,74.37,
CPU01,T0002,0.00,4.43,0.00,95.57,
CPU00,T0002,17.98,65.24,0.01,16.76,
DISKBUSY,T0002,0.00,0.00,
DISKREAD,T0002,0.00,0.00,
DISKWRITE,T0002,1.76,0.13,
- Note: if your nmon style report does not look like this it will never get imported into the nmon Analyser. If it starts with "#Monitor: xmtrend recording --- hostname: p5900 ValueType: mean" you are missing the vital APAR. You may have to upgrade your AIX to at least AIX 5.3 TL5 Service Pack 1 plus APAR IY87993. I have only tried Service Pack 4 and it works at this release.
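The -R lan and -R disk reports shown above are plain whitespace-separated columns under a header line that starts with "Time", so they are not hard to pick apart with a short script. Here is a minimal sketch in Python, assuming the layout is exactly as in my samples above; the function name parse_topasout_report is made up for this page, and other topasout versions may well lay the report out differently.

```python
def parse_topasout_report(text):
    """Parse a `topasout -R lan` or `-R disk` text report into a list of dicts.

    Assumes the layout seen in the samples on this page: a few title
    lines, one column header line starting with "Time", then data rows.
    """
    columns = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Time":
            columns = fields  # the column header line
            continue
        if columns is None or len(fields) != len(columns):
            continue  # title lines, "..." and anything else unexpected
        try:
            values = [float(value) for value in fields[1:]]
        except ValueError:
            continue  # not a numeric data row
        row = {"Time": fields[0]}
        row.update(zip(columns[1:], values))
        rows.append(row)
    return rows


if __name__ == "__main__":
    sample = """#Report: System LAN Summary --- hostname: p5900 version:1.1
Start:12/27/06 00:03:34 Stop:12/27/06 23:57:55 Int: 5 Min Range: 1434 Min
Mem: 1.0 GB Dedicated SMT: ON Logical CPUs: 4
Time InU PhysB MBPS MB-I MB-O Rcvdrp Xmtdrp
00:08:34 0.4 0.2 0.0 0.0 0.0 0 0
00:13:34 0.4 0.2 0.0 0.0 0.0 0 0
"""
    for row in parse_topasout_report(sample):
        print(row["Time"], row["PhysB"], row["MBPS"])
```

Once the rows are in this form it is a small step to write them out as a clean CSV file or load them into a database.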
Cross-Partition reports:
- Again you have limited but different options: topasout [-R detailed|-R summary|-R disk |-R lan [ -i MM -b HHMM -e HHMM]] topas_recording
- Assuming you want the whole day and want all the data possible use: topasout topas_recording
- Or the same as above but just a summary use: topasout -R summary topas_recording
- I ran the following detailed (the default report) command: topasout /etc/perf/topas_cec.061219
- Yes, you guessed it, the output is in the file /etc/perf/topas_cec.061219_01
- I ran the following summary command: topasout -R summary /etc/perf/topas_cec.061219
- No, not as expected, the output is to standard out!! and looks like this:
#Report: CEC Summary --- hostname: p5900 version:1.1
Start:12/19/06 11:48:30 Stop:12/19/06 14:05:30 Int: 5 Min Range: 137 Min
Partition Mon: 2 UnM: 0 Shr: 1 Ded: 1 Cap: 1 UnC: 1
-CEC------ -Processors------------------------- -Memory (GB)------------
Time ShrB DedB Mon UnM Avl UnA Shr Ded PSz APP Mon UnM Avl UnA InU
11:53 0.16 0.00 2.2 0.0 0.0 0 0.2 2 4.0 3.8 2.0 0.0 0.0 0.0 0.0
11:58 0.28 0.00 2.2 0.0 0.0 0 0.2 2 4.0 3.7 2.0 0.0 0.0 0.0 0.0
12:03 0.00 0.00 2.0 0.0 0.0 0 0.0 2 0.0 0.0 1.0 0.0 0.0 0.0 0.0
12:08 0.02 0.00 2.2 0.0 0.0 0 0.2 2 4.0 4.0 2.0 0.0 0.0 0.0 0.0
12:13 0.00 0.00 2.2 0.0 0.0 0 0.2 2 4.0 4.0 2.0 0.0 0.0 0.0 0.0
12:18 0.00 0.00 2.2 0.0 0.0 0 0.2 2 4.0 4.0 2.0 0.0 0.0 0.0 0.0
12:23 0.00 0.00 2.2 0.0 0.0 0 0.2 2 4.0 4.0 2.0 0.0 0.0 0.0 0.0
12:28 0.00 0.00 2.2 0.0 0.0 0 0.2 2 4.0 4.0 2.0 0.0 0.0 0.0 0.0
12:33 0.00 0.00 2.2 0.0 0.0 0 0.2 2 4.0 4.0 2.0 0.0 0.0 0.0 0.0
12:38 0.03 0.00 5.2 0.0 0.0 0 1.2 4 4.0 4.0 5.0 0.0 0.0 0.0 0.0
12:43 0.04 1.07 5.2 0.0 0.0 0 1.2 4 4.0 2.9 5.0 0.0 0.0 0.0 0.0
12:48 0.01 1.25 5.2 0.0 0.0 0 1.2 4 4.0 2.7 5.0 0.0 0.0 0.0 0.0
12:53 0.01 1.38 5.2 0.0 0.0 0 1.2 4 4.0 2.6 5.0 0.0 0.0 0.0 0.0
12:58 0.02 0.28 5.2 0.0 0.0 0 1.2 4 4.0 3.7 5.0 0.0 0.0 0.0 0.0
13:03 0.02 0.00 5.2 0.0 0.0 0 1.2 4 4.0 4.0 5.0 0.0 0.0 0.0 0.0
13:08 0.02 0.00 5.2 0.0 0.0 0 1.2 4 4.0 4.0 5.0 0.0 0.0 0.0 0.0
13:13 0.02 0.02 5.2 0.0 0.0 0 1.2 4 4.0 4.0 5.0 0.0 0.0 0.0 0.0
...
Here are the files that I managed to produce:
So how usable are these files? I tried with Excel 2000 on Windows XP, which is a pretty common combination.
First, all the files have some weird character at the start of the first line and then a comment which says the data is from xmtrend (which is not true, or shows the history of the code).
| Source | Generator | Format | Data file | Comments |
| Local | xmwlm | Comma Separated Values (CSV) | xmwlm_topasout_c.csv | This is a complete and utter waste of time. Excel can be used to read the file but it ends up as comments and the data is impossible to graph without serious tools to reformat it, say a couple of hours with grep, sort and awk - or just run nmon!! |
| Local | xmwlm | Spreadsheet | xmwlm_topasout_s.csv | This data can be imported into Excel with a few hints on the data format. However, the header lines get confused, so they are scrambled and misaligned. It may be possible to fix this manually and then draw graphs. At least the data is in sensible columns. Unfortunately, every other line is blank, which ruins all graphs. Oh, and the date and time format does not work at all! |
| Local | xmwlm | nmon Analyser | xmwlm.topasout_a_v2.csv | With AIX 5.3 TL5 Service Pack 4 and APAR IY87993 this starts working properly. The resulting file can be imported into the nmon Analyser (I was using nmon Analyser version 3.2.3). It is claimed that this will also work with AIX 5.3 TL5 SP1 but I have not tested that service pack. Here is the sample Excel spreadsheet (it is pretty boring as the CPUs are not used much): xmwlm.070111.nmon.xls, created with Excel 2002. |
| Local | xmwlm | Disk Spreadsheet | xmwlm_topasout_R_lan.csv | Can be loaded into a spreadsheet. By selecting space as the delimiter and treating multiple separators as one, it can be imported or loaded with Data -> Get External Data -> Import Text File |
| Local | xmwlm | LAN Spreadsheet | xmwlm_topasout_R_disk.csv | Can be loaded into a spreadsheet. By selecting space as the delimiter and treating multiple separators as one, it can be imported or loaded with Data -> Get External Data -> Import Text File |
| Local | xmwlm | Detailed Spreadsheet | xmwlm_topasout_R_detailed.csv | Can be loaded into a spreadsheet. By selecting space as the delimiter and treating multiple separators as one, it can be imported or loaded with Data -> Get External Data -> Import Text File |
| Local | xmwlm | Summary Spreadsheet | xmwlm_topasout_R_summary.csv | Can be loaded into a spreadsheet. By selecting space as the delimiter and treating multiple separators as one, it can be imported or loaded with Data -> Get External Data -> Import Text File |
| Cross-Partition | topas -R | Detailed Spreadsheet | topas_cec_topasout_R_detailed.csv | Similar format to the top file above = useless |
| Cross-Partition | topas -R | Summary Spreadsheet | topas_cec_topasout_R_summary.csv | This can be imported into Excel and graphs drawn. The first four header lines are scrambled. This gives the overall machine view (no Linux support) but not individual LPAR data, as it's the summary. |
Fixed Command Syntax for topasout
The topasout command has the following options - the syntax from the manual page and the "topasout -?" output is extremely misleading and suggests all sorts of combinations should work that actually do not work at all.
Note: YYMMDD is the year, month and day of the month. MM is minutes. HHMM is hours and minutes.
Mode A) For system use only, do NOT run these manually.
topasout -R daily|-R weekly
Mode B) Machine or LPAR level reports
topasout -c|-s|-a /etc/perf/daily/xmwlm.YYMMDD
The output overwrites the file /etc/perf/daily/xmwlm.YYMMDD_01
Mode C) Other machine or LPAR level reports
topasout -R disk | -R lan | -R detailed | -R summary [-i MM -b HHMM -e HHMM] /etc/perf/daily/xmwlm.YYMMDD
The output goes to standard out, so you need to redirect the output to a file.
Mode D) Cross-Partition reports
topasout -R detailed|-R summary [-i MM -b HHMM -e HHMM] /etc/perf/topas_cec.YYMMDD
The detailed report overwrites the file /etc/perf/topas_cec.YYMMDD_01. In my test above, the summary report went to standard out instead.
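To make Modes B, C and D concrete, here is a small Python sketch that builds a topasout command line and rejects the combinations that did not work for me. It only encodes what is written on this page; the function name topasout_args is made up, it does not run topasout itself, and treating -i, -b and -e as independently optional is my assumption, not something I have confirmed.

```python
import os

FORMATS = ("c", "s", "a")                               # Mode B
LOCAL_REPORTS = ("disk", "lan", "detailed", "summary")  # Mode C
CEC_REPORTS = ("detailed", "summary")                   # Mode D
INTERVALS = (5, 10, 15, 30, 60)                         # minutes, from the usage text


def topasout_args(recording, fmt=None, report=None,
                  interval=None, begin=None, end=None):
    """Build an argument list for topasout following Modes B, C and D."""
    name = os.path.basename(recording)
    is_cec = name.startswith("topas_cec.")
    if not is_cec and not name.startswith("xmwlm."):
        raise ValueError("expected an xmwlm.YYMMDD or topas_cec.YYMMDD file")
    if (fmt is None) == (report is None):
        raise ValueError("give exactly one of fmt or report")

    if fmt is not None:  # Mode B: -c, -s or -a on a local recording
        if is_cec or fmt not in FORMATS:
            raise ValueError("-c, -s and -a are for xmwlm recordings only")
        if interval is not None or begin is not None or end is not None:
            raise ValueError("-i, -b and -e only go with -R reports")
        return ["topasout", "-" + fmt, recording]

    allowed = CEC_REPORTS if is_cec else LOCAL_REPORTS  # Mode D or Mode C
    if report not in allowed:
        raise ValueError("-R %s is not valid for %s" % (report, name))
    args = ["topasout", "-R", report]
    if interval is not None:
        if interval not in INTERVALS:
            raise ValueError("interval must be one of %s" % (INTERVALS,))
        args += ["-i", str(interval)]
    if begin is not None:
        args += ["-b", begin]  # HHMM
    if end is not None:
        args += ["-e", end]    # HHMM
    return args + [recording]


if __name__ == "__main__":
    print(" ".join(topasout_args("/etc/perf/daily/xmwlm.061219", fmt="c")))
    print(" ".join(topasout_args("/etc/perf/topas_cec.061219",
                                 report="summary", interval=15)))
```

Mode A (-R daily and -R weekly) is deliberately left out, as those are meant to be run from the adm crontab only.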
Summary of topasout output and reports
If we ignore the topasout syntax misdirection/misinformation and the nmon style output being totally useless in its current form, then what have we got that can help with Virtual I/O Server monitoring?
- We have local LPAR statistics in difficult to work with formats that will have to be modified to make them useful in a spreadsheet or to add them to a database. This means extra work in script writing, but it is not impossible.
- We have Summary Cross-Partition data that could be used but, again, it will need extra reworking to make it useful. Not impossible.
- We have Detailed Cross-Partition data in a very difficult to work with format - this would take major reformatting to be useful.
Lots of work has gone into these daemons and tools, but it is a missed opportunity that the resulting output is so hard to work with for further performance monitoring and capacity planning. Let us hope the next version greatly improves this position!!
vmstat, iostat, lparstat, mpstat type commands
We don't need to say much about these AIX commands. You are allowed to use them as they are already installed by default. You will have to escape from the padmin user to root via the oem_setup_env command. These tools can be used online for monitoring, or the data can be saved to a file for later viewing or later processing, formatting and perhaps adding to other tools/databases. Don't forget, you could start these automatically via a cron job. Crude but it works.
Downside:
- The output format is difficult to deal with
- Requires the writing of scripts and other tools to deal with the data and present graphs
External Monitoring - via daemons
Performance Toolbox (PTX)
This often forgotten X Windows performance monitoring tool now supports POWER5 and Virtualisation statistics. From VIOS 1.3 the daemon that PTX communicates with to extract data from remote machines is available by default. PTX, the basics:
- You run a daemon on each AIX LPAR
- The PTX Graphical User Interface runs on a machine running X Windows, typically an AIX workstation (although VNC works and you could use another workstation running X Windows remotely)
- With PTX you build up a "monitor" of what you want to capture dynamically on the screen (CPU, Disk, Network etc.) out of hundreds of statistics
- You can also automate the capture and saving of data to files
- You can replay the "monitor" - much like watching a video, and zoom forward and back in time
- You can then "after the fact" filter and modify the captured data to support other tools or performance databases.
Downside:
- Only AIX - Zero Linux on POWER support.
- PTX GUI is X Windows - not many users have X Windows these days.
- A tool kit to develop reporting tools i.e. not an "out of the box" solution.
| Sample Screen Captures |
|
|
These two graphs show Entitlement (ent), Physical CPU use (physc), Shared, SMT and Cap Status and the Global CPU utilisation in 2D and 3D modes. 3D allows multiple machines/LPARs to be monitored at the same time. |
IBM Tivoli Monitoring System Edition for System p (ITMSESP)
|
Important URLs:
- ITMSESP additional information on this Wiki, includes overview presentation etc
- Announcement
- Download of this free software - warning this is 1.7GB in size and can take a day
- New forum for this software
|
From the Announcement ...
- IBM Tivoli® Monitoring System Edition (ITM SE) for System p V6.1 is a new version of the popular IBM Tivoli Monitoring Product now being offered to System p clients at no charge for the first time. ITM SE for System p V6.1 enables you to monitor the health and availability of multiple System p servers and provides graphical views of your virtualization environment to ensure comprehensive monitoring and quick time to value.
- ITM SE for System p includes out-of-the-box best practice solutions created by AIX and Virtual I/O Server (VIOS) developers. These best practice solutions include predefined thresholds for alerting on key metrics, Expert Advice that provides an explanation of the alert and recommends potential actions to take to resolve the issue, and the ability to take resolution actions directly from the Tivoli Enterprise Portal or set up automated actions. In addition, users have the ability to visualize the monitoring data in the Tivoli Enterprise Portal to determine the current state of the AIX and VIOS environments.
- ITM SE for System p V6.1 is available as a download at no additional charge and includes one year of nonrenewable product support. Clients seeking more advanced capabilities and full cross-platform support (including non-IBM hardware and operating systems) can upgrade to the enterprise monitoring product IBM Tivoli Monitoring V6.1 (for a fee). Because ITM SE for System p V6.1 is based on the same technology, the upgrade simply involves installing the new monitors into the existing environment. IBM Tivoli Monitoring V6.1 supports AIX 5L, HP-UX, Solaris, Linux, i5/OS, and Microsoft Windows on appropriate file hardware, consolidating information from the entire environment in a single, graphical display.
Note: you will have to allocate a system or LPAR to run this software, then install DB2, the Tivoli Enterprise Monitoring Server (TEMS) software, the Tivoli Enterprise Portal Server (TEPS), the Windows client software and agents on the machines you want to monitor. Then you get to see the information.
What is more important is that this is a free download and the first year's support is free too (after that it is for a fee). There is currently a lot of confusion over the ambiguous wording in the announcement statement. We may have to wait and try it to see what we get.
You will need a VIO Server update (1.3.0.1) to install the new daemons that are required.
Tivoli Monitoring (regular)
Once the above is available, this is an upgrade away but for a fee.
SNMP
|
There are only a few relatively simple steps to getting the data from xmtopas (now installed and running by default on the Virtual I/O Server) via SNMP
1) Identify/Configure tool to deliver statistics to SNMP agent
2) Create the SNMP MIB Definitions.
3) Implement the MIB extensions on the LPAR to be measured and in the monitoring system.
4) Obtain/Use some SNMP data collection utility
5) Decide what to monitor / how to monitor it. |
1) Identify/Configure tool to deliver statistics to SNMP agent
Typically, the xmservd which runs on AIX as a daemon can provide data to SNMP via the xmservd/SMUX interface. This interface is documented at http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.prfusrgd/doc/prfusrgd/ch13body.htm
Fortunately, we now have xmtopas (equivalent to xmservd) included as standard in AIX 5.3H and VIO Server 1.3.x. We need to activate the SMUX interface; for xmservd/xmtopas this is done via the /etc/perf/xmservd.res file. We need to create a file called "/etc/perf/xmservd.res" which contains the word dosmux (starting in the first position of the line).
2) Create the SNMP MIB Definitions
Perhaps, it is valid to simply obtain the MIB file from ANY suitable system and simply load it onto the customer's LPARs ?
It is unclear and if anyone knows the answer please comment on this Wiki page (at the bottom).
This takes three little tricks:
When xmservd or xmtopas is running, sending SIGINT (kill -2) to the process, will dump the MIB file into /etc/perf/xmservd.mib
The first trick is in getting all the configuration files updated. You have to comment-out the default "daily" recording which is started in inittab - note inittab uses : to comment-out a line, so in /etc/inittab change:
xmdaily:2:once:/usr/bin/xmwlm -L 2>&1 >/dev/null # Start xmwlm daily recording
to
: xmdaily:2:once:/usr/bin/xmwlm -L 2>&1 >/dev/null # Start xmwlm daily recording
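If you would rather script this edit than do it by hand in vi, the change can be sketched as a small Python text filter. This is an illustration only: the function name comment_out_inittab_entry is made up, it works on the file contents as a string rather than touching /etc/inittab itself, and you should keep a backup copy of inittab before writing anything back.

```python
def comment_out_inittab_entry(text, identifier):
    """Prefix ": " to the inittab line whose first field is identifier.

    Lines that are already commented out start with ":" and therefore
    have an empty first field, so running this twice changes nothing.
    """
    result = []
    for line in text.splitlines():
        if line.split(":", 1)[0] == identifier:
            line = ": " + line
        result.append(line)
    ending = "\n" if text.endswith("\n") else ""
    return "\n".join(result) + ending


if __name__ == "__main__":
    before = ("xmdaily:2:once:/usr/bin/xmwlm -L 2>&1 >/dev/null "
              "# Start xmwlm daily recording\n")
    print(comment_out_inittab_entry(before, "xmdaily"), end="")
```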
You have to ensure the correct / required entries are present in the configuration files /etc/snmpd.peers, /etc/snmpd.conf and /etc/snmpdv3.conf. You have to ensure that all the entries in those configuration files are "correct" and note: if spaces are used instead of tabs, you can get errors.
The second trick is in getting xmservd or xmtopas to start and keep running, so that when you hit it with a signal it will dump the current data (i.e. to provide the desired response to SIGINT). If we have Performance Toolbox (PTX) installed, we could use xmpeek to start it (with the default timeout of 15 minutes). Fortunately, we now have topas CEC monitoring on recent AIX and Virtual I/O Server versions. So simply run: topas -C
and wait until it shows some data, then press q to quit. Find the process ID of xmtopas and use: kill -2
For example on my machine:
ps -eaf | grep xmtopas
root 94238 127108 0 17:14:13 - 0:00 xmtopas -p3
kill -2 94238
This will create the file - /etc/perf/xmservd.mib
The third trick is in getting the mosy command to convert the MIB file into an object definitions file for SNMP. The xmservd.mib file contains two OBJECT names containing the "+" character and the mosy command fails on these names. So edit the file and change
xmdNFSV3ClntReaddir+ and xmdNFSV3SvrReaddir+
to
xmdNFSV3ClntReaddirplus and xmdNFSV3SvrReaddirplus
Then run the command: mosy -o /tmp/mib.defs /etc/perf/xmservd.mib
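The rename of the two "+" names can also be done with a few lines of Python before mosy is run. This is a minimal sketch: the function name fix_mib_names is made up, only the two names mentioned above are handled, and the OBJECT-TYPE line in the example is an illustrative stand-in, not a copy of the real xmservd.mib contents.

```python
# The two object names that mosy rejects, and their replacements.
RENAMES = {
    "xmdNFSV3ClntReaddir+": "xmdNFSV3ClntReaddirplus",
    "xmdNFSV3SvrReaddir+": "xmdNFSV3SvrReaddirplus",
}


def fix_mib_names(mib_text):
    """Return mib_text with the two '+' object names renamed."""
    for old, new in RENAMES.items():
        mib_text = mib_text.replace(old, new)
    return mib_text


if __name__ == "__main__":
    # Illustrative line only; the real file layout may differ.
    print(fix_mib_names("xmdNFSV3ClntReaddir+ OBJECT-TYPE"))
```

To apply it for real you would read /etc/perf/xmservd.mib, pass the contents through fix_mib_names, and write the result to the file you hand to mosy.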
3) Implement MIB extensions on the LPAR to be measured and in the monitoring system
This is simply a case of concatenating the definitions created by the mosy command onto the end of the existing MIB definitions file.
xmperf provides a "Makefile" for doing this, but it simply saves the existing file, creates a new one, runs mosy and concatenates the new entries to the MIB file.
So we can simply perform those steps manually.
cp /etc/mib.defs /etc/mib.defs.org
cat /tmp/mib.defs >> /etc/mib.defs
rm /tmp/mib.defs
The new xmservd / xmtopas entries should then appear in the MIB tree under
"private / enterprises / ibm / ibmAgents / aix / aixRISC6000 / risc6000private / risc6000perf / xmd"
Use whatever tools are available to "import" the MIB definitions into the monitoring application.
4) Obtain/Use some SNMP data collection utility
For testing, we can use snmpinfo or clsnmp, for example:
- snmpinfo -md -v xmdLPAR
- snmpinfo -h <hostname> -md -v xmdLPAR
5) Decide what to monitor / how to monitor it
Clearly, some of these provide more-or-less similar information ... for example:
- xmdLPARPhysc.0
- xmdLPAREntc.0
You would presumably wish to monitor some statistics for the LPAR itself, and some for the overall system.
6) Change the community name in snmpd
That's a whole different subject, and worthy of a mini redbook ...
Downside:
- Lots of fiddly setup
- Might also work with Linux LPARs
- Will not include SEA network - needs confirming.
Remote Extraction from the xmperf daemon via the RSI and SPMI interface and library.
This sample code allows a remote machine (i.e. another AIX LPAR or any machine on the network) to extract data from the Virtual I/O Server without any changes to the Virtual I/O Server, as one of the "xm" daemons is activated by default; it is the same daemon that the topas -C command uses to extract data. The extracted data is in comma separated values format and the header line gives the column details. This data can then be used in conjunction with other tools.
The diagram below shows the nlpar program using the Spmi AIX library and the Rsi remote performance feature to fetch the data remotely from the xmtopas or xmwlm daemon running on the Virtual I/O Server or AIX LPAR. The data can be saved to a file for later processing or piped directly to a further tool for analysis or for saving to a database of some sort.

This is sample code for using the SPMI interface and you are free to use it as you like, without limitation. This tool assumes you are a C programmer. The program will have to be changed for your specific configuration, such as your disks, networks and Shared Ethernet Adapter. It can collect specific statistics or all the statistics for a resource.
- The Source code: nlpar.c
- You will, of course, need a C compiler - GNU gcc works fine
- You will need access to the Performance Toolbox LPP - I think you need to install perfmgr.network and perfagent.tools to get the Rsi.h header files and the libSpmi.a library.
- The data it collects by default:
- LPAR - user, kern, wait, idle, lbusy, pbusy, physc, entc, ent, app, vcsw, maxmem, minmem, memreg, lparnum, shared, capped, smt, maxpcpu, minvcpu, maxvcpu, mincap, maxcap, capinc, onlinemem, mdisl, pcpu, vcpu, lcpu, pcpuinpool, unalloccap, entitledcap, varwght, unallocwght, minvcpucap, phint, entpct, hcalls, hyppct,
- Disks - busy, xfer, rblk, wblk, rserv, wserv, avgserv, avgque, avgwait, qfull,
- Networks - en8-ioctet, en8-ooctet,
- You need to compile with: cc nlpar.c -o nlpar -lSpmi
- Sample output: nlpar.txt
- Note the output is very wide
- All possible stats (from my Virtual I/O Server - yours will be different): nlpar_stats.txt
- Use xmpeek -l <hostname> to list the statistics available to you - note you need to install perfagent.server for this command.
LPAR2RRD - CPU Cross Partition Graphs from HMC data with RRDTool
|
- The tool can produce historical CPU utilization graphs of LPARs and shared CPU usage
- All LPARs (AIX, VIOS, i5OS, Linux on POWER) and all CPU pool stats are included
- The data is extracted from the HMC via ssh and loaded into the RRDTool database
- Gives you CPU Cross Partition Graphs based on 60-second CPU utilization averages provided by the HMC (agent-less)
- It collects the complete physical (HW) and logical configuration of all managed systems and their LPARs, and all changes in their state and configuration.
- Project home

|
RMC
More here - RMC can be used to escalate performance data
Stealth Tools - workable but not supported by UNIX Support
What do we mean by "Stealth Tools"? Put simply, these are not officially supported by AIX Support for use on the Virtual I/O Server but actually work very well.
- Simple to Install - As these tools are only one or two files, during installation they do not update other important AIX files or AIX services and they only involve running one small process each - i.e. they are simple, very well behaved and easy to control.
- Simple to Stop - You could prove any performance problem is not due to these tools by simply stopping them and monitoring the VIO Server via supported tools, for example, by using topas.
- Support - If you contact AIX Support concerning the performance of your VIO Server, I am told that Support prefers that these tools are not removed, as this could make understanding the performance issue harder (or impossible). They may ask you to stop or remove the tool as part of the problem determination. Note: Support will not accept data from these tools as evidence of a performance problem - see the nmon Wiki page for hints on using snap and perfPMR for reporting a performance problem to AIX Support.
The tools in this category are:
nmon
|
|
This tool can display Virtual I/O Server performance either on the screen or saved to a file. Data saved to a file can be imported into a pre-prepared Excel spreadsheet, which automatically draws the graphs, or into various other tools for displaying on web pages. It is supported by the developer directly and has a large following of users from its past use in monitoring AIX and Linux systems.
|
New in VIOS 2.1
The nmon functionality is now officially available in the VIO Server version 2.1. This is built into the topas command.
- As the padmin user type: topas and then hit "~" (tilde) to flip into nmon mode.
- As the root user (using oem_setup_env), you can use the nmon command (/usr/bin/nmon) directly, which starts the topas binary but in nmon mode.
- You can also, as root, capture the performance stats in nmon format to a file, either with cron directly or by using the smitty, WebSM or pConsole user interfaces.
Ganglia
|
Ganglia is an Open Source project for monitoring clusters, and additions are available for monitoring POWER5 LPARs. The data is typically collected once a minute and stored centrally in an rrdtool database. You can view hundreds of machines and LPARs and see history data over hours, days, weeks and months.
|
|
With the new POWER5 stats and user interface additions you now get a whole machine view of all the LPARs (AIX and Linux) and it includes the Virtual I/O Server too. Here a two-way pSeries p505 is running 11 LPARs, with the Virtual I/O Server in dark blue; the graph shows what each of the LPARs is doing over any time period you select. |
lparmon
|
This tool is released by the IBM Dallas System p Demonstrations Centre and used in their demos. It is a simple, graphical tool that can help you see what is going on in your machine. It also looks good.
|
The postings on this site solely reflect the personal views of the authors and do not necessarily represent the views, positions, strategies or opinions of IBM or IBM management.
That's All Folks!!