IBM Support

Using rrdtool to graph vmstat output - a worked example

How To


Summary

Rather a "blast from the past": a quick look at the rrdtool tool to graph performance stats the "old school" way from the 1990s.
In the 2020s, I use InfluxDB and Grafana; those or similar tools are the way to go.

Objective


Steps

A large UK customer is running a series of tests on AIX and POWER servers and wanted to display the results on a web server that uses rrdtool. They requested assistance to get a working example running. I could sort out a solution in a few hours, so I stepped up to the challenge one night rather than let it become a search for rrdtool skills inside IBM. No one in IBM would claim to be an rrdtool expert until they knew the details of what was required. Many people know it well enough to get what they want done, but would not put "rrdtool guru" on their CV.
The rrdtool package is a fantastically brilliant command to have in your toolbox, up there with awk, grep, sed, Apache, ksh, and nmon. It saves data in a fixed-size "database" and consolidates older data into summaries to keep the data volume down. The rrdtool command can extract the data across any period and then quickly generate impressive .gif graph files from it. GIF images are perfect for displaying on a web server. So the challenge: while the test runs for "some period", like an hour, save the vmstat data and then graph it.
Part one - create a suitable rrdtool database for the vmstat output
The following command execution snippet is a reminder of the vmstat output on AIX:
  $ vmstat 1 4

  System configuration: lcpu=16 mem=8192MB ent=2.00

  kthr    memory              page              faults              cpu
  ----- ----------- ------------------------ ------------ -----------------------
   r  b     avm  fre  re  pi  po  fr   sr  cy   in  sy   cs  us sy id wa   pc   ec
   1  0 709239  4276   0   0   0   0    0   0   31 14401 361 31  1 68  0  1.01  50.6
   1  0 709239  4276   0   0   0   0    0   0   42 15786 456 31  1 68  0  1.01  50.4
   1  0 709146  4368   0   0   0   0    0   0   16  8989 207 31  1 67  0  1.04  51.9
   1  0 709146  4368   0   0   0   0    0   0 117  16944 627 31  1 68  0  1.02  50.9
 Warning:
  • I never noticed until writing this example, but there are two "sy" columns.
  • We rename the first "sy" (system calls, under faults) column to "sc" and leave "sy" for the system utilisation column.
We use the same column names as the vmstat command to keep it easy to follow, even though they are short. When we graph the stats, we can spell out their full names. Rather than using the power of rrdtool to summarize data, we use a sledgehammer: keep 100,000 one-second samples, which is a little under 28 hours.
The following code snippet shows the "rrdtool create" command:
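The sizing is just the archive's row count multiplied by the seconds each row covers. As a quick check, here is a small shell function that does the arithmetic for the values in this example. The name rra_retention is hypothetical and not part of rrdtool:

```shell
# Hypothetical helper: how long an rrdtool archive (RRA) keeps data.
# retention = rows x primary data points per row x --step seconds
rra_retention() {
    step=$1           # --step of the database, in seconds
    steps_per_row=$2  # the 1 in RRA:AVERAGE:0.5:1:100000
    rows=$3           # the 100000 in RRA:AVERAGE:0.5:1:100000
    secs=$((rows * steps_per_row * step))
    printf '%d seconds = %d hours %d minutes\n' \
        "$secs" $((secs / 3600)) $((secs % 3600 / 60))
}
```

For this article's database, rra_retention 1 1 100000 prints "100000 seconds = 27 hours 46 minutes".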
  rrdtool create vmstat.rrd --step 1 \
    DS:r:GAUGE:5:U:U \
    DS:b:GAUGE:5:U:U \
    DS:avm:GAUGE:5:U:U \
    DS:fre:GAUGE:5:U:U \
    DS:re:GAUGE:5:U:U \
    DS:pi:GAUGE:5:U:U \
    DS:po:GAUGE:5:U:U \
    DS:fr:GAUGE:5:U:U \
    DS:sr:GAUGE:5:U:U \
    DS:cy:GAUGE:5:U:U \
    DS:in:GAUGE:5:U:U \
    DS:sc:GAUGE:5:U:U \
    DS:cs:GAUGE:5:U:U \
    DS:us:GAUGE:5:U:U \
    DS:sy:GAUGE:5:U:U \
    DS:id:GAUGE:5:U:U \
    DS:wa:GAUGE:5:U:U \
    DS:pc:GAUGE:5:U:U \
    DS:ec:GAUGE:5:U:U \
    RRA:AVERAGE:0.5:1:100000
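You can look at the rrdtool manual pages for the details, but we are basically turning off all the fancy features. Each DS:name:GAUGE:5:U:U line stores one vmstat column as-is (GAUGE), marks it unknown if no update arrives within 5 seconds (the heartbeat), and sets no minimum or maximum (U:U). The single RRA:AVERAGE:0.5:1:100000 archive keeps 100,000 rows of one step (one second) each, so no further summarizing happens.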
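Typing 19 nearly identical DS lines is error-prone. A minimal sketch of generating them from the column list instead; the function name make_create_args and the VMSTAT_COLUMNS variable are hypothetical, and the column order assumes the 19-column shared-processor vmstat output shown earlier:

```shell
# Hypothetical helper: print one DS:name:GAUGE:heartbeat:U:U argument per
# vmstat column, in vmstat column order, followed by the single RRA.
# "sc" stands in for the system-call "sy" column so CPU "sy" keeps its name.
VMSTAT_COLUMNS="r b avm fre re pi po fr sr cy in sc cs us sy id wa pc ec"

make_create_args() {
    heartbeat=${1:-5}   # seconds without an update before a value is unknown
    rows=${2:-100000}   # one-step rows kept in the archive
    for ds in $VMSTAT_COLUMNS; do
        printf 'DS:%s:GAUGE:%s:U:U\n' "$ds" "$heartbeat"
    done
    printf 'RRA:AVERAGE:0.5:1:%s\n' "$rows"
}
```

Usage: rrdtool create vmstat.rrd --step 1 $(make_create_args). Leaving the command substitution unquoted is deliberate: it splits into one argument per line, which is safe here because no argument contains spaces.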
Part two - saving vmstat output in rrdtool format
"rrdtool update" puts data into the database. Adding one row of vmstat data needs to look like this:
  rrdtool update vmstat.rrd 1354235156:1:0:706069:34785:0:0:0:0:0:0:35:3600:566:0:0:99:0:0.03:1.7
The first number (1354235156) is the number of seconds since the epoch, that is 1 Jan 1970 (UTC). Obvious really! The rest is a colon-separated list of the stats from vmstat. Fortunately, the UNIX date command can give you the current time in epoch format:
 
  $ date +%s
  1355307792
So here is how you change vmstat data to rrdtool update commands for an hour (3600 seconds):
  TIME=`date +%s`
  vmstat 1 3600 | awk -v time=$TIME '/^.[0-9]/{ n++; print "rrdtool update vmstat.rrd "time+n":" $1 ":" $2 ":" $3 ":" $4 ":" $5 ":" $6 ":" $7 ":" $8 ":" $9 ":" $10 ":" $11 ":" $12 ":" $13 ":" $14 ":" $15 ":" $16 ":" $17 ":" $18 ":" $19 }' >vmstat.output
  ENDTIME=`date +%s`
 Awk is good at this sort of thing:
  • We put the Korn shell variable into the awk variable with -v time=$TIME
  • We ignore lines that are not vmstat data with the following pattern, which matches only lines that have a digit in the second position (vmstat right-aligns its numbers, so every data line does):
   /^.[0-9]/
  • We use a counter "n" (n++ and "time+n") so each line of output has a timestamp one second later than the previous line.
  • The rest is formatting into a colon-separated list.
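The same conversion can be wrapped in a reusable shell function and tested against saved vmstat text. This is a minimal sketch: the function name vmstat_to_updates is hypothetical, and the NF == 19 field-count check is an addition that is not in the one-liner. It assumes the 19-column output of a shared-processor partition; if your vmstat prints a different number of columns (for example without pc and ec), adjust the 19 and the DS list together:

```shell
# Hypothetical helper: the awk one-liner as a reusable function.
# Reads AIX vmstat output on stdin and prints one "rrdtool update" command
# per data line. $1 = start time in epoch seconds, $2 = database file.
# Data line n gets timestamp start+n, assuming one line per second.
vmstat_to_updates() {
    awk -v time="$1" -v rrd="${2:-vmstat.rrd}" '
        # Digit in the second position = data line; NF == 19 also skips
        # any line that does not carry all 19 columns (added check).
        /^.[0-9]/ && NF == 19 {
            n++
            line = "rrdtool update " rrd " " (time + n)
            for (i = 1; i <= NF; i++)
                line = line ":" $i
            print line
        }'
}
```

Usage, equivalent to the one-liner: vmstat 1 3600 | vmstat_to_updates "$TIME" vmstat.rrd >vmstat.output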
Note: we need the start and end times for graphing, so that we extract the right time period from the database.
Part three - loading the data
We have the rrdtool commands in the vmstat.output file that we created, so run the file through a Korn shell:
  ksh <./vmstat.output
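Running vmstat.output through ksh starts one rrdtool process per sample, which is 3,600 processes for an hour. rrdtool update also accepts several timestamp:value arguments in one call, provided they are in increasing time order, so one possible speed-up is to merge the lines into batches. This is a minimal sketch with a hypothetical function name, assuming vmstat.output has exactly the "rrdtool update file timestamp:values" format produced by the awk one-liner:

```shell
# Hypothetical helper: merge one-update-per-line commands into batches of
# up to $1 updates per rrdtool call, preserving the original time order.
batch_updates() {
    awk -v max="${1:-100}" '
        $1 == "rrdtool" && $2 == "update" {
            # Flush when the batch is full or the target file changes
            if (count == max || ($3 != file && count > 0)) {
                print cmd
                count = 0
            }
            if (count == 0) { cmd = "rrdtool update " $3; file = $3 }
            cmd = cmd " " $4
            count++
        }
        END { if (count > 0) print cmd }'
}
```

Usage: batch_updates 100 <vmstat.output | ksh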
Part four - generating the graph files
Generating graphs is tricky, as there are so many command options, but here are a few worked examples.
First, a simple graph of the physical CPU consumed, assuming that we have a shared CPU virtual machine (logical partition):
  rrdtool graph physical_consumed.gif \
    --title "Physical CPU Consumed" \
    --vertical-label "CPUs" \
    --height 300 \
    --start $TIME \
    --end $ENDTIME \
    DEF:pc=vmstat.rrd:pc:AVERAGE LINE2:pc#00FF00:"Physical Consumed"
Notes:
  • The top line is the command and the name of the file to generate.
  • --title and --vertical-label, as you might guess, are the top and left labels on the graph.
  • --height is the height of the graph in pixels, chosen so the graphs display well on a website.
  • Then, we have the start and end time in epoch seconds. In this example, we select all the stats in the database, but you could change the times to pick out more interesting periods from the available data.
  • The last line is complex ...
  • DEF:pc=vmstat.rrd:pc:AVERAGE names a variable pc and fills it from the pc column (data source) in the vmstat.rrd database file.
  • AVERAGE is how to deal with more data than we can graph; alternatives are, for example, MIN and MAX (these also need a matching RRA:MIN or RRA:MAX archive in the rrdtool create command).
  • LINE2 makes it a line graph, and the 2 means thicker lines. Good for a simple one-line graph. For multiple lines on one graph, use thinner LINE1 lines.
  • The hex number is the colour as red, green and blue pairs (#RRGGBB). Although you can use colour names for about half a dozen colours, the hex number is easier.
  • The text "Physical Consumed" is used as the key (legend) at the lower edge of the graph (not really needed on a one-line graph with a good title).
See the sample graphs lower down this page.
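Most single-line graphs differ only in the file name, column, and labels, so the options can be kept in one place. A minimal sketch, with a hypothetical function name; it expects TIME and ENDTIME to be set as earlier, and the RRDTOOL override is an assumption added so the command can be printed rather than run:

```shell
# Hypothetical helper: one vmstat column as a single-line graph, using the
# same options as the physical_consumed.gif example. Needs TIME and ENDTIME
# (epoch seconds). With RRDTOOL=echo it prints the arguments instead.
graph_line() {
    file=$1 col=$2 title=$3 label=$4 legend=$5
    ${RRDTOOL:-rrdtool} graph "$file" \
        --title "$title" \
        --vertical-label "$label" \
        --height 300 \
        --start "$TIME" \
        --end "$ENDTIME" \
        "DEF:$col=vmstat.rrd:$col:AVERAGE" \
        "LINE2:$col#00FF00:$legend"
}
```

Usage: graph_line run_queue.gif r "Process Run Queue" Processes "Run Queue"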

Here is a more complex example, a stacked area graph of the four utilisation numbers:
 
  rrdtool graph cpu_utilisation.gif \
    --rigid --lower-limit 0 --upper-limit 100 \
    --title "CPU Utilisation" \
    --vertical-label "Percent Stacked" \
    --start $TIME \
    --end $ENDTIME \
    --height 300 \
    DEF:us=vmstat.rrd:us:AVERAGE AREA:us#00FF00:"User" \
    DEF:sy=vmstat.rrd:sy:AVERAGE STACK:sy#0000FF:"System" \
    DEF:wa=vmstat.rrd:wa:AVERAGE STACK:wa#FF0000:"Wait" \
    DEF:id=vmstat.rrd:id:AVERAGE STACK:id#FFFFFF:"Idle"
More notes:
  • --rigid and the limits are there because we don't want rrdtool to choose the scale: we know it is 0 - 100%, and we want to compare graphs visually on a constant scale.
  • There are four DEF lines, one for each of the utilisation stats. The first is AREA and the rest are STACK type, so they are placed one on top of the other.
  • Each stat is given a suitable colour in Hex
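The stacked graph relies on us, sy, id and wa adding up to about 100%, so the stack fills the fixed 0 - 100 scale. A minimal sketch of checking that on saved vmstat text; the function name is hypothetical, and it assumes the AIX column layout shown earlier, where us, sy, id and wa are fields 14 to 17 of each data line:

```shell
# Hypothetical check: print us + sy + id + wa (fields 14-17 of each AIX
# vmstat data line) so totals far from 100 stand out. Rounding in vmstat
# can make a total land at 99 or 101.
cpu_totals() {
    awk '/^.[0-9]/ && NF >= 17 { print $14 + $15 + $16 + $17 }'
}
```

Usage: vmstat 1 10 | cpu_totals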
Part five - what you get is this type of graph
[Sample graph: Physical CPU Consumed]
Note: the previous graph comes from only a 1 minute capture, which is why there is no date and time scale along the lower edge of the graph.
[Sample graph: CPU Utilisation]
Part six - want to give it a try?
You need a copy of rrdtool. Assuming you are using AIX:
  • The home website is https://oss.oetiker.ch/rrdtool/index.en.html
  • The developer is Tobias Oetiker.
  • There is a version on this website for download.
  • A more up-to-date version is available from my favourite providers of open source for AIX:
    • AIX Open Source Toolbox https://www.ibm.com/support/pages/aix-toolbox-linux-applications-downloads-alpha
    • Alternatively, Michael Perzl's website: http://www.perzl.org/aix/index.php?n=Main.Rrdtool
    • Note: for either download site, there is a long list of dependent software you need to download.  The prerequisites make it tricky to install. Try it on a spare virtual machine before you update an important production server!
    • With Linux, it is a simple download.
If you want simple access to the graphs, then you need a web server.
To try the whole example, you need the shell script in the following code snippet:
  #!/usr/bin/ksh
  # DURATION, not SECONDS: in ksh, SECONDS is a special variable that keeps
  # counting up after you assign to it
  export DURATION=3600
  echo this script captures for $DURATION seconds

  echo remove the vmstat.rrd database in this directory
  rm -f vmstat.rrd

  echo create vmstat.rrd for 100000 seconds = over 27 hours max at 1 second captures
  rrdtool create vmstat.rrd --step 1 \
    DS:r:GAUGE:5:U:U \
    DS:b:GAUGE:5:U:U \
    DS:avm:GAUGE:5:U:U \
    DS:fre:GAUGE:5:U:U \
    DS:re:GAUGE:5:U:U \
    DS:pi:GAUGE:5:U:U \
    DS:po:GAUGE:5:U:U \
    DS:fr:GAUGE:5:U:U \
    DS:sr:GAUGE:5:U:U \
    DS:cy:GAUGE:5:U:U \
    DS:in:GAUGE:5:U:U \
    DS:sc:GAUGE:5:U:U \
    DS:cs:GAUGE:5:U:U \
    DS:us:GAUGE:5:U:U \
    DS:sy:GAUGE:5:U:U \
    DS:id:GAUGE:5:U:U \
    DS:wa:GAUGE:5:U:U \
    DS:pc:GAUGE:5:U:U \
    DS:ec:GAUGE:5:U:U \
    RRA:AVERAGE:0.5:1:100000

  echo Note the vmstat sy faults column is renamed sc so sy is system time

  TIME=`date +%s`
  echo startseconds $TIME

  echo Capturing for $DURATION seconds
  vmstat 1 $DURATION >vmstat.txt &
  vmstat 1 $DURATION | awk -v time=$TIME '/^.[0-9]/{ n++; print "rrdtool update vmstat.rrd "time+n":" $1 ":" $2 ":" $3 ":" $4 ":" $5 ":" $6 ":" $7 ":" $8 ":" $9 ":" $10 ":" $11 ":" $12 ":" $13 ":" $14 ":" $15 ":" $16 ":" $17 ":" $18 ":" $19 }' >vmstat.output

  ENDTIME=`date +%s`
  echo endseconds $ENDTIME

  echo load the vmstat data into the vmstat.rrd database
  echo the file has `wc -l <vmstat.output` lines
  ksh <./vmstat.output

  echo graph the data
  rrdtool graph cpu_utilisation.gif \
    --rigid --lower-limit 0 --upper-limit 100 \
    --title "CPU Utilisation" \
    --vertical-label "Percent Stacked" \
    --start $TIME \
    --end $ENDTIME \
    --height 300 \
    DEF:us=vmstat.rrd:us:AVERAGE AREA:us#00FF00:"User" \
    DEF:sy=vmstat.rrd:sy:AVERAGE STACK:sy#0000FF:"System" \
    DEF:wa=vmstat.rrd:wa:AVERAGE STACK:wa#FF0000:"Wait" \
    DEF:id=vmstat.rrd:id:AVERAGE STACK:id#FFFFFF:"Idle"

  rrdtool graph run_queue.gif \
    --title "Process Run Queue" \
    --vertical-label "Processes" \
    --height 300 \
    --start $TIME \
    --end $ENDTIME \
    DEF:r=vmstat.rrd:r:AVERAGE LINE2:r#00FF00:"Run Queue"

  rrdtool graph physical_consumed.gif \
    --title "Physical CPU Consumed" \
    --vertical-label "CPUs" \
    --height 300 \
    --start $TIME \
    --end $ENDTIME \
    DEF:pc=vmstat.rrd:pc:AVERAGE LINE2:pc#00FF00:"Physical Consumed"

  # ec is a percentage of the entitlement, not a number of CPUs
  rrdtool graph entitlement_consumed.gif \
    --title "Entitlement CPU Consumed" \
    --vertical-label "Percent" \
    --height 300 \
    --start $TIME \
    --end $ENDTIME \
    DEF:ec=vmstat.rrd:ec:AVERAGE LINE2:ec#00FF00:"Entitlement Consumed"

  echo images available
  ls -l *.gif
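The script stops at the .gif files. To browse them on a web server, one option is to generate a small HTML page next to them. This is a minimal sketch: the function name write_index and the page layout are assumptions, and where the web server's document directory lives depends on its configuration:

```shell
# Hypothetical helper: write index.html listing every .gif in a directory,
# so the graphs can be viewed once that directory is served over HTTP.
write_index() {
    dir=${1:-.}
    {
        echo '<html><head><title>vmstat graphs</title></head><body>'
        for f in "$dir"/*.gif; do
            [ -e "$f" ] || continue   # no .gif files: the glob stays literal
            name=$(basename "$f")
            echo "<p><img src=\"$name\" alt=\"$name\"></p>"
        done
        echo '</body></html>'
    } >"$dir/index.html"
}
```

Usage: run write_index . after the script, then copy index.html and the .gif files into the web server's document directory.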
- - - The End - - -

Additional Information


Other places to find content from Nigel Griffiths IBM (retired)

Document Location

Worldwide


Document Information

Modified date:
14 June 2023

UID

ibm11165348