This AIXpert Blog has been superseded by version 3 of the scripts, which covers:
- Energy, SSP I/O and now Server and LPAR Performance Stats.
Click HERE to get to the new AIXpert Blog entry.
- I have just been told the Energy stats do not work on the POWER8 E870 or E880 as they don't include the hardware to collect measurements
- Unfortunately, it is not documented in the IBM KnowledgeCenter
- Supported are POWER8 S8xx, S8xxL and E850, when connected to a HMC
Note: this Python script creates HMC Sessions but does not log off, so they build up over time and could cause low memory problems on your HMC.
- The solution to that is covered in a new AIXpert Blog here: Avoiding HMC REST API Session Logoff Issues
The HMC can supply loads of statistics about the servers it controls at both the server level and the Virtual Machine level.
This function has been available for a year or more.
Below we are not covering the regular Server CPU, RAM, Disk performance stats but they follow the same scheme.
In December, 2016 the following were added:
Electrical power supplied (Watts) - at Server level
Temperatures (Celsius) - at Server level but multiple measurements: motherboard points, CPU(s) and air inlet.
Shared Storage Pools disk I/O (SSP size, SSP free space, read and write stats for bytes, operations, service times, errors)
POWER8 or above
HMC 860 level (Q4 2016) or above
System Firmware 860 level (Q4 2016) or above.
Python 3.5 or above (actually I think 2.7 is fine too but untested)
Korn shell (actually the script is very simple so it may work with Bash too)
For SSP statistics, you also need VIOS 2.2.5+ (Just noting this so you can get ready for the next Blog)
The documentation was released in mid-January 2017
IMHO: The documentation leaves many unanswered questions on the use of this interface.
And there is a real need for a worked example.
This AIXpert Blog aims to fill that information vacuum with
Regard this as a Tool Kit to develop further yourself to suit your needs.
For example, my S822 and S824 have four POWER8 chips each resulting in four CPU temperatures.
- If your machine is an S812 or S814 then you will have fewer POWER8 CPU chips and will have to adjust the Python program and Korn script.
- If you have an E850, E870 or E880 you will likely have many more POWER8 CPU chips.
- If I get sample data from readers with smaller and larger machines then I can improve the scripts to cover all server sizes.
Warning this ain't easy! There are many technologies to deal with at once.
- HMC = Hardware Management Console
- PCM = Performance and Capacity Metrics - data you can get from the HMC
- XML = eXtensible Markup Language.
- Nigel's definition: hierarchical structured files with lots of angle brackets.
- JSON = JavaScript Object Notation.
- Nigel's definition: hierarchical structured files with lots of curly brackets and arrays in square brackets.
- HTTP / HTTPS = Hypertext Transfer Protocol for browser communicating with a web server. HTTPS = meant to be secure!
- REST API = REpresentational State Transfer. Makes an Application Programming Interface between machines.
- Python = Interpreted Scripting language
- curl = AIX or Linux command to communicate with a web server
- Java = a slow language :-)
To get this to work you need to know and understand a wide range of technologies
- Understand the HTTP protocol and error codes.
- Understand XML format and structure i.e. the basics and be able to read them.
- Understand JSON format and how to reformat it when it is handed to you all on one 100 MB single line!
- Understand REST API generally.
- Find a suitable language to work with XML and JSON formats
- Simple HTTP can be done with curl - a command line program that you can embed in, say, ksh scripts, but that is no good for handling complex XML and JSON formatted files.
- I decided to use Python = simple language but it has a few annoying shortcomings.
It is a bit like Korn shell, but indentation is used instead of do and done code section markers.
Don't get me started on the if, else and for needing a ":" and on indentation being used to define blocks of code, which results in manually space-indenting the file.
- Add the Python XML module
- Add the Python JSON module
- I used the Ubuntu 16.04.2 operating system on Power (works exactly the same on x86 but at half the speed)
- python --version
- >> Python 3.5
- The units of the numbers returned are Watts and Celsius
- The stats can be extracted in three different formats
- Raw, Processed and Aggregated - Hint: to save you time, start with Processed :-)
- The extracted data needed to be re-formatted to quickly generate graphs - I am using Googlechart and the data is a simple Comma Separated Values format (for Googlechart see https://developers.google.com/chart/)
Getting the Energy stats from the HMC protocol
This involves many trips to the HMC and back
Python program to extract the Energy stats syntax & options
The program is used like this
usage: PCM_energy.py [-h]
-h, --help show this help message and exit
--hmc HMC Specify the hmc ip address or hostname
--username USERNAME HMC username
--password PASSWORD HMC password
--server SERVER Server name as seen on the HMC (ALL to select all
--path PATH Output directory where the HMS responses are saved
--type TYPE Types: RawMetrics|ProcessedMetrics|AggregatedMetrics
--debug Output details as we go
--save Save to files the HMC responses
--flagson Set preferences LT=on Agg=on Energy=on
--flagsoff Set preferences LT=off Agg=off Energy=off
Here is a simpler example (assumes you have a sub-directory called "energy" where in-flight files are saved if you are using the --save option)
./PCM_energy.py --hmc hmc14 --username hscroot --password SECRET
--path energy --type ProcessedMetrics --server P8-S824-lime
These six flags are mandatory
- --hmc hmc14 is the hostname of my Hardware Management Console - I am running the Python program from a Linux server local to the HMC (on the local network) otherwise I would use the full hostname hmc14.aixncc.ibm.com
- --username and --password - obviously the HMC logon details
- Note that user MUST be a HMC superAdmin privileged user to set the preference flags
- Note: otherwise a regular Viewer user can be used
- --path energy is the sub-directory where the files and data are stored, here a sub-directory called "energy".
- If you set the --save option there will be hundreds of files, some of them large, so we keep them isolated so you can remove them later.
- --type valid options are RawMetrics or ProcessedMetrics or AggregatedMetrics
Optionally you can add:
- --debug = verbose mode for more output on screen
- --save = saves a copy in the --path directory of all the data files returned by the HMC
- --flagson = this forces the three energy monitoring preferences to true and POSTs them back to the HMC to switch on monitoring
- --flagsoff = this forces the three energy monitoring preferences to false
If it's the first time you have gathered energy stats you need to force the Preference flags to True, so add the --flagson option
This switches on the data collection
./PCM_energy.py --hmc hmc14 --username hscroot --password SECRET --path energy
--type ProcessedMetrics --server P8-S824-lime --flagson
If you want lots of information on the screen and all the data to and from the HMC saved in files, add the other options like this
./PCM_energy.py --hmc hmc14 --username hscroot --password SECRET
--path energy --type ProcessedMetrics --server P8-S824-lime --debug --save
To generate the graphs, I have used ProcessedMetrics and use cron to collect them every 15 minutes.
- This means there is a considerable overlap in each gathering of stats (the duplicates are removed using sort and uniq in the script).
- This could be reduced:
1. Collect stats every 2 hours - but then you can't see what happened recently.
2. Use the option to lower the number of stats returned - but if the data gathering gets interrupted, for example by a reboot or power off, then lots of stats would be missing.
- As Googlechart is time aware the graphs will look OK but have straight lines across missing data.
3. Like 2 but calculate the start and end time - this would involve a load of coding that I prefer to avoid in this example.
This Python program takes about a minute to run: about 40 seconds on the HMC while it prepares the data and about 20 seconds on the machine running the program as it sorts out the data. This was measured on a Linux on Power machine S822LC. My, admittedly 2 years old, Intel servers take 40 seconds longer.
What you get:
- The data is saved in a CSV file: energy-PROCESSED.csv
- The Data is like this sample one line :
PRO, P8-S824-emerald,8286-42A*1BBEC7V, 2017,02,08,17,04,53, 770, 31,32,36, 41,42,46,44, 29
- getType = RawMetrics (RAW), ProcessedMetrics (PRO) or AggregatedMetrics (AGG)
- Servername (as seen on the HMC)
- MachineTypeModel*SerialNumber (IBM speak for the sort of machine S824 = 8286-42A*122EC7V)
- timeStamp of six fields YYYY MM DD HH mm SS
- electrical power in Watts (770 in the sample line)
- three motherboard temperatures in C
- four POWER8 processor temperatures in C (my S824 has four POWER8 chips)
- one inlet temperature in C
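To make the field layout concrete, here is a minimal sketch that splits the sample line above into named values. The function name and dictionary keys are made up for illustration, and the slicing assumes three baseboard, four CPU and one inlet temperature, so other machine sizes would need different offsets.

```python
import csv
from datetime import datetime

# Sample line copied from above (S824 layout: 3 baseboard, 4 CPU, 1 inlet temperature).
line = "PRO, P8-S824-emerald,8286-42A*1BBEC7V, 2017,02,08,17,04,53, 770, 31,32,36, 41,42,46,44, 29"

def parse_energy_line(text):
    """Split one CSV line into named fields (names are illustrative only)."""
    fields = [f.strip() for f in next(csv.reader([text]))]
    stamp = datetime(*(int(f) for f in fields[3:9]))   # YYYY MM DD HH mm SS
    return {
        "type": fields[0],
        "server": fields[1],
        "mtm_serial": fields[2],
        "time": stamp,
        "watts": float(fields[9]),
        "baseboard_c": [float(f) for f in fields[10:13]],
        "cpu_c": [float(f) for f in fields[13:17]],
        "inlet_c": float(fields[17]),
    }

record = parse_energy_line(line)
print(record["server"], record["time"], record["watts"], "W")
```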
Download the Python code, Ksh graphing script and sample files
Details: 18 KB gzipped tar file 10th Feb 2017
Graphing the results
The Googlechart graphs are generated from the CSV file with a simple Korn shell script "Energy_grapher".
This takes the CSV file and creates a .html webpage, which you can view in a Browser or share via a website.
- This is a sample output of the graphs:
Here are the graphs generated as images:
Energy Electrical Power use in Watts
Energy Temperature in Celsius
Useful background information
- The Python script is behaving like a web browser and the HMC as a web server
- Web requests are
- '''GET''' = the browser requests of the server to send stuff like a webpage or images i.e. normal web browsing of simple pages. Here this is used to get preferences and "files" of information or JSON data.
- '''PUT''' = when we want the server to do something but it needs some small amount of information i.e. the login needs to send a user and password.
- '''POST''' = when the browser needs to send a lot of data to the server - in a browser this might be a page of text you just edited on the screen. Here it's used to send back amended preferences.
The Python script, acting like a browser, sends a block of text to the HMC which can have several parts
- The '''URL''' - this is a web server (HMC in this case) hostname, a port number, here always 12443 (in this performance stats case), and a file name - this may not actually be a file but it tells the HMC what you want it to do.
- '''HTTP / HTTPS''' details - the requests module does this for us including calculating the size of the data. In web page terms the surrounding <html> and </html>
- '''Header''' information - in a web page this is between <head> and </head>. This might give the HMC a hint about the data format (Content:) like XML and the acceptable response format (Accept:). After login this includes the authorisation Token = a proof of who we are.
- A '''Payload''' - this is like the webserver sending the webpage's data. In this case it is only used in the PUT and POST requests to send data. In web page terms the <body> and </body>
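The parts of a request can be seen without touching the network. This sketch uses the standard library urllib.request.Request rather than the requests module the script uses, purely so it runs anywhere. The path, header values and payload are placeholders for illustration, not the real HMC logon format.

```python
from urllib.request import Request

# Hypothetical values: the path, headers and payload are placeholders only.
hmc = "hmc14"
url = "https://" + hmc + ":12443/example/path"    # URL: hostname, port and "file name"
headers = {
    "Content-Type": "application/xml",            # format of what we are sending
    "Accept": "application/xml",                  # format we would like back
}
payload = b"<Example>some data</Example>"         # only needed for PUT and POST

# Building the object does not send anything to the HMC.
req = Request(url, data=payload, headers=headers, method="PUT")

print(req.get_method())      # the HTTP verb
print(req.host)              # hostname and port taken from the URL
print(req.header_items())    # the headers as they would be sent
print(req.data)              # the payload
```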
XML is a weird format
You will have to be able to read XML to understand the data and script.
- There are many nested levels of data sections. The sections have names (tags) like: alpha, beta and gamma
- Note the start and end of a section are matched and it's a convention to indent with spaces (although that can make the files large) - the end tag has the extra "/" before the tag name.
- At every level there can be data, as in <id>42</id> or <name>Nigel</name>
- Also the tags can have more data like <beta xmlns:ns2="http://a9.com/-/spec/opensearch/1.1/"> or <gamma kbe="false" kx=0> ...
- All sorts of strange text and, as far as I know, totally meaningless stuff added by Java modules at the HMC
- Getting these strange bits wrong or in the wrong order in the Preferences payload POSTed back to the HMC results in total rejection and often a stack trace dump of 200 lines.
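As a small illustration of nested sections, tags, attributes and data, here is a sketch that parses a made-up XML fragment with Python's ElementTree module. The alpha, beta and gamma names come from the description above; the attribute names and values are invented for the example.

```python
import xml.etree.ElementTree as ET

# Made-up XML: nested sections, one with attributes, each level holding data.
text = """<alpha>
  <id>42</id>
  <beta kind="example">
    <name>Nigel</name>
    <gamma kxe="false">some data</gamma>
  </beta>
</alpha>"""

root = ET.fromstring(text)

def walk(element, depth=0):
    """Print each tag, its attributes and its text, indented by nesting level."""
    value = (element.text or "").strip()
    print("  " * depth + element.tag, element.attrib, value)
    for child in element:
        walk(child, depth + 1)

walk(root)

# A path of tag names reaches a nested section directly.
name = root.find("beta/name").text
print(name)
```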
Three different performance stats formats
There are three different sorts of data available to the SSP and at 5 levels of detail.
- This is not documented well at a practical level - What do you get and which should you use?
RawMetrics
- Are named Raw as these are the raw stats saved on the HMC from data collectors on the Service Processor on each server.
- *** This bit needs checking ***
- Raw stats are painful to use because they are the incrementing counters
- So you need to get two sequential sets of stats take the difference (called a delta) and divide by the time in seconds between the two captures to get something useful
- The stats are saved every 30 seconds
- Now the totally wacky bit:
- The data is supplied in 66 separate files each with one snapshot - so you get just over 30 minutes worth in one go.
- *** And the data files are not in chronological order!!
- *** End of needs checking ***
- IMHO I think this is very hard to work with unless you are overwriting database entries
- If you don't collect data within the 30 minutes, you can't later get the stats.
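The delta calculation described for Raw stats can be sketched as follows. The snapshot values are invented, and the sort step reflects the observation above that the files may not arrive in chronological order.

```python
def rate_per_second(earlier, later):
    """Turn two snapshots of an incrementing counter into a per-second rate.

    Each snapshot is a (time_in_seconds, counter_value) pair.
    """
    t0, c0 = earlier
    t1, c1 = later
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("snapshots must be in chronological order")
    return (c1 - c0) / elapsed

# Invented snapshots, deliberately out of order, 30 seconds apart.
snapshots = [(60, 4500), (0, 1500), (30, 3000)]
snapshots.sort()   # put them in time order first

# Pair each snapshot with the next one and take the delta.
rates = [rate_per_second(a, b) for a, b in zip(snapshots, snapshots[1:])]
print(rates)
```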
ProcessedMetrics
- These are based on the Raw stats and the complicated maths is done for you - phew.
- You get one file of data per server - in a different format to the Raw data!
- 30 second data and around 240 data points so that is 2 hours worth
- In my case I only have 3 POWER8 servers connected to the HMC so the data volume is small - if you have many dozens of the servers it can add up.
- This seems the best option to me - one file, no min & max (unlike Aggregated) and we could fail to get data for 2 hours before we have holes in the records and so in the graphs.
- Getting the data more often means we have to remove duplicates (sort and uniq UNIX command can do this)
AggregatedMetrics
- In the same format as ProcessedMetrics
- You get three files for the three different snapshot rates
- 5 minute data (300 seconds) and 166 snapshots covering roughly half a day
- 2 hour data (7200 seconds) and 92 snapshots covering roughly a week
- 24 hour data (86400 seconds) and 60 snapshots covering roughly daily for 2 months
- Now the wacky part:
- The three stats files include Average, Minimum and Maximum values
- If you are trying to graph one number you can get a neat Average plus Min and Max values
- If drawing more than one stat on one graph then you get in to a terribly confusing mess
- And if you have occasional large peaks they will be pretty uninteresting too as the max in every period would be about the same.
For all three data types you can specify a start time and end time.
- I don't see the point of allowing you to set an "End time" - I would always want the latest stats.
- Start could be useful: if you have a daemon running, it knows the last snapshot time stamp and so can avoid getting duplicate data.
- This does not work for a snapshot script like this, although you could extract the last time stamp from your saved data and do datetime maths!!
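The datetime maths mentioned above might look like this sketch. It assumes the time stamps use the format seen in the JSON data ("2017-02-06T17:36:25+0000") and a 30 second snapshot interval. The function name is made up, and the format the HMC expects for a start time should be checked in the documentation.

```python
from datetime import datetime, timedelta

# Format of the time stamps seen in the JSON data (assumed to apply here too).
HMC_FORMAT = "%Y-%m-%dT%H:%M:%S%z"

def next_start(last_stamp, step_seconds=30):
    """Return the time stamp one snapshot interval after the last saved one."""
    last = datetime.strptime(last_stamp, HMC_FORMAT)
    return (last + timedelta(seconds=step_seconds)).strftime(HMC_FORMAT)

print(next_start("2017-02-06T17:36:25+0000"))
```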
Below: Walk Through Commentary of the Python & Graphing shell script
Detailed Walk through of PCM_energy.py
- Note: this is example code, so simplicity is more important than good Python style, hence no functions are defined (most of the code is called once)
- We could make most of this in to a Python module at some point
- Line 1: This is a Python program so invoke the Python 3 interpreter
- Lines 2 - 12: Load Python modules which give extra functions - this is the way Python adds code libraries
- requests does the HTTP to and from the HMC
- Element Tree for extracting data from XML format
- sys and shutil for system tools and shell access
- argparse for handling the program arguments
- datetime for access to the current date and time
- json for handling JSON format data
- Lines 14 - 52: Handling the program arguments - Fairly straight forward checks, warning and help info
- Lines 54 - 55: Switch off the annoying insecure use warning message. We are using verify=False so Certificates are not checked.
- The network packets are encrypted but the HMC end point certificate is not checked so there is a "man in the middle" security risk. If you log in with a passive HMC Viewer User Role, which can make no changes, then it is lower risk.
- This could be made secure but it's complicated and not the point of the worked example.
- An exercise for the student!
- Check out:
- Ideas from the PowerVC/OpenStack REST API from Matthew Edmonds: https://developer.ibm.com/powervc/2016/12/20/automating-via-powervc-apis/
- And the Python Advanced Request module webpage - http://docs.python-requests.org/en/master/
- Line 56: Output a reminder of the sort of data requested to the user.
- Lines 65-66: Access the HMC.
Prepare the HTTP header, URL and HTTP body for logging on to the HMC based on the username and password in the program arguments
- Header - Warns the HMC webserver about what's coming: very specific details for the Logon Request.
- URL - The HMC Server details on the network and a filename which can also indicate what you want back: using HTTPS, the HMC Hostname, the HMC PCM port number and the filename tells the HMC what we want to do.
- The Payload - Really odd format but needed by the HMC as it supplies the username and password obviously.
- Warning: If you want to update the Preferences like switching on stats collection then the HMC user must be superAdmin Role. If already switched on then the user can be even the humble Viewer Role.
- Line 69: The request function (part of the requests module) pulls together the above three items and makes the HTTP PUT request across the network to the HMC to login
- The "verify=False" switches off the security of normal HTTPS Certificate checking.
- Line 69: The result is placed in the "r" Python variable, which is then converted to an XML array called XMLresponse
- Lines 70 - 74: Check the return status was OK. As with all websites using HTTP, the only valid response is 200; anything else is an error.
- See https://en.wikipedia.org/wiki/List_of_HTTP_status_codes for all the error codes - they can help you workout what went wrong.
- If it's an error the code prints the return code and the returned "website data" which can give some details of what went wrong - like you got the username or password wrong, the syntax of the request was not understood or some HMC problem.
- The sys.exit() function halts the Python program as it can't continue.
- Lines 77 - 78: Save the results in xmlResponse and convert it to XML format
- Line 79: Save the session Token (part of the response), which is used for ALL subsequent interactions with the HMC as proof of identity and session details.
- A session Token looks like this random block of 236 characters (and I thought IBM passwords were long!!)
- Lines 80 - 81: If debug is not zero - output the session token to the screen
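The status check and token extraction could be sketched like this, using a made-up response in place of the real r.status_code and r.text. The function names are invented, and the element name X-API-Session is an assumption about the logon response, so check a saved response from your own HMC.

```python
import sys
import xml.etree.ElementTree as ET

def check_status(status_code, body):
    """Stop the program unless the HMC answered 200 (OK)."""
    if status_code != 200:
        print("Error: HTTP status", status_code)
        print(body)            # the returned text often says what went wrong
        sys.exit(1)

def session_token(xml_text):
    """Find the session token in the logon response.

    Assumption: the token is the text of an element whose tag ends in
    'X-API-Session'; the real element name may differ.
    """
    for element in ET.fromstring(xml_text).iter():
        if element.tag.endswith("X-API-Session"):
            return element.text.strip()
    return None

# Made-up response standing in for r.text after the logon PUT.
sample = """<LogonResponse xmlns="http://example.com/hypothetical">
  <X-API-Session>abc123-not-a-real-token</X-API-Session>
</LogonResponse>"""

check_status(200, sample)
token = session_token(sample)
print(token)
```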
- Lines 92 - 94: We have the sessionToken, next the PCM Preferences (these are a set of Flags) are extracted so we have the flags which may need setting to start the collection of Energy stats AND we have the names and ids of the servers connected to the HMC.
- Prepare the data for the GET request to the HMC. No data in the header except the session Token and the URL includes the "preferences" that we want so the HMC webserver knows what to do.
- Line 95 - Make the GET request to the HMC and save the response in the variable "r"
- Line 97 - 100: Checks the return code just like the previous request, if we got an error, stops the program after printing useful diagnostics on the screen
- Line 102: If required (save == true) save the Preferences to a file for examination
- My HMC has 12 POWER machines connected so the file is fairly large with 100's of lines of XML
- Lines 112 - 113: the GET response is text and is converted to XML data so we can extract things out of the data - later.
- Line 120: Checks for the flagson or flagsoff variable - if either is set the program will switch the Preference flags
- These are:
- What these actually mean is not well documented - without the EnergyMonitorEnabled = true you will not get any stats at all. So first time you MUST use the option --flagson
- Line 122: The postBody is created from the GET Preferences request response
Normally a website sends the Browser the <body> data</body> which is the bulk of a website page. Here it is being sent back to the HMC because we want the HMC to accept the new Flag settings. The HMC is extremely fussy about the structure of this data.
- Lines 124 - 128: Use the string replace function to remove annoying parts of the response
- What these bits mean is not documented but they are found in different orders in the flag settings so it's best to just remove them completely. I wasted hours on this quirk!
- Lines 131 - 144: Uses the replace function to set the Flags to all false or to all true.
- WARNING: I found that if these flags are set to true, true and false then setting them to true, true and true gets you an OK return code from the POST request but it does not actually set EnergyMonitorEnabled to true and so you get no energy stats. I could however, set them all false and then all true and get EnergyMonitorEnabled=true.
- Note: when you first set EnergyMonitorEnabled=true - take a lunch break. Then when you request Energy data the HMC has actually collected some. Otherwise, if you immediately request data, there is none for 5 minutes, so you will get no stats returned and assume it's not working.
- Lines 147 - 152: Unfortunately you can't use the whole GET Response (with modified flags) to POST the changes back to the HMC. You have to remove the first few lines and the last few lines
- It must start with XML line
- and end with this XML line with </ManagementConsolePcmPreference:ManagementConsolePcmPreference>
- This is not documented. I am told this is completely normal, obvious and automatic, if you are using Java! It was a shock to me.
- Lines 154 - 159: If you asked for the data to be saved (save == true) then the modified Preference data is saved to a file for you to check the contents.
- Lines 162 - 163: Prepares the request to the HMC. Note: the Content we are sending is XML and there is also the security token. The URL is the same as the GET request. The switch from GET to POST informs the HMC that we are setting the Preferences now.
- Line 164: POST the request to set the Preferences flags.
- It is not documented if this is a permanent change, if the settings are for the session only, or if they will time out and get switched off if the stats are not regularly fetched.
- Lines 165 - 171: The result is checked as normal for 200 = OK and if in error we output all the information possible to help diagnostics. If in error, the program stops with an error.
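The flag setting and trimming steps can be sketched with plain string handling. This assumes the annoying attribute text has already been removed so a flag looks like <EnergyMonitorEnabled>false</EnergyMonitorEnabled>. The GET response here is a cut-down, made-up stand-in and the function names are invented.

```python
START_TAG = "<ManagementConsolePcmPreference:ManagementConsolePcmPreference"
END_TAG = "</ManagementConsolePcmPreference:ManagementConsolePcmPreference>"

def set_flags(xml_text, flags, value):
    """Set each named flag to 'true' or 'false' using string replacement."""
    other = "false" if value == "true" else "true"
    for flag in flags:
        xml_text = xml_text.replace(
            "<%s>%s</%s>" % (flag, other, flag),
            "<%s>%s</%s>" % (flag, value, flag))
    return xml_text

def trim_for_post(xml_text):
    """Keep only the preference section, dropping the lines before and after it."""
    start = xml_text.index(START_TAG)
    end = xml_text.index(END_TAG) + len(END_TAG)
    return xml_text[start:end]

# Cut-down, made-up GET response; a real one is hundreds of lines long.
response = """<feed>
<entry><content>
<ManagementConsolePcmPreference:ManagementConsolePcmPreference>
<EnergyMonitorEnabled>false</EnergyMonitorEnabled>
</ManagementConsolePcmPreference:ManagementConsolePcmPreference>
</content></entry>
</feed>"""

body = trim_for_post(set_flags(response, ["EnergyMonitorEnabled"], "true"))
print(body)
```

Given the quirk in the WARNING above, a cautious caller might run this twice: once with "false" and a POST, then again with "true" and a second POST.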
Before we go on: Lets take a look at the XML Preferences file / data.
- The Green Arrows show the flags that we need to set to get the Energy data.
- Note: this first machine P6-p520-Silver is NOT POWER8 but is a POWER6 Server as the name suggests, so the Energy stats will not work even if the flags are set.
- The Red Arrows show the structure of the XML sections.
- that is: feed -> entry -> content -> ManagementConsolePcmPreference (this is the HMC) . . . ManagedSystemPcmPreference (this is the first Server) . . .
- This is reflected in the Python code below.
- The Blue Arrows show the two things we really need. The code picks out a few more that are interesting.
- These are the Server name as seen on the HMC and the Atom Id, which is an unfortunate name as it's really the Server Unique name that we have to use to extract data. Unfortunately, we can't use the Server name - Oh no that would be far too easy!!
Back to the code:
- Line 179: Reminds us we are looking for a particular Server - with the name "ALL" we try to extract Energy data from all Servers.
- Line 182: In the XML formatted data the top main section is called "feed". The code will then look inside "feed" for the next section inside it and execute the code inside the for loop
- The "for x in y" loop takes each section inside y in turn and executes the loop body with x set to that section. Here the loop variable is called feed - in our case the first section is <id> which our program ignores, then the next section <title> which our program ignores and so on.
- Line 183: The code ignores sections that don't have a tag of 'entry' so the <id>, <title>, <subtitle>, <link>, <generator> sections are skipped over. When we find the section where the feed.tag == 'entry' it is a XML section we want to look inside so we start a new for loop.
- Line 184: Enters the <entry> section - again only one of these.
- Lines 185 - 187: Works similarly entering the <content> section and <ManagementConsolePcmPreference> (I called this one the hmc details itself)
- Lines 188 - 189: The Section <ManagedSystemPcmPreference> is the terrible name for a Server section!! But the HMC documentation normally calls a POWER based Server a "Managed System"
- Lines 190 - 201: Pick out some of the Server facts mostly out of interest.
- Lines 202 - 207: Pick out the Atom Id from the weird Metadata section.
- Lines 208 - 209: Is indented four characters less, so we get here if all of the Server section items have been processed as above or ignored. If the SystemName has been set then we assume we have the other data too and print the details on the screen.
- Line 210: If this is the System Name = Server that we are looking for we then use the Atom Id = ServerUUID in the next part of the code to ask for Energy stats
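The nested for loops described above can be sketched against a simplified, made-up preferences feed. The section names feed, entry, content, ManagementConsolePcmPreference and ManagedSystemPcmPreference follow the walk through. The AtomID element name, the uuid values and the helper function are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Drop any '{namespace}' prefix ElementTree puts in front of a tag name."""
    return tag.split("}")[-1]

# Simplified, made-up preferences feed with two servers.
text = """<feed>
  <id>1</id>
  <title>Preferences</title>
  <entry>
    <content>
      <ManagementConsolePcmPreference>
        <ManagedSystemPcmPreference>
          <Metadata><Atom><AtomID>uuid-0001</AtomID></Atom></Metadata>
          <SystemName>P8-S824-lime</SystemName>
        </ManagedSystemPcmPreference>
        <ManagedSystemPcmPreference>
          <Metadata><Atom><AtomID>uuid-0002</AtomID></Atom></Metadata>
          <SystemName>P8-S824-emerald</SystemName>
        </ManagedSystemPcmPreference>
      </ManagementConsolePcmPreference>
    </content>
  </entry>
</feed>"""

servers = {}
for feed in ET.fromstring(text):
    if local(feed.tag) != "entry":
        continue                          # skip <id>, <title> and friends
    for entry in feed:
        if local(entry.tag) != "content":
            continue
        for hmc in entry:                 # the HMC section
            for server in hmc:
                if local(server.tag) != "ManagedSystemPcmPreference":
                    continue
                name = atom_id = None
                for item in server.iter():
                    if local(item.tag) == "SystemName":
                        name = item.text
                    elif local(item.tag) == "AtomID":
                        atom_id = item.text
                if name:
                    servers[name] = atom_id

print(servers["P8-S824-lime"])
```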
- Having found the Atom ID (Server Unique Id), we can request the Energy data but that would be far too easy!!!
- We Request the sort of data we want and the data gets prepared on the HMC and we get back the name of the file(s) we can then download. Of course, the names of the files are embedded in yet more XML that we have to parse!
- Lines 219 - 229: Prepare the request for the three different data types Raw, Processed and Aggregated.
- Note the URLs are different for the three data types and the latter two have odd URL endings of "?Type=Energy" - seems rather pointless to me.
- Note: the Headers are different too.
- These settings worked for me and I gave up trying dozens of combinations once it worked. Try alternatives at your own risk.
- You can also reduce the data returned by explicitly specifying the
- start date+time and
- end date+time or
- the maximum numbers of records.
- Check the documentation for the syntax.
- Line 231: Makes the request to the HMC to get the XML encoded list of files.
- Depending on the data type requested you can get back between 1 and 66 file names!
- Lines 232 - 239: Do the regular checking of the returned error code and printing out the results.
- Lines 240 - 244: Save the returned data to a file if requested with the --save option.
Next we have to parse the XML list of file names:
- Lines 246 - 247: Convert the GET Request results to XML format so we can parse the details.
- Lines 248 - 254: Delve in to the XML looking for <feed> and then <entry> and then extracting the <title> which is the filename (which is used to save the file, if necessary) and more importantly we extract the <link> which is the URL to fetch the file from the HMC. The code uses the variable JSON_file_url - think of this as the file name on the HMC with the actual data in it that we then have to request.
- Line 261: Sets up the request header and we have a URL from the file <link>, there is no Body required for the simple fetching of a file.
- Line 262: GET Request the JSON file. So we only need the Session Token and the JSON_file_url to GET the data file.
- Lines 264 - 269: Check the return code and report an error - as before.
- Lines 270 - 274: If requested, save the JSON data to a file using the <title> filename. We don't want the whole URL in a local file name.
- Line 277: Converts the returned file to JSON format within Python
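Picking the file name and URL out of the list of files can be sketched as below. The feed is made up, and it assumes the list uses the standard Atom namespace with the URL held in the href attribute of <link>. The file name and URL are placeholders.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"   # assumed namespace of the feed

# Made-up feed with one entry; the href is a placeholder, not a real HMC URL.
text = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>example_energy_file.json</title>
    <link href="https://hmc14:12443/placeholder/example_energy_file.json"/>
  </entry>
</feed>"""

files = []
for entry in ET.fromstring(text).iter(ATOM + "entry"):
    title = entry.find(ATOM + "title").text          # local file name
    link = entry.find(ATOM + "link").get("href")     # URL to GET from the HMC
    files.append((title, link))

for filename, JSON_file_url in files:
    print(filename, JSON_file_url)
```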
A few words on JSON format
- The JSON files do not generally have any linefeed characters in them so the whole thing is on one line - this can confuse or even crash some editors and is particularly hostile for humans to read.
- The HMC REST API can return lots of data: the record so far from the HMC is 100+ MB in one file.
- JSON formatting is very long winded with a paragraph for every number.
- You might think it would contain "inlet temp in C = 42" but it looks like this
- That is ~400 bytes to express the number "42" - oh well!!
- Python can handle the awkward format.
- If you want to read the contents, the file needs reformatting to work around the unusable layout
- The best way to make the JSON file readable is actually to use a Python module, assuming you saved it to a file:
python -m json.tool <filename.json >formatted.json
- Keep the filename ending .json as that helps some editors to switch on syntax highlighting (like the colours below while using vi).
- The Blue Arrows show the hierarchy to reach the name of the server "P8-S824_emerald"
- Which is systemUtil -> utilInfo -> name -> and the data
- In Python using the json module the syntax is:
variable_name = data['systemUtil']['utilInfo']['name']
- The Red Arrows show how to get to the array of data samples
- Which is systemUtil -> utilSamples
- Note the "utilSamples": [
- The "[" shows it is an array of items
- To get Python to cycle through the array of samples we use:
for sample in data['systemUtil']['utilSamples']:
- Then we use the variable sample within the loop.
- The Green Arrows show how to get the electrical power in Watts
- sample -> energyUtil -> powerUtil -> powerReading -> an array (only one member for Processed stats but three for Aggregated stats) --> the number in Watts
- In Python the syntax is:
- The Yellow Arrows show how to get the temperatures
- sample -> energyUtil -> thermalUtil -> baseboardTemperatures -> an array (on my system three temperatures are available) --> temperatureReading -> an array (only one member for Processed stats but three for Aggregated stats) --> the number in Celsius
- In Python the syntax is:
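A minimal sketch of both lookups, using a tiny made-up JSON document built only from the key names described above. It assumes each reading array holds bare numbers, with one member for Processed stats. Real files carry far more fields.

```python
import json

# Tiny made-up document using only the key names described above.
text = """{"systemUtil": {
  "utilInfo": {"name": "P8-S824-emerald"},
  "utilSamples": [
    {"energyUtil": {
       "powerUtil": {"powerReading": [770]},
       "thermalUtil": {"baseboardTemperatures": [
          {"temperatureReading": [31]},
          {"temperatureReading": [32]},
          {"temperatureReading": [36]}]}}}
  ]}}"""

data = json.loads(text)
name = data['systemUtil']['utilInfo']['name']

for sample in data['systemUtil']['utilSamples']:
    energy = sample['energyUtil']
    # Electrical power: the first (and only) member of the array.
    watts = energy['powerUtil']['powerReading'][0]
    # Temperatures: one array entry per sensor, each with its own reading array.
    baseboard = [t['temperatureReading'][0]
                 for t in energy['thermalUtil']['baseboardTemperatures']]
    print(name, watts, baseboard)
```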
- This is reflected in the Python code below.
- The variable "data" contains the JSON formatted stats so we can access the data using named arrays
- If you have used "named arrays" in awk you will already understand this syntax
- This is a little like XML as it comes in many nested sections and each has a tag (name) but there are also actual arrays - watch out for those "[" and "]".
- The Raw, Processed and Aggregated stats each have a different format.
- The Processed and Aggregated look the same but Aggregated data comes in threes for Minimum, Maximum and Average.
- Below we are only covering the ProcessedMetrics stats
- Raw & Aggregated file formats are a bit different but the code pretty similar-ish
- Line 325: Checks if we requested ProcessedMetrics - other code block cover Raw and Aggregated stats.
- Lines 327 - 331: Pick out the general information on
- frequency of snapshot,
- server name and
- Lines 332 - 336: If debug == true output these to the screen
- Lines 338 - 339: Open the file used to save comma separated values (CSV) data for processing in to graphs
- Called energy-PROCESSED.csv
- Note: ignore the "/" on the front of the file name - that was a bug.
- Line 342: The for statement loops through the 240 data utilSamples
- Lines 343 - 346: Sometimes we get samples that only contain error messages.
- Warning: I have had JSON files returned that only contain errors, particularly after just switching on Energy monitoring with the Preferences Flags.
- After 5 minutes I get 239 errors and 1 good sample.
- I have assumed "status": 0 means no error (currently undocumented).
- I have assumed a status of 1 or 2 (I have seen examples of these) means an error as the data is not there in the file (currently undocumented).
- Lines 349 - 357: Extract the Watts and eight temperatures.
- On my machines there are 3 temperatures from the baseboard. Normally, Power people call this the System Planar.
- On my machines S822/S824 they have 4 CPU temperatures as they have two filled POWER8 processor sockets and each have two POWER8 CPU chips in a socket.
- Other models will differ. S812/S814 will probably have one POWER8 CPU chip so fewer stats.
The E850 has four and the E870/E880 have between four and sixteen POWER8 CPU chips.
- Plus 1 Inlet temperature.
I am unclear on the number of inlet temperatures you get with a multi CEC machine like E870/E880.
- Line 358: Extracts the time stamp. This is some sort of ISO standard format.
JSON Date time format:
- These look like this example: "timeStamp": "2017-02-06T17:36:25+0000"
- Typical committee output = "the worst of all worlds"
- With four different delimiters: - T : and +
- YYYY-MONTH-DAY T HOUR:MINUTES:SECONDS + time zone offset from UTC as HHMM (+0000 here means UTC)
- The ProcessedMetrics we are using are every 5 minutes - see the frequency above = 300 seconds.
- Line 359: This line formats all the data we want to save into a string called buffer
- Note the data time stamp is picked apart using string manipulation
- Strings are arrays of characters with the first being the zeroth element.
- timeStamp[5:7] picks out the 5th character starting from 0 (normally called the sixth letter) but stops BEFORE the 7th character - so that is just two letters (the 5th and 6th) - this is the Month.
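Applied to the example time stamp above, the slicing works out as in this sketch. The variable names other than timeStamp and buffer are invented, and the real script builds a longer buffer with the Watts and temperatures too.

```python
timeStamp = "2017-02-06T17:36:25+0000"

# Each slice starts at the first index and stops BEFORE the second.
year   = timeStamp[0:4]
month  = timeStamp[5:7]
day    = timeStamp[8:10]
hour   = timeStamp[11:13]
minute = timeStamp[14:16]
second = timeStamp[17:19]

# Join the six fields with commas, ready for the CSV line.
buffer = ",".join([year, month, day, hour, minute, second])
print(buffer)
```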
- Lines 360 - 361: If debugging print this on the screen.
- Line 362: Commented out but was useful while testing the program.
- Line 363: Append the new data in the buffer string to the CSV file.
- Line 365: After completing the processing of all the samples in the for loop, the CSV file is closed.
- Line 366: Print to screen the numbers of good samples and errors found in that file.
Well I hope you got here and learnt how to access the Watts and Temps.
- I am sure there are loads of typos - send me a list.
- If you have questions, I would be delighted to answer them or clarify the commentary.
Graphing the Comma Separated Values File
This is pretty simple once we have got the data off the HMC and in to a simple one snapshot per line CSV
- The CSV is called energy-PROCESSED.csv. This one file holds all the data for all the servers.
- In the script after the csv file is sorted and deduplicated the data for a single machine (here my emerald S824) is called ./energy_P8-S824-emerald.csv
- This data is then reformatted with a little help with awk in to the webpage file called ./energy_P8-S824-emerald.html
- There is one webpage file per machine.
In the webpage there are really four sections
- A call to the Googlechart library that does the heavy lifting to generate the graphs.
- There is the post-amble which finishes the array, sets up graph options and the graph details,
- Finally at the end there is the end of the webpages including drawing the buttons.
This needs a little more work to refine the graphs and perhaps draw the graphs and buttons on a single webpage rather than one per server.
To invoke just use the script name: Energy_grapher it is a Korn shell script. Let me know if you use bash and it works OK.
Use cron to automate
On a practical note, to build up the data in the .csv file, I recommend calling the PCM_energy.py program from cron once every 15 minutes.
This will create serious duplication in the file but means you don't need to worry if you miss, say, an hour's worth of data as it gets 2 hours' worth every time you run the Python program.
Deduplicate the file now and then with:
sort -n energy_P8-S824-emerald.csv | uniq >tempfile
cp tempfile energy_P8-S824-emerald.csv
Again that could be improved, perhaps in the next version, by requesting a reduced number of samples - this would also take less time.
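If you would rather deduplicate inside Python than with sort and uniq, a minimal sketch might look like this. The function name is invented, the first sample line is the one shown earlier and the second is a made-up variation of it. The sort is plain text order, which keeps one server's lines in time order because the time stamp fields are zero padded.

```python
def dedupe_lines(lines):
    """Return the unique non-blank lines in sorted order (similar to sort | uniq)."""
    return sorted(set(line.rstrip("\n") for line in lines if line.strip()))

# The first line is the sample from earlier, repeated; the other is invented.
sample = [
    "PRO, P8-S824-emerald,8286-42A*1BBEC7V, 2017,02,08,17,05,23, 772, 31,32,36, 41,42,46,44, 29\n",
    "PRO, P8-S824-emerald,8286-42A*1BBEC7V, 2017,02,08,17,04,53, 770, 31,32,36, 41,42,46,44, 29\n",
    "PRO, P8-S824-emerald,8286-42A*1BBEC7V, 2017,02,08,17,04,53, 770, 31,32,36, 41,42,46,44, 29\n",
]

unique = dedupe_lines(sample)
print(len(unique))
```

In real use the lines would come from reading the CSV file, with the result written to a temporary file before replacing the original, as with the shell commands above.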
- - - The End - - -