Sitworld: TEMS Audit Process and Tool
John Alvord, IBM Corporation
There have been cases every year where a TEMS was running with high enough CPU/Storage resource usage that the customer was concerned. In some cases, the TEMS experienced a steady storage growth and failure after some days. In all recent cases, this condition has been triggered by workloads or environmental conditions. Situations are the most common workload but SOAP calls, historical data collection and Portal Client workspaces can be involved. In addition there may be important error messages in the diagnostic log.
In 2008 I began work on a year-old customer problem that took until spring 2010 to resolve. The final conclusion was that a certain type of situation caused severe storage fragmentation and a TEMS failure in 10-12 days. The customer decided to recycle the remote TEMSes once a week. That was probably the single most expensive PMR [for IBM and the customer] I ever worked on.
Some six months later, in December 2010, I had a customer with six AIX servers running at 95% utilization, all of it from remote TEMS processing. I was able to present a solution quickly, but the customer was unconvinced. I wrote a Perl program to summarize the results in a spreadsheet file. With that evidence the customer was convinced and made the changes, and those six systems dropped to 10% utilization. In March 2011 I published the process and tool as a technical note, and it is now widely used.
TEMS Audit continues to be enhanced as new issues are encountered. The technical note became unwieldy as documentation, so I reworked it into an install guide and a usage guide. All the recent changes have been documented and are included in the zip file below. The advisory messages are of special note since they point to specific issues [Appendix 2 of Usage Guide].
TL;DR - Too Long; Didn't Read
Copy temsaud.pl and temsaud.ini to a convenient location on a system where Perl is installed [as it is in all Unix and Linux environments] and where a TEMS is running. Let's pick /tmp and assume the TEMS is installed in the default directory. Now run this command:
perl temsaud.pl -v -logpath /opt/IBM/ITM/logs
and if there should happen to be more than one candidate inventory file,
perl temsaud.pl -v -logpath /opt/IBM/ITM/logs ms_kdsmain.inv
and then view the report file temsaud.csv. If you find anything interesting, start reading the Usage Guide.
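To see whether there is more than one candidate inventory file before choosing between the two commands above, a small shell helper can list them. This is only a sketch: the function name list_inventory_files is hypothetical, and it assumes inventory file names end in ms_kdsmain.inv, as in the example command, and that the logs live in the default /opt/IBM/ITM/logs directory.

```shell
#!/bin/sh
# Hypothetical helper, not part of temsaud.pl: list candidate TEMS
# inventory files in a log directory.
# Assumption: inventory file names end in "ms_kdsmain.inv".
list_inventory_files() {
    logdir=$1
    for f in "$logdir"/*ms_kdsmain.inv; do
        # When nothing matches, the pattern stays unexpanded; skip it.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}

# Default ITM log directory used in the commands above; override with LOGPATH.
LOGPATH=${LOGPATH:-/opt/IBM/ITM/logs}
list_inventory_files "$LOGPATH"
```

If this prints more than one file, pass the one you want as the final argument to temsaud.pl.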
On Windows, you can install the ActivePerl Community Edition from www.activestate.com.
Here are the recently published versions. If there is a problem at one level, you can always back up to an earlier one.
Restructure for improved standalone running. Add several new advisories: low nofiles and concurrent action commands
Restructure advisory messages, add advisories for TEMS database errors
Correct some logic on incoming PostEvent messages
Handle a null SQL text continuation better
Show more contributors to the resource interval report
Show duplicate agent online better - top 20
Add SimpleHeartbeat report - finds another type of duplicate agent
Add Major Jitter correlation report
Make command capture logic work on z/OS logs
Add result interval report
Add results interval count report
Restore ProcessTable report section
Add PostEvent report section
Add SQL report section and a maximum concurrent action command section
Add action command report section
Correct defect on z/OS log and tracking listen pipes
Add SOAP Burst Advisory and SOAP Detail report
Add advisory for ulimit stack more than 10M
ProcessTable Summary, listen pipes. "No Matching Request" error summarized, Nofile advisory, improve -z option processing
1.10000 - last technote version
Advisory section. 16meg truncation warning
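One of the changes listed above is an advisory for a ulimit stack setting above 10M. As a rough standalone illustration of that kind of check, the sketch below tests the stack limit of the current shell. It is not how temsaud.pl does it, since the tool works from the TEMS diagnostic logs, and the function name stack_over_10m is hypothetical. It relies on "ulimit -s" reporting kilobytes, so 10 MB is 10240.

```shell
#!/bin/sh
# Hypothetical check, not part of temsaud.pl: is a stack limit above 10 MB?
# "ulimit -s" reports kilobytes, so 10 MB corresponds to 10240.
stack_over_10m() {
    limit=$1
    case $limit in
        unlimited) return 0 ;;       # no cap at all is above 10 MB
        ''|*[!0-9]*) return 2 ;;     # not a plain number: cannot decide
    esac
    [ "$limit" -gt 10240 ]
}

# Check the limit of the shell running this script.
current=$(ulimit -s)
if stack_over_10m "$current"; then
    echo "stack limit ($current) is above 10 MB"
else
    echo "stack limit ($current) was not flagged"
fi
```

Run it as the user that starts the TEMS, since limits are set per user and per shell.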
Identify and correct workload and configuration problems. I encourage anyone to share success stories, enhancement requests or problems found.
Note: Art Deco Cat sculpture