Sitworld: Table of Contents
John Alvord, IBM Corporation
The first section lists six posts I consider most important.
The second section lists all the posts, each with a very short comment.
Top 6 By Importance [My Prejudiced View]
ITM Database Health Checker
It is common to see a TEMS database [often called the EIB] with problems that cause confusion or, sometimes, gaps in monitoring. This project identifies and documents 50+ advisories; resolving them will make things better.
Best Practice TEMS Database Backup and Recovery
The most costly support cases are those where a customer does not have a proper backup. One memorable case came after a Storage Area Network device lost power and the most recent backup was over a year old. I talk to people every day who use TSM to make copies of the TEMS databases, and that is almost always insufficient. This post was written jointly by a top L3 engineer and myself. If everyone followed it, the time to recover would drop substantially.
MS_Offline – Myth and Reality
MS_Offline type situations are extremely heavyweight and cause problems "at a distance". For example, a recent case with 9545 agents and 22 MS_Offline situations with a 5-minute sampling interval has spawned multiple IBM Support interactions. They all come back to this one issue. When Persist>1 is set, the problems are much worse. The blog photo shows a California Condor [VERY LARGE VULTURE] lurking outside a window. Treat MS_Offline type situations as dangerous creatures and you will reduce your risk of injury and pain.
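To get a feel for the scale in that example, here is a back-of-envelope sketch in Python. It rests on a simplifying assumption that is not stated above: that every evaluation of an MS_Offline type situation examines one status row per agent. The function name and that cost model are illustrative only; the three input numbers come from the case described.

```python
def offline_row_checks_per_hour(agents: int, situations: int, interval_minutes: int) -> int:
    """Estimate status-row checks per hour for MS_Offline type situations.

    Assumption (illustrative, not from the original post): each evaluation
    of each situation looks at one row per agent.
    """
    evaluations_per_hour = 60 // interval_minutes
    return agents * situations * evaluations_per_hour


# Numbers from the case above: 9545 agents, 22 situations, 5-minute sampling.
print(offline_row_checks_per_hour(9545, 22, 5))  # 2519880
```

Under that assumption the hub does roughly 2.5 million row checks an hour just to answer "is anything offline?", which is why fewer situations and longer intervals matter so much here.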
TEMS Audit Process and Tool
This has been available for 4+ years. It is a perfect way to examine the dynamic impact of workload [Situations, SOAP, real time data requests, etc.] on a TEMS. With that knowledge you can make changes to avoid problem conditions. I have one customer who runs this on every TEMS each weekend and, if "advisory messages" are present [noted via a non-zero exit code], sends the report to an analyst for review. The rate of emergency IBM Support meetings has dropped to near zero... at least for this area.
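The weekend routine that customer uses can be sketched in a few lines of Python. This is only an illustration of the exit-code check: the function name is hypothetical, and the demo command is a stand-in that simply exits with a chosen code, not the real TEMS Audit invocation, whose command line is not shown here.

```python
import subprocess
import sys
from typing import Sequence


def audit_needs_review(command: Sequence[str]) -> bool:
    """Run an audit command and report whether it signalled advisories.

    Assumption: a non-zero exit code means advisory messages were produced,
    as described in the paragraph above.
    """
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode != 0


if __name__ == "__main__":
    # Stand-in for the real audit command: a child Python process that
    # exits with code 1, as if advisories had been found.
    demo_command = [sys.executable, "-c", "import sys; sys.exit(1)"]
    if audit_needs_review(demo_command):
        print("advisories present: forward the report to an analyst")
    else:
        print("clean run: no review needed")
```

Scheduled from cron or a similar scheduler, a wrapper like this means an analyst only sees a report when there is something to look at.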
ITM Agent Health Survey
This tool provides a view of agents which are online but possibly non-responsive. In cases like this, real time data response is slow and partially missing, situations are not running, and historical data is not being recorded. These are things everyone should worry about. This identifies the guard dog that doesn't bark.
ITM Situation Audit
This is the most recent project. It performs a static analysis on all distributed situations and produces a report of warning messages. It also reports which situations need TEMS filtering [instead of Agent filtering], which is a prime performance killer. Together with TEMS Audit you can really increase efficiency - reducing the cost of monitoring. This also gives early warning of situations with problems. Surprisingly, 50 of 51,000 situations studied actually had syntax errors - like VALUE instead of *VALUE. Anyway - I expect this to be an important tool over time.
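To make the VALUE versus *VALUE example concrete, here is a toy check in Python. It is not how the Situation Audit tool works; it is a minimal sketch with a hypothetical function name, and the sample formulas are made up for illustration. It also ignores quoted literals, so a string constant containing the word VALUE would be flagged too.

```python
import re
from typing import List

# Match the keyword VALUE only when it is not immediately preceded by '*'.
BARE_VALUE = re.compile(r"(?<!\*)\bVALUE\b")


def find_bare_value_keywords(formula: str) -> List[int]:
    """Return the character offsets of VALUE keywords missing the leading '*'."""
    return [match.start() for match in BARE_VALUE.finditer(formula)]


good = "*IF *VALUE NT_Process.Process_Name *EQ 'notepad'"
bad = "*IF VALUE NT_Process.Process_Name *EQ 'notepad'"

print(find_bare_value_keywords(good))  # []
print(find_bare_value_keywords(bad))   # [4]
```

A real lint pass has to understand the whole formula grammar, but even a one-line pattern like this shows how cheaply such mistakes can be caught before a situation is distributed.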
Sitworld All Posts - Most recent first
|ITM Database Health Checker||3/24/2015||Check TEMS database for issues|
|Suppressing Situation Events By Time Schedule||3/13/2015||Simple example of Until with timer schedule|
|Alerting on Daylight Savings Time Truants||2/27/2015||Situation alert on time differences|
|Sitworld: Daylight Savings Time Truants||2/20/2015||Report on Daylight Savings Time problems|
|Situation Formula with Calculations||1/28/2015||How to effectively calculate a formula|
|ITM Agent Census Scorecard||11/24/2014||Report avoidable TEMA defects|
|ITM Protocol Usage and Protocol Modifiers||10/21/2014||How to increase SOAP ports and much more|
|Agent Workload Audit||10/08/2014||What is actually happening at Agents|
|Situation Distribution Report||7/11/2014||What Situations are running where|
|CPAN Library for Perl Projects||7/11/2014||Using Perl without changing system|
|ITM Virtual Table Termite Control Project||6/17/2014||Recover from Performance Issue|
|ITM TEMS Health Survey||6/9/2014||Verify TEMS central services are working|
|The Situation That Cried Wolf||6/1/2014||Craft a situation for good practical results|
|Statistics After 50,000 Views||5/19/2014||Summary to date|
|*MIN and *MAX - the Little Column Functions That Couldn’t||5/15/2014||Two broken column functions|
|A Situation By Any Other Name…||4/28/2014||Discovering situation names|
|Do It Yourself TEMS Table Display||4/28/2014||Do It Yourself - Run SQL|
|Running TEMS without SITMON||4/7/2014||Recovery when TEMS is badly broken|
|ITM Situation Audit||3/20/2014||Compiler or Lint for Situation Formulas|
|SOAP Flash Flood||2/1/2014||tacmd bulkexportsit -d stresses TEMS|
|Sample EIF Listener project||1/17/2014||Do It Yourself Event listener|
|Situation Limits||12/31/2013||Situations have many limits|
|Put Your Situations on a Diet Using Indexed Attribute||12/19/2013||Performance boost for some Situations|
|Sampled Situations and Until Situations||11/25/2013||Until processing exposé|
|TEMS Audit Process and Tool||11/16/2013||Measure Agent stress on TEMS|
|Detector/Recycler for ITM Windows OS Agent||11/2/2013||Windows OS Agent recycler high CPU|
|1997 Kasparov vs. Deep Blue Chess Match||9/17/2013||Virtual Table hub Update hidden issue|
|ITM Agent Health Survey||9/6/2013||Discover unhealthy agents|
|Sampled Situation Blinking Like a Neon Light||9/4/2013||When situation events auto-close|
|Sampling Interval and Time Tests||8/24/2013||Sampled situations and time to event|
|TEMS Audit Advisory Messages||8/13/2013||Included in TEMS Audit Process and Tool|
|Situations Caused Domain Name Server Overload||7/24/2013||Situation generated emails hurt DNS|
|Configuring a Stable SOAP Port||7/16/2013||Best Practice when SOAP is vital|
|Best Practice TEMS Database Backup and Recovery||7/12/2013||If you don't have a backup plan read this|
|Action Command Wars – A New Beginning||7/9/2013||Running lots of action commands|
|Detecting and Recovering from High Agent CPU Usage||7/1/2013||Linux/Unix OS Agent high CPU recovery|
|An Efficient Design for Starting a Background Process||6/20/2013||Elegant hack|
|Adding Environmental Data to Action Command Emails||6/12/2013||When attributes are not enough|
|Situation Managing Other Situations||6/5/2013||Situation creates MSL|
|Mixed Up Situations||5/28/2013||Multiple Attribute Situation issues|
|Efficient Situation for Two Missing Processes||5/22/2013||Elegant efficiency solution|
|Getting a Good Nights Sleep||5/15/2013||Creating events to keep operators happy|
|Rational Choices for Situation Sampling Intervals||5/8/2013||Best Practice Interval choices|
|The Derivative Log Pattern||5/1/2013||Two stage situation logic|
|Super Duper Situations||4/28/2013||Understanding _Z_ situations|
|MS_Offline – Myth and Reality||4/17/2013||Everything about MS_Offlines|
|Auditing TEMS for Improved Performance||4/4/2013||Included in TEMS Audit Process and Tool|
|ITM Silver Blaze – Agent Responsiveness Checker||3/28/2013||Replaced by ITM Agent Health Survey|
|ITM TEMS Stress Tester Experiment||3/20/2013||ITM Analytics experiment|
|Introduction||3/20/2013||Nice to meet you!!|
Wonderful World of Situations Table of Contents.