Sitworld: AOA Critical Issue - TEMS Database File Damage
John Alvord, IBM Corporation
In August 2014, the Database Health Checker began running at IBM ECUREP as an Analysis On Arrival task on each incoming hub and remote TEMS pdcollect. Since then TEMS Audit and Event History Audit reports have been added. The reports are very useful for identifying known error conditions and thus speeding ITM diagnosis of issues. Each of the tools can be run by any customer, but the AOA reports are not immediately visible. Any customer could ask for them, but because they are not visible no one ever asks. At the same time the reports have become more complex and challenging to digest.
With a recent change, the process has been extended to create a short list of critical issues which will automatically be added to the S/F Case or PMR as a short email text. That creates visibility for critical issues. This document covers the issue where there is evidence of TEMS database file damage.
Please note that the conditions identified may not be the issue the problem case was opened for. For example, one recent case was an FTO hub TEMS switch to backup that was unexpected. After close study, the major issues were mal-configured agents including duplicate name cases, Virtual Hub Table Update floods and several other items. There are also rare cases where a report will be produced concerning an obsolete TEMS that is definitely installed but not in active use. In that case the report could be ignored - although uninstalling the TEMS would be a good idea.
Getting more information
If you are viewing this document as a customer working with IBM Support, you are welcome to request copies of the Analysis On Arrival reports if they are available. Be sure to mention the unpack directory from the AOA Critical Issue report.
TEMS Audit - temsaud.csv [any hub or remote TEMS]
Database Health Checker - datahealth.csv [any hub TEMS]
Event History Audit - eventaud.csv [any hub or remote TEMS]
There are cases when no report is generated. Sometimes that means there were no advisories. TEMS Audit is not produced when the relevant log files cannot be identified. Database Health Checker is skipped if the TEMS appears to be a remote TEMS. Event History Audit and Database Health Checker are not run if there are errors detected in the table extract process.
Visit the links above to access the AOA programs if you want to run them on your own schedule.
TEMS Database Files with errors
One type of error comes from the AOA interface programs. These convert the TEMS database files from the .DB format into text files.
itm_ref_checker.crit: QA1CSTSH.DB:unexpected size difference at tems2sql.pl line 1066.
It is also seen from itm_tems_eventaud.crit. The itm_ref_checker checks more files. Not all files are checked in the prepare stage.
Different errors are seen from TEMS Audit. There could be additional errors which may be added later.
temsaud.crit:TEMS database table $f with $etct Open Index errors
temsaud.crit:TEMS database table $f with $etct Verify Index errors
temsaud.crit:TEMS database table $f with $stct RelRec errors
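If you have the AOA critical issue files in hand, a pattern search can pull out only the lines that point at database file damage. The following is a minimal sketch, assuming the .crit files sit together in one directory. The function name find_db_damage is hypothetical and not part of any AOA tool, and the patterns are simply the message texts shown above, so later messages would need to be added by hand.

```shell
#!/bin/sh
# Hypothetical helper: list critical-issue lines that suggest TEMS
# database file damage. The patterns are copied from the messages above.
find_db_damage() {
    dir="${1:-.}"
    # /dev/null is added so grep always prefixes each match with its file name,
    # even when only one .crit file is present.
    grep -E 'unexpected size difference|Open Index errors|Verify Index errors|RelRec errors' \
        "$dir"/*.crit /dev/null
}
```

For example, find_db_damage /path/to/crit/files prints one line per match. No output [and a non-zero return code] means none of these patterns were found, which is not proof that the files are healthy.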
If this occurs with a hub TEMS database file, you must proceed very carefully and only with IBM Support help. There are certain files or pairs of files that can be replaced. However many of the hub TEMS database files contain critical information such as situation definitions. If those are reset, that data could take weeks to recover manually and no one wants that. While we are on that subject, please read Sitworld: Best Practice TEMS Database Backup and Recovery and implement a proper TEMS database backup plan.
The TEMS database file must be corrected. For cases involving remote TEMSes the answer is simple: just replace the TEMS database files with emptytable files. They are not all empty but they are in the same state as during a new TEMS install. This post Sitworld: TEMS Database Repair contains pretty much all you need to know including links to files containing the emptytable files for Unix/Linux/Windows. We usually suggest replacing all the files on a remote TEMS since errors may be present but not diagnosed through this report.
Do exactly the same if you have a problem with an FTO Mirror hub TEMS and you have confidence in the existing FTO Primary hub TEMS.
We rarely know exactly why the damage happened. A system power off while the TEMS is running was one case. Another was a manual copy of the index file from one system to another - not copying the data file - which did not work well at all. Another was a restoration of all files from an on-the-fly TSM backup. In any event, having a good backup always helps matters.
Sample Recovery Action Plan Template for TEMS Database Files - Remote TEMS
Here are instructions for REMOTE_ibm *REMOTE to reset the TEMS database files to emptytable status.
The instructions could be duplicated on any remote TEMS. [NEVER on the hub TEMS!!!]
This is concerning the remote TEMS REMOTE_ibm that keeps experiencing problems.
There is evidence that the remote TEMS has a broken database file.
The idea is to refresh all the remote TEMS database files and let them be rebuilt
naturally from the hub TEMS as in a new install. Here are the instructions:
0) Here is how to access the needed file
You would get them from the TEMS Database Repair post links. The following assumes use of
ITM630_emptytables.bigendian.tar which is used for AIX/HP-UX/Solaris/Linux on Z platforms
at the ITM 630 level
1) copy that file ITM630_emptytables.bigendian.tar [in binary] to the remote TEMS system REMOTE_ibm
I suggest /opt/IBM/ITM/tmp
2) un-tar that file
tar -xf ITM630_emptytables.bigendian.tar
This will create empty QA1* files. They are not entirely empty, but
they are in the same state as they would be during a new install. We
are going to use all files and it would be perhaps useful to save
them for the future. In general you should not use these except with
advice and instruction from IBM Support.
3) Change the empty table file attributes so they are identical to the
current ones which you can verify this way:
ls -l /opt/IBM/TEMS/tables/REMOTE_ibm/QA1CSTSH.DB
I think I see
-rwxrwxrwx 1 root system 35274789 Sep 24 08:08 /opt/IBM/TEMS/tables/REMOTE_ibm/QA1CSTSH.DB
and I think the following will do the work - but please verify
chmod 777 QA1*.*
chown root QA1*.*
chgrp system QA1*.*
4) Stop the remote TEMS when convenient.
5) Copy the emptytable files to the tables directory. Run this from within the
tables directory, since the copy target is the current directory.
cp /opt/IBM/ITM/tmp/QA1*.* .
6) Start the remote TEMS
7) Monitor for stability and normal operations - for example remote TEMS staying online.
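The copy in step 5 can be sketched as a small shell function that also keeps a copy of the current files before overwriting them. This is only an illustration under stated assumptions: the remote TEMS is already stopped, the emptytable files are already un-tarred with their attributes set as in step 3, and all three directories exist. The name reset_remote_tables and the separate backup directory are hypothetical additions, not part of ITM or of the plan above.

```shell
#!/bin/sh
# Sketch for a REMOTE TEMS only [never the hub TEMS].
# Assumes the TEMS is stopped. reset_remote_tables is a hypothetical name.
#   $1 = directory holding the un-tarred emptytable QA1* files
#   $2 = remote TEMS tables directory
#   $3 = existing directory that receives a copy of the current files
reset_remote_tables() {
    src="$1"; tables="$2"; backup="$3"
    if [ ! -d "$src" ] || [ ! -d "$tables" ] || [ ! -d "$backup" ]; then
        echo "usage: reset_remote_tables SRC TABLES BACKUP" >&2
        return 2
    fi
    # Save the current table files first, keeping their timestamps and modes.
    for f in "$tables"/QA1*.*; do
        [ -f "$f" ] || continue
        cp -p "$f" "$backup"/ || return 1
    done
    # Overwrite each table file with its emptytable version.
    for f in "$src"/QA1*.*; do
        [ -f "$f" ] || continue
        cp "$f" "$tables"/ || return 1
    done
}
```

A call might look like reset_remote_tables /opt/IBM/ITM/tmp /opt/IBM/TEMS/tables/REMOTE_ibm /some/backup/dir, using the paths from the steps above and a backup directory of your choosing. A plain cp onto an existing file keeps that file's owner and permissions, so step 3 matters mostly for files that are not already present in the tables directory.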
The information in the report explains how to manage an AOA Critical Issue concerning TEMS database files which are damaged.
Note: 2018 - Home Grown Meyer Lemons