Sitworld: Attribute and Catalog Health Survey
John Alvord, IBM Corporation
Draft #3 – 19 April 2015 - Level 0.80000
Recently I worked with a customer that ran into a rarely seen ITM limit. ITM uses catalog and attribute files to define the data that agents can process from their monitored environments. The TEMS reads the catalog files into a combined catalog table and the attribute files into an in-storage attribute collection. These are used in Situations, Historical Data, Real Time data displays and more. This customer had added the 513th catalog file and the TEMS failed during startup. Internally .cat files are known as package files and there is an absolute limit of 512 packages. With IBM Support help, the customer removed a few .cat and .atr files, reset the combined catalog file to empty and the TEMS started up just fine.
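As a rough way to see how close a TEMS is to that limit, the following sketch counts the .cat files in a catalog directory. The function names and the directory argument are illustrative and not part of the package. It assumes lower-case .cat extensions and that each .cat file counts as one package, as described above.

```shell
#!/bin/sh
# Sketch only: count catalog files and compare with the 512-package limit.
# The function names are hypothetical; pass the catalog directory as $1.

count_cat_files() {
    n=0
    for f in "$1"/*.cat; do
        # The glob stays literal when nothing matches, so test each entry
        if [ -f "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

check_cat_limit() {
    limit=512
    count=$(count_cat_files "$1")
    if [ "$count" -gt "$limit" ]; then
        echo "OVER: $count catalog files, limit is $limit"
        return 1
    fi
    echo "OK: $count catalog files, $((limit - count)) remaining"
}
```

Call it as check_cat_limit followed by the directory that holds the hub TEMS catalog files. A non-zero return code means the count is already over the limit.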
However this meant the customer was unable to install certain types of maintenance or new applications. There was an urgent need for a reliable way to identify unused catalog and attribute files.
The result is this package, which calculates the unused catalog and attribute files. It also produces a health report that flags error cases, such as an attribute group used in a situation but missing from every attribute file.
The Attribute files are taken from the hub TEMS environment:
The Catalog files are taken from the hub TEMS environment
The Situation definition is taken either directly from the TEMS database tables TSITDESC and TNAME or indirectly from the Situation Audit project run with the -a option introduced at level 1.25000. Situation Audit data will provide a report that has fewer false advisories. The EIB tables used directly sometimes identify attribute group names incorrectly because they have approximately the correct form. Situation Audit is more precise because it performs a full syntax analysis. In actual usage either will do.
The first step will be removing unused catalog and attribute files. After that the number of advisory messages in the report will be sharply reduced.
The three data sources do not have to be used in place. You can create the data and afterwards copy it to another location for processing. You do not have to achieve perfection although removing the high impact advisories will definitely improve ITM processing reliability. Performance is not expected to change much.
This document uses the default install directory; however, you can use any directory you want.
Linux/Unix systems come with Perl installed. Windows may need it installed and I use http://www.activestate.com/activeperl, community edition 5.20. No CPAN modules are needed for this package. It will likely work on many different levels. As time goes on the project will be upgraded to modern levels about once a year in the late fall.
The package is here. It contains
1) A Perl program atrhealth.pl and control file atrhealth.ini - standing for Attribute and Catalog Health Survey.
2) If you use the Situation Audit capture of the sit_atr.txt file, the following files can be ignored.
3) A Windows atrsql.cmd file to run the SQL statements
4) A Linux/Unix atrsql.tar file that contains the atrsql.sh file. This avoids problems with line endings. To use untar atrsql.tar into the target directory.
5) The cmd and shell files require manual updating if the install directory is not the default.
I suggest all these program objects be placed in a single directory. For Windows you can create the tmp directory and sql subdirectory. For Linux/Unix create the sql directory.
You can run this in any directory, of course.
Configuring the Attribute Health Survey Program - Initialization file
Create the atrhealth.ini file. Here is an example where the sit_atr.txt will be used.
sit_atr: the data supplied is the filename. In this case there is a sub-directory qa1 and the file is in that directory. This is from Windows and so the backslash character is used.
attrlib: the data supplied is a directory where all the attribute files are stored.
rkdscatl: the data supplied is a directory where all the catalog files are stored.
These can be specified as fully qualified file names to use the existing files like this
If the Situation data is supplied by the EIB capture, the atrhealth.ini looks like this [# is a comment character]
The two EIB capture files must be in the current directory and have the name
and they should be identified automatically. If there is any confusion you can invoke atrhealth.pl with the -lst option.
Getting the Situation/Attribute Data
For the Situation Audit case install that package and use it with the -a option.
The following shows how to get the data from the EIB with the supplied SQL, using the atrsql.cmd or atrsql.sh files. Here is an example where the work is being done in the existing default tmp directory for Linux/Unix where the TEPS is running. If the product is not installed in the default directory, set the CANDLEHOME environment variable as shown in step c.
a) copy atrsql.tar to /opt/IBM/ITM/tmp
b) tar -xf atrsql.tar
c) If not using default install directory configure like this: export CANDLEHOME=/opt/IBM/ITM
d) sh atrsql.sh
e) The two files are created and should be moved to where the survey will be done
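The Linux/Unix steps above can be sketched as a single shell function. This is an illustration, not part of the package: capture_eib and its three arguments are hypothetical names. It runs atrsql.sh, the script the package list says atrsql.tar contains, and it assumes the two capture files are written to the tmp directory under the names QA1CSITF.DB.LST and QA1DNAME.DB.LST.

```shell
#!/bin/sh
# Sketch only: the Linux/Unix capture steps as one function.
# capture_eib is a hypothetical name; arguments are
#   $1 ITM install directory (for example /opt/IBM/ITM)
#   $2 path to atrsql.tar
#   $3 directory where the survey will be run
capture_eib() {
    candlehome=$1
    tarfile=$2
    dest=$3
    workdir="$candlehome/tmp"

    cp "$tarfile" "$workdir/atrsql.tar" || return 1
    # Run in a subshell so the caller's directory and environment are untouched
    (
        cd "$workdir" || exit 1
        tar -xf atrsql.tar || exit 1
        CANDLEHOME="$candlehome" sh atrsql.sh
    ) || return 1
    # Move the two capture files to the survey directory
    mv "$workdir/QA1CSITF.DB.LST" "$workdir/QA1DNAME.DB.LST" "$dest"/
}
```

CANDLEHOME is set only for the atrsql.sh run, so a non-default install directory can be passed without changing the login environment.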
Here is an example where the work is being done in the existing default tmp directory for Windows where the TEPS is running.
b) cd c:\IBM\ITM
c) md tmp
d) cd tmp
e) move the atrsql.cmd to this directory
f) If not using default install directory configure like this: SET CANDLE_HOME=c:\IBM\ITM
g) atrsql.cmd
h) The two files are created and should be moved to where the survey will be done
Running the Attribute and Catalog Health Survey
a) On Linux/Unix, following the preceding step, the two files QA1CSITF.DB.LST and QA1DNAME.DB.LST are already present in /opt/IBM/ITM/tmp
b) create a file atrhealth.ini like this
c) copy the atrhealth.pl program here and run the program
perl atrhealth.pl -lst
a) On Windows, following the preceding step, the two files QA1CSITF.DB.LST and QA1DNAME.DB.LST are already present in C:\IBM\ITM\TMP
b) create a file atrhealth.ini like this
c) copy the atrhealth.pl program here and run the program
perl atrhealth.pl -lst
The result will be three files:
- atrhealth.csv health survey report
- atrunused.csv list of atr and cat files unused
- atrused.csv list of atr and cat files which are used
Screen shot of Attribute and Catalog Health Survey Report
The beginning of the report contains the version number and a count of the number of messages. That is followed by the advisory messages.
Following is the advisory message documentation.
Advisory code: ATRHEALTH1000E
Text: Attribute group name in sits[$sits] not found in attribute files
Check: For every Attribute Group used in a situation, it should be defined in an attribute file.
Meaning: This is sometimes a false positive when using data directly from the EIB. For example if a Situation Formula contained "12.50" the first three characters might be mis-recognized as an attribute group. This does not occur when situation/attribute data is gotten from Situation Audit.
However if this is not the case, that means the situation will not be processed correctly.
Recovery plan: Install the needed attribute and catalog files and restart the TEMS [needed on all hub/remote TEMSes]. If the situation is no longer needed, delete it. If the situation is not autostarted, it could be ignored.
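The false positive described for ATRHEALTH1000E can be illustrated with a small sketch. This is not the logic used by atrhealth.pl or Situation Audit. The function names and the sample formula are made up for the illustration, and grep -o is assumed to be available (it is in GNU and BSD grep).

```shell
#!/bin/sh
# Sketch only: why pattern matching on a formula can invent attribute groups.
# Both function names and the sample formula are illustrative.

# Treat any run of word characters followed by a dot as a group name.
naive_groups() {
    printf '%s\n' "$1" | grep -oE '[A-Za-z0-9_]+\.' | tr -d '.' | LC_ALL=C sort -u
}

# Split on blanks first and accept only tokens that start with a letter
# and have something after the dot, so numeric literals are skipped.
stricter_groups() {
    printf '%s\n' "$1" | tr ' ' '\n' |
        grep -E '^[A-Za-z][A-Za-z0-9_]*\..' | cut -d. -f1 | LC_ALL=C sort -u
}

formula='*IF *VALUE NT_Process.%_Processor_Time *GE 12.50'
naive_groups "$formula"      # prints 12 and NT_Process
stricter_groups "$formula"   # prints NT_Process only
```

The stricter version is still only a heuristic. A full syntax analysis, as Situation Audit performs, is what removes this class of false advisory.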
Advisory code: ATRHEALTH1001E
Text: Catalog key from Attribute table $atable in [$pfns] unknown in catalog files.
Check: For every Attribute Group there should be a related catalog file that defines the application and table name.
Meaning: This strongly suggests the attribute and catalog files are not installed correctly. It could mean that associated situations will not run correctly.
Recovery plan: Review the related attribute file and see what the catalog file should be. If necessary, reinstall the application support.
Advisory code: ATRHEALTH1002W
Text: Attribute group in fn[$pfns] unused in situations
Check: For every Attribute Group defined in an attribute file, check whether it is used in a situation.
Meaning: This could mean the attribute group and related catalog file are unused and can be deleted. However it might be an attribute group only used in TEP workspace real time views or where situations will be created in the future.
Recovery plan: Review the attribute files and delete attribute and catalog files if not needed.
Advisory code: ATRHEALTH1003W
Text: Catalog table in fn[$pfns] unused in situations.
Check: For every catalog file determine if the related attributes are used in any situation.
Meaning: This could mean the catalog file and related attribute files are unused and can be deleted. However it might be an attribute group only used in TEP workspace real time views or where situations will be created in the future.
Recovery plan: Review the catalog files and delete attribute and catalog files if not needed.
Advisory code: ATRHEALTH1005W
Text: duplicate Attribute group in files [$pfns]
Check: For every Attribute Group check for duplicates
Meaning: This is most often a remnant of Universal Agent or Agent Builder catalog files.
Recovery plan: Delete duplicate attribute files which are unused. This will avoid future problems with too many catalog/attribute files.
Advisory code: ATRHEALTH1006W
Text: duplicate Catalog files in files [$pfns]
Check: For every Catalog file check for duplicates
Meaning: This is most often a remnant of Universal Agent or Agent Builder catalog files.
Recovery plan: Delete duplicate catalog files which are unused. This will avoid future problems with too many catalog/attribute files.
Advisory code: ATRHEALTH1007W
Text: Invalid Attribute run_name at line $ll in attribute file $onefn
Check: For every attribute entry check for both attribute group name and attribute name
Meaning: This was spotted in one product provided attribute file [kmc.atr]
Recovery plan: Probably nothing to worry about
The first time you run the report you may see a great many advisories. Remember that the higher impact ones are the most important.
Most of the advisories will be related to leftover duplicates. Eliminating them will avoid future problems.
Rerun the report after making corrections. Then work through the Impact 100 Advisories. You do not need to clear up every single issue immediately.
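To track progress between reruns, a per-code tally of the advisories can help. The sketch below is a hypothetical helper, not part of the package. It assumes only that each advisory line in atrhealth.csv contains its advisory code, and it makes no assumption about the column layout of the report.

```shell
#!/bin/sh
# Sketch only: count advisories per code in a report file.
# tally_advisories is a hypothetical helper; it assumes each advisory line
# in the report contains its code (ATRHEALTH1000E, ATRHEALTH1002W, ...).
tally_advisories() {
    grep -oE 'ATRHEALTH[0-9]{4}[EW]' "$1" | LC_ALL=C sort | uniq -c |
        awk '{ print $2, $1 }'
}
```

Running tally_advisories atrhealth.csv before and after a cleanup pass prints one line per advisory code with its count, which makes it easy to see which categories are shrinking.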
After correcting the hub TEMS, you will need to fix the catalog and attribute files on all the remote TEMS [and FTO backup hub TEMS].
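One way to spot remote TEMS files that still need attention is to copy the hub and remote directories side by side and compare them. The sketch below is an assumption-laden illustration: compare_support_files is a hypothetical name, the two directories are copies you have made yourself, and a byte-for-byte comparison stands in for a proper application support check.

```shell
#!/bin/sh
# Sketch only: compare support files in a hub copy against a remote copy.
# compare_support_files is a hypothetical helper; arguments are
#   $1 directory copied from the hub TEMS
#   $2 directory copied from a remote TEMS
#   $3 file extension to compare (cat or atr)
compare_support_files() {
    hub=$1
    remote=$2
    ext=$3
    for f in "$hub"/*."$ext"; do
        [ -f "$f" ] || continue          # nothing matched the pattern
        name=${f##*/}                    # strip the directory part
        if [ ! -f "$remote/$name" ]; then
            echo "MISSING $name"
        elif ! cmp -s "$f" "$remote/$name"; then
            echo "DIFFERENT $name"
        fi
    done
}
```

The function reports only files that exist on the hub. Files present on the remote TEMS but not on the hub are not listed, so run it in both directions if that matters.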
Next Step: Use Portal Client
When you think this process is complete, use the Portal Client to evaluate all the catalogs in the TEMSes. That is most easily accomplished in the TEP. From the Enterprise navigation node
1) right click on Enterprise navigation node
2) select Managed Tivoli Enterprise Management Systems
3) In bottom left view, right Click on workspace link [before hub TEMS entry] and select Installed Catalogs
4) In the new display on right, right click in table, select Properties, click Return all rows and OK out
5) Resolve any missing or out of date application data. You can right-click export... the data to a local CSV file for easier tracking.
It is not always required to make things perfect. For example if an agent connects to only some remote TEMSes, then only the hub TEMS and those remote TEMSes need the catalogs. However cases where the dates are different definitely need correction. In general correction means installing the correct application support.
When you have corrected all of those, repeating the Attribute and Catalog survey one last time will increase confidence in the environment.
This report shows problems with Attribute and Catalog files. Correcting them will make the ITM environment work more reliably.
History and Earlier versions
If the current version of the Attribute and Catalog Health Survey tool does not work, you can try previous published binary object zip files. At the same time please contact me to resolve the issues. If you discover an issue try intermediate levels to isolate where the problem was introduced.
Handle case where attribute name is missing in attribute file
Improved parse_lst logic. Make data capture cmd/sh files easier for non-default installation directories.