Technical Blog Post
ITM Agent Insights: Agent systems missing from COGNOS based TCR Operational Reports
In the TCR interface, under the "Work With Reports" section, the "Resource Section" lists the systems that the selected report can be run against under the "Servers" section. If this list is blank, or is missing a desired system, it indicates that there is no entry for that system in the ManagedSystem table, or that the ManagedSystem table is absent from the Tivoli Data Warehouse (TDW) database.
COGNOS based TCR reports run SQL queries against the contents of the TDW database and rely on entries in the ManagedSystem table. That table is populated either by running the TCR for OS Agent Report script / procedure against historical data previously gathered from individual IBM Tivoli Monitoring agent systems and summarized by the Summarization and Pruning Agent (SY), or by configuring the SY agent to create / maintain the dimension resource tables:
Creating and maintaining the dimension tables
The first step should be to verify the overall monitoring environment:
1) OS platform, hostname, and ITM components installed on the TEPS system.
If this is Windows, provide "kincinfo -d"; if UNIX/Linux, provide "cinfo -i" to show the ITM components and application support levels.
2) OS platform, hostname, and ITM components installed on the HUB TEMS (and Remote TEMS if using them).
Again, if they are running on Windows, provide "kincinfo -d"; if on UNIX/Linux, provide "cinfo -i".
3) What interface is used for the TEP? Is it the browser TEP client? Or the desktop TEP client?
If running the desktop TEP client (TEPD), is this installed on a different system than the TEPS?
If yes, provide the details from the TEPD system:
OS platform, hostname, and ITM components installed on the TEPD system.
Again, if the TEPD system is UNIX/Linux, gather "cinfo -i"; if Windows, gather "kincinfo -d".
4) OS platform, hostname, and ITM components installed on the system where the Warehouse Proxy Agent (WPA) is installed / running (HD component)
5) OS platform, hostname, and ITM components installed on the system where the Summarization and Pruning Agent (SPA) is installed / running (SY component)
6) OS platform, hostname, ITM components installed on the system where the Tivoli Data Warehouse (TDW) is installed.
Please provide version and database application information.
What type of database is the TDW using?
Is the TDW database on DB2, Oracle, or Microsoft SQL Server?
This is needed to confirm the syntax of the commands used to verify the list of tables / views and the contents of the TDW tables.
7) OS platform, hostname, and ITM components installed on the agent system(s) missing from the list of servers a report can be run against.
If some agent systems are displayed, while others are missing, provide the output from "pdcollect" utility on both a working endpoint and a failing endpoint for comparison.
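The platform-specific inventory commands above can be selected programmatically. A minimal sketch, assuming only that `kincinfo` / `cinfo` are on the PATH when ITM is installed on the system:

```python
import platform
import shutil

def itm_inventory_command():
    """Pick the ITM component-inventory command for this OS:
    "kincinfo -d" on Windows, "cinfo -i" on UNIX/Linux.
    Returns the argument list, or None if the command is not on the PATH."""
    if platform.system() == "Windows":
        name, flag = "kincinfo", "-d"
    else:
        name, flag = "cinfo", "-i"
    return [name, flag] if shutil.which(name) else None

print("Host:", platform.node(), "OS:", platform.system())
print("Inventory command:", itm_inventory_command())
```

Running this on each system in the checklist records the hostname, OS platform, and which inventory command to use.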
After verifying the ITM environment, use SQL select statements to verify the contents of the TDW database and determine whether the ManagedSystem table exists and is populated with data.
select * from MANAGEDSYSTEM
select * from Unix_IP_Address
select * from Unix_IP_Address_DV
select * from Linux_IP_Address
select * from Linux_IP_Address_DV
select * from NT_Computer_Information
select * from NT_Computer_Information_DV
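These checks can be scripted against any Python DB-API connection. The sketch below uses `sqlite3` purely as a testable stand-in; against a real TDW you would connect with the driver matching your database (DB2, Oracle, or Microsoft SQL Server):

```python
import sqlite3  # stand-in for testing; use the driver matching your TDW database

# Tables / views checked by the select statements above.
TABLES = [
    "MANAGEDSYSTEM",
    "Unix_IP_Address", "Unix_IP_Address_DV",
    "Linux_IP_Address", "Linux_IP_Address_DV",
    "NT_Computer_Information", "NT_Computer_Information_DV",
]

def check_tdw_tables(conn, tables=TABLES):
    """Return {table: row count}, with None for a table/view that is absent."""
    results = {}
    cur = conn.cursor()
    for table in tables:
        try:
            cur.execute(f'SELECT COUNT(*) FROM "{table}"')
            results[table] = cur.fetchone()[0]
        except Exception:
            results[table] = None  # table or view does not exist
    return results
```

An absent or empty MANAGEDSYSTEM result points at the population steps described below.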
The entries in the MANAGEDSYSTEM table are inserted by running the POPULATE_OSAGENTS stored procedure / script, which relies on historical data for platform specific attribute groups for the UNIX / Linux / Windows OS agents. The historical data has to be collected, AND has to be configured to use DAILY summarization, which creates entries in the <attribute_group>_D tables.
The POPULATE_OSAGENTS stored procedure relies on data contained in the daily summarized tables, which it reads through the _DV views of those underlying tables.
If there are no values in the MANAGEDSYSTEM table of the TDW database, check the contents of the UNIX_IP_Address_D, Linux_IP_Address_D, and NT_Computer_Information_D tables to see if they contain daily summarized entries. If the daily summary tables are populated, running the POPULATE_OSAGENTS stored procedure / script should create entries in the MANAGEDSYSTEM table.
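Once the daily tables are confirmed to be populated, the procedure can be invoked through the DB-API `callproc` interface. In the sketch below the schema name `ITMUSER` is an assumption; the procedure owner and exact call syntax differ between DB2, Oracle, and Microsoft SQL Server:

```python
def run_populate_osagents(conn, schema="ITMUSER"):
    """Invoke the POPULATE_OSAGENTS stored procedure via DB-API callproc.

    NOTE: schema="ITMUSER" is a hypothetical default; check which schema
    owns the procedure in your TDW database before running this."""
    cur = conn.cursor()
    cur.callproc(f"{schema}.POPULATE_OSAGENTS")
    conn.commit()
```

After a successful run, re-check the MANAGEDSYSTEM table for the expected entries.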
If there are no <attribute_group>_D tables, review the Summarization options to confirm that historical data summarization is configured for Daily summarization.
This can be seen in the TEP in the Historical Configuration GUI. An example would show that Daily summarization has NOT been configured for the Unix_IP_Address attribute group.
If the <attribute_group>_D tables exist but contain no entries, check the base tables that the summarized data would be created from, and confirm whether there are detailed entries in the UNIX_IP_Address, Linux_IP_Address, and NT_Computer_Information tables. If there are no entries in the base attribute group tables either, confirm that historical data collection is enabled for the attribute groups, and that the historical collection is distributed and running.
The historical collection configuration is needed to confirm where the Short Term Historical (STH) files for historical data are written (TEMS or TEMA), and to confirm that the historical collection is distributed to the endpoint that is not listed in TCR reports "Servers" list or is missing from the MANAGEDSYSTEM table in TDW.
On the agent endpoint, review the .LG0 file to confirm the agent is connecting to the TEMS:
KRAREG000 Connecting to <TEMS name>
Also confirm that the agent shows the UADVISOR situation for the historical collection as distributed / started:
KRAIRA000 Starting Enterprise situation UADVISOR_* <> for <Attribute Group>. No action is required.
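A small sketch that scans .LG0 log lines for these two message IDs. The patterns are assumptions based on the examples above; the exact message layout can vary by agent version:

```python
import re

# Message IDs from the .LG0 examples above; layout may vary by agent version.
CONNECT_RE = re.compile(r"KRAREG000\s+Connecting to\s+(\S+)")
UADVISOR_RE = re.compile(r"KRAIRA000\s+Starting Enterprise situation\s+(UADVISOR_\S+)")

def scan_lg0(lines):
    """Return (TEMS names connected to, UADVISOR situations started)."""
    tems, situations = [], []
    for line in lines:
        m = CONNECT_RE.search(line)
        if m:
            tems.append(m.group(1))
        m = UADVISOR_RE.search(line)
        if m:
            situations.append(m.group(1))
    return tems, situations
```

An empty situations list for an endpoint suggests the historical collection is not distributed to it.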
The monitoring agent receives the details for where to find the Warehouse Proxy Agent (HD) from the TEMS it connects to.
The agent endpoint must be able to communicate with the HD component over the network in order for STH data to be able to be exported to the TDW database.
If the historical collection is storing the STH file on the TEMA, the pdcollect from the agent endpoint will provide a dir.info file that can be used to confirm the size and last modification date for the STH files.
Next, review whether there is a problem warehousing the data from the STH binary file into the TDW database.
The default is to warehouse data every 24 hours, but if communication problems or other issues in the environment prevent the short term historical data stored on the TEMA from being exported to the TDW, the STH binary files for the attribute groups will continue to grow.
If there is a general problem communicating with the TDW database (network problem, WPA problem, TDW database offline, etc.), the STH binary files will continue growing in size because they cannot be pruned until after a successful export.
General ways to identify whether historical data is being exported:
1) You are unable to view historical data more than 24 hours old (based on default settings) in a TEP workspace view, but ARE able to see the historical data from the last 24 hours, because that is queried directly from the STH file.
2) SQL queries run directly against the TDW tables show no insertion timestamps for a specific period of time, confirming data was not exported to the TDW during that timeframe.
3) STH files continue to grow in size and never "shrink", even after the "warehouse interval" time has passed.
4) The khdexp.cfg file never shows the timestamp for the last successful export being updated.
5) The agent's RAS1 logs report errors when sending data to the WPA.
This may require increasing RAS1 logging from the default level to add "(UNIT: KHD ALL)" tracing, once it is determined that there is a problem based on other symptoms such as STH files growing without limit.
The KHD tracing needs to be gathered on both the monitoring agent component AND the Warehouse Proxy Agent (HD).
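Symptoms 3 and 4 can be combined into a simple heuristic: if the STH files only grow between two checks while the khdexp.cfg modification time never advances, the export is likely stalled. A sketch (the file names in the usage below are hypothetical; gather sizes with `os.path.getsize` and mtimes with `os.path.getmtime`):

```python
def sth_export_stalled(sizes_before, sizes_after,
                       khdexp_mtime_before, khdexp_mtime_after):
    """Heuristic for symptoms 3 and 4: every STH file grew between the
    two checks, yet khdexp.cfg (which records the last successful export)
    was never updated."""
    growing = all(sizes_after.get(p, 0) > s for p, s in sizes_before.items())
    export_recorded = khdexp_mtime_after > khdexp_mtime_before
    return growing and not export_recorded
```

Take the two snapshots at least one warehouse interval apart; a single pair of samples closer together is not conclusive.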
The agents do not send data directly to the TDW database; the data is sent to the WPA, which is the product code "HD".
If the agent can communicate with the WPA but the data is still not being warehoused, the diagnosis switches to the HD component and the TDW database itself.
One thing to note: the issue may start off as a temporary "network" communication problem that is later resolved, but because the STH file has grown in the meantime, it can turn into a different issue related to free space on the system, which prevents the warehousing of the data even once the network issue is resolved.
During the export process, there must be free space equal to 3 times the size of the STH file.
If the agent file system is limited and the STH file is large, the ITM agent may be unable to complete the export process.
As a result, the STH file grows indefinitely (unless capped by MAXSIZE) or until the file system is full.
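The 3x free-space requirement can be checked directly with the standard library. A minimal sketch:

```python
import os
import shutil

def can_export_sth(sth_path, multiplier=3):
    """Check the free-space rule above: the export needs free space equal
    to roughly 3 times the size of the STH file on the same file system."""
    sth_size = os.path.getsize(sth_path)
    free = shutil.disk_usage(os.path.dirname(os.path.abspath(sth_path))).free
    return free >= multiplier * sth_size
```

If this returns False for a large STH file, free up space (or address the backlog) before expecting the export to succeed.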
The general flow:
The WPA registers with the HUB TEMS when it starts.
The agent connects to its configured TEMS when it starts, and gets from the TEMS the location where the WPA is running.
The agent gets the list of historical collections (UADVISOR) from the TEMS that are distributed to that endpoint.
The agent performs historical data collections and historical samples are written to the STH files.
The agent exports data to the TDW through the HD based on warehousing interval and historical samples are inserted into the raw details historical tables for the attribute group.
The SY summarizes historical data from the raw details tables and creates / populates the "Daily" summary table / views.
The POPULATE_OSAGENTS procedure / script is run to query the _DV tables and create entries in the MANAGEDSYSTEM table.
Selecting a report in TCR lists the endpoint as a possible system to run the report against.
There are numerous possible causes for problems exporting historical data to the TDW database.
- Temporary network issues preventing communication between the agent endpoint and the WPA.
- Firewalls blocking communication in one direction, requiring KDE_GATEWAY to be configured to get around the firewalls.
- Communication issues between the WPA and the TDW.
- Invalid details during the registration process between the WPA and the HUB TEMS.
- Fail-over environments where the WPA works when the primary HUB TEMS is running, but fails when the stand-by failover TEMS takes over.
- Temporary network issues between the WPA and the TDW database.
- Problems with the TDW database where it is running out of space, or where the TDW database is not running (shut down, crashed, etc).
- Problems where the WPA is not running due to it crashing or having been stopped.
- Corrupted or missing khdexp.cfg or STH files.
- Attribute collection issues preventing data collection for historical samples.
Reference DCF technotes:
DCF 1569955 - OS Agent systems missing from COGNOS based TCR Operational Reports
Additional ITM Agent Insights series of IBM Tivoli Monitoring Agent blogs are indexed under ITM Agent Insights: Introduction.