One of the functions of TEMS (Tivoli Enterprise Monitoring Server) is tracking the online or offline status of monitoring agents. When TEMS shuts down or a monitoring agent loses connectivity with the monitoring server, the agent stands idle in disconnected mode. Significant monitoring events that take place in that period go unreported, causing loss of monitoring data.
For example, if your agent loses its connection with the monitoring server at 2pm and reestablishes it one hour later, at 3pm, important events such as ‘server down’ or ‘heavy load’ could have occurred while the agent was in disconnected mode, and they would have gone unnoticed.
Also, when the agent reestablishes the connection with the server, it forwards the current state of the situation, which may have changed while the agent was offline.
The situation may actually have reached its current true state during the period of uncertainty when the agent was offline, yet the event is wrongly shown as having occurred at the time the agent reestablished its connection with TEMS.
The ability to report the actual state and begin time of a Monitoring situation is important and brings significant value in managing your enterprise. This article discusses how you can enable and control autonomous agent operation so that the critical and decisive data is not lost upon such disconnections.
2. What is Agent Autonomous Mode?
A monitoring agent running in autonomous mode can start and work independently of its monitoring server. This feature allows the agent to continue collecting event information while disconnected from the Tivoli Enterprise Monitoring Server. Agents are no longer idle or offline when they are disconnected from the monitoring server: they can collect data, run situations, and register and forward events. When the agent disconnects from the server because of a monitoring server shutdown or a network communication interruption, it switches active monitoring situations to autonomous mode and continues monitoring and collecting data independently. This is the default behavior, which can be adjusted for greater or less autonomy.
A set of collected application data, that is, one or multiple rows of application table data, is given to each applicable situation for examination. The agent saves application data that satisfies pure-event situation filter criteria, such as receiving an alarm or a specific message. The agent maintains the sampled-situation state by evaluating application data against the defined filter criteria. When the agent successfully reconnects with the monitoring server, it sends all rows containing saved pure-event situation data, along with the data rows of sampled situations that are in true state. Thus the actual state and begin time of a monitoring situation are displayed on the portal, preventing loss of monitoring data.
In autonomous mode, agents work in their usual way, but they cannot carry out any processing that requires the monitoring server.
3. Evolution of Agent Autonomy
Agents based on the ITM 6.2.1 framework or earlier are called passive agents because they do not support the autonomous mode feature. Agents based on ITM 6.2.0 were purely passive, while agents based on ITM 6.2.1 were passive agents with event caching functionality.
Starting with the ITM 6.2.2 Fix Pack 2 framework, all the features and capabilities of agent autonomy were introduced: event caching, autonomous operation, event persistence, private monitoring situations, private history, SNMP traps, EIF emitters, Service Interface transactions, Central Configuration, and the Central Configuration Server.
ITCAMMA agents v7.1.0 support this feature, so these agents can work independently of their monitoring server.
4. How to start and configure Autonomous Mode of Agent?
Use the environment file of an agent (for example ‘kpcenv’, where pc stands for the product code) to configure and control its autonomous behavior. The parameter that activates autonomous mode is IRA_AUTONOMOUS_MODE, which can be found in <ITMHome>\TMAITM6\kpcenv.
IRA_AUTONOMOUS_MODE=p1, p2, p3
Parameter p1 is a required parameter and its value must always be 1.
Parameter p2 is optional and specifies the maximum number of saved events per pure-event situation during autonomous mode operation. Its default value is 512, which can be changed as required.
Parameter p3 is optional and specifies the order of saved data within the queue: Y for fixed or N for first in/first out (FIFO). The default is N.
Example: IRA_AUTONOMOUS_MODE=1,256,N
Agent autonomous mode is selected with a FIFO (circular) event data queue size of 256 saved events for all pure-event situations that become true. Note that N is the default and could have been left off.
You can also activate this mode simply by setting IRA_AUTONOMOUS_MODE=Y and IRA_AUTONOMOUS_LIMIT=50.
Autonomous mode is enabled by default in agents based on the ITM 622+ framework.
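The p1, p2, p3 form described above can be sketched in a few lines of Python. This is only an illustration of the defaults stated in the text, not the agent's own parser: the function name parse_autonomous_mode is hypothetical, and the queue is modelled with a standard bounded deque to show what a FIFO (circular) queue of saved events does when it fills up. The "fixed" ordering (p3=Y) is not modelled.

```python
from collections import deque


def parse_autonomous_mode(value):
    """Split a 'p1,p2,p3' setting into (enabled, max_events, fixed_order).

    Hypothetical helper for illustration; defaults follow the text above.
    """
    parts = [p.strip() for p in value.split(",")]
    if parts[0] != "1":
        raise ValueError("p1 is required and must be 1")
    # p2 and p3 are optional; fall back to the documented defaults.
    max_events = int(parts[1]) if len(parts) > 1 and parts[1] else 512
    order = parts[2].upper() if len(parts) > 2 and parts[2] else "N"
    if order not in ("Y", "N"):
        raise ValueError("p3 must be Y (fixed) or N (FIFO)")
    return True, max_events, order == "Y"


enabled, max_events, fixed = parse_autonomous_mode("1,256,N")

# FIFO (circular) queue: once it is full, the oldest saved event is dropped.
queue = deque(maxlen=max_events)
for event_id in range(300):
    queue.append(event_id)
```

After 300 events have been pushed through a 256-slot FIFO queue, only the 256 most recent ones remain, which is worth keeping in mind when sizing p2 for long disconnections.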
Other important parameters that play a role in adjusting the level of autonomy are:
- CTIRA_RECONNECT_WAIT: This parameter specifies, in seconds, the frequency with which the agent attempts to reestablish its connection with the monitoring server. The default is 600 (10 minutes).
- CTIRA_MAX_RECONNECT_TRIES: This parameter specifies the maximum number of times the agent attempts to reconnect with the monitoring server; the agent shuts down when it reaches this maximum. The default is 720.
For example, this setting provides 24 hours of autonomous coverage:
CTIRA_RECONNECT_WAIT=60
CTIRA_MAX_RECONNECT_TRIES=1440
- IRA_LOCALCONFIG_DIR: The default local configuration directory path that contains locally customized configuration files such as private situations, EIF event configuration, and SNMP trap configuration files.
- IRA_PRIVATE_SITUATION_CONFIG: Specifies the fully qualified private situation configuration file name.
- CTIRA_THRESHOLDS: Specifies the fully qualified name of the XML-based threshold override file.
- KHD_REGWITHGLB (WPA) & KHD_WAREHOUSE_LOCATION (Agent): Add these parameters to the environment file of every enterprise monitoring agent that has full autonomy.
- KSY_AUTONOMOUS & KSY_AUTONOMOUS_ODI_DIR: If you want the Summarization and Pruning agent to have no dependency on the Tivoli Enterprise Portal Server, add KSY_AUTONOMOUS=Y to the Summarization and Pruning agent environment file and specify the location of the agent description files using the KSY_AUTONOMOUS_ODI_DIR variable. You must copy the required ODI files (dockxx) onto the SY machine.
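Returning to CTIRA_RECONNECT_WAIT and CTIRA_MAX_RECONNECT_TRIES from the list above, the coverage window is the product of the two values. The sketch below checks the 24-hour example and works out what the stated defaults amount to. It assumes the agent waits the full interval before every attempt and that the attempts themselves take no time; the function name is hypothetical.

```python
from datetime import timedelta


def autonomous_coverage(reconnect_wait_s, max_reconnect_tries):
    """Approximate time an agent keeps retrying before it shuts down.

    Assumption: one full wait interval elapses per reconnect attempt.
    """
    return timedelta(seconds=reconnect_wait_s * max_reconnect_tries)


example = autonomous_coverage(60, 1440)   # values from the example above
defaults = autonomous_coverage(600, 720)  # defaults quoted in the list

print(example)   # 1 day, 0:00:00
print(defaults)  # 5 days, 0:00:00
```

Under these assumptions, the defaults give roughly five days of retries, while the example trades that for more frequent attempts over a single day.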
You can check whether agent autonomous mode and the related functions are working correctly in the agent-specific log files generated in IBM-ITM\TMAITM6\logs, for example MACHINENAME_PRODUCTCODE.LOG0 and MACHINENAME_PRODUCTCODE.LOG1.
5. Autonomous Mode Operation
Autonomous mode is enabled by default in agents based on the ITM 622+ framework. At startup, agents initialize the autonomous mode capabilities. The agent enters autonomous mode once the connection between the agent and TEMS is broken. When TEMS is started again, the agent connects to it and exits autonomous mode. Upon a successful reconnect, the agent registers with the monitoring server, and the agent and the monitoring server exchange the active-situation list. This gives the agent and the Tivoli Enterprise Monitoring Server an opportunity to synchronize defined situations, recognizing that situation definitions may have changed or situations may have been deleted.
6. Private Situations
This mode provides a function called ‘Private Situations’, with which you can define custom situations locally on the agent machine so that they operate fully autonomously. Private situation events come directly from the monitoring agent, and you no longer need a monitoring server to create custom situations.
These situations can be defined in <ITMHome>/localconfig/pc/pc_situations.xml, where pc is the product code. For example, for the ITCAMMA Host Integration Server agent, the XML file is qh_situations.xml.
<CRITERIA><![CDATA[ *VALUE KQH_AVAILABILITY.Name *EQ SnaBase *AND *VALUE KQH_AVAILABILITY.Status *EQ UP]]></CRITERIA>
<HISTORY TABLE="KQH_AVAILABILITY" interval="1" retain="24"></HISTORY>
Value of expression (*VALUE) and Check for Missing Items (*MISSING) are the only formula functions available for use in private situations.
The comparison operators that can be used in these situations are *EQ, *LT, *GT, *NE, *LE, and *GE.
Note that the blank before the function (*VALUE in this case) is necessary; otherwise you may get an error like:
"The private situation ‘my_priv_situation’ is rejected. A predicate function is required, but '[*VALUE' is specified."
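A small pre-flight check can catch the missing blank before the agent rejects the situation. The Python sketch below is not part of ITM: criteria_problems is a hypothetical helper that scans the raw text of a situations file, flags any CDATA section that does not start with whitespace, and flags a criteria string whose first function is neither *VALUE nor *MISSING. It assumes that any whitespace character counts as the required blank.

```python
import re

# The two formula functions the text lists for private situations.
SUPPORTED_FUNCTIONS = ("*VALUE", "*MISSING")


def criteria_problems(raw_xml):
    """Return a list of problems found in the CDATA sections of raw XML text."""
    problems = []
    for match in re.finditer(r"<!\[CDATA\[(.*?)\]\]>", raw_xml, re.DOTALL):
        body = match.group(1)
        if not body[:1].isspace():
            # This is the case that yields "'[*VALUE' is specified".
            problems.append("missing blank after '<![CDATA[': " + body[:20])
        elif not body.lstrip().startswith(SUPPORTED_FUNCTIONS):
            problems.append("unsupported function: " + body.strip()[:20])
    return problems


good = ("<CRITERIA><![CDATA[ *VALUE KQH_AVAILABILITY.Name *EQ SnaBase "
        "*AND *VALUE KQH_AVAILABILITY.Status *EQ UP]]></CRITERIA>")
bad = "<CRITERIA><![CDATA[*VALUE KQH_AVAILABILITY.Status *EQ UP]]></CRITERIA>"

print(criteria_problems(good))
print(criteria_problems(bad))
```

The check only looks at the start of each criteria string, so it is a quick sanity test and does not replace the validation the agent performs when it loads the file.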
If you want to name the file differently or use a different path, use the IRA_PRIVATE_SITUATION_CONFIG and IRA_LOCALCONFIG_DIR agent environment variables to change the file name and path.
When autonomous mode starts, private situation definitions are validated and then started. When the situation interval expires, the condition is evaluated and, if it is true, a private situation event is generated.
Private situation events can then be sent as SNMP alerts to a receiver such as the Netcool/OMNIbus SNMP Probe, or to an EIF receiver. Predefined situations are not affected by this feature; when the agent reconnects with the TEMS, predefined situations are evaluated with the help of the recorded events.
7. SNMP alerts and EIF events
In autonomous mode, agents can not only register events but also forward them.
Agents can be configured to send alerts to an SNMP receiver like Netcool/OMNIbus, using the Netcool/OMNIbus SNMP Probe, or Tivoli NetView. Also, Private Situation events can be sent directly from a Tivoli Monitoring Agent to an EIF receiver without going through the Tivoli Enterprise Monitoring Server.
Setting IRA_EVENT_EXPORT_SNMP_TRAP=Y in the agent environment file enables agent SNMP alerts.
Setting IRA_EVENT_EXPORT_EIF=Y enables the EIF event export facility. Change the value to N to disable the facility.
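To see at a glance which export facilities an environment file switches on, the two flags can be read with a few lines of Python. This is a rough sketch under stated assumptions: the file is treated as plain KEY=VALUE lines, lines starting with * or # are treated as comments, and only explicit Y values are reported (any built-in agent defaults are not modelled). Both function names are hypothetical.

```python
def read_env_settings(text):
    """Parse KEY=VALUE lines from environment-file text into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Assumption: '*' and '#' start comment lines.
        if not line or line.startswith(("*", "#")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def explicitly_enabled_exports(settings):
    """Return the export facilities that are explicitly set to Y."""
    flags = {"SNMP": "IRA_EVENT_EXPORT_SNMP_TRAP", "EIF": "IRA_EVENT_EXPORT_EIF"}
    return sorted(name for name, key in flags.items()
                  if settings.get(key, "").upper() == "Y")


sample_env = """
* event export settings
IRA_EVENT_EXPORT_SNMP_TRAP=Y
IRA_EVENT_EXPORT_EIF=N
"""
print(explicitly_enabled_exports(read_env_settings(sample_env)))
```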
A configuration XML file is required for sending SNMP alerts.
Sample configuration file: C:\IBM\ITM\localconfig\qh\qh_trapcnfg.xml
<TrapDest name="MyOMNIbus" Address="10.77.68.155" Stat="Y" />
<TrapAttrGroup Table="KQH_AVAILABILITY" TrapAttrList="Name,Status" />
<situation name="*" target="MyOMNIbus" />
If the above configuration file is not found in the mentioned location, the SNMP trap configuration definition fails and the agent trap emitter feature is disabled.
When a situation fires, it creates an event, and the SNMP event is then sent to the configured target named in the above XML file.
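Because each situation entry refers to a destination by name, a mismatch between target and the TrapDest name means no usable target is found. The Python sketch below checks that every target resolves to a defined destination. It makes several assumptions: the fragment is wrapped in an SNMP root element so that it parses as a complete document, names are compared exactly, and each target holds a single destination name. The function unresolved_targets is hypothetical.

```python
import xml.etree.ElementTree as ET

# The sample fragment from the text, wrapped in an assumed <SNMP> root element.
SAMPLE = """
<SNMP>
  <TrapDest name="MyOMNIbus" Address="10.77.68.155" Stat="Y" />
  <TrapAttrGroup Table="KQH_AVAILABILITY" TrapAttrList="Name,Status" />
  <situation name="*" target="MyOMNIbus" />
</SNMP>
"""


def unresolved_targets(xml_text):
    """Return situation targets that match no TrapDest name exactly."""
    root = ET.fromstring(xml_text)
    destinations = {dest.get("name") for dest in root.iter("TrapDest")}
    return [sit.get("target") for sit in root.iter("situation")
            if sit.get("target") not in destinations]


print(unresolved_targets(SAMPLE))
```

An empty list means every situation points at a defined destination; a stray space inside a name attribute is enough to make a target show up as unresolved under the exact-match assumption.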
Similarly, a configuration file for EIF events needs to be created, so that when a situation fires, an EIF event can be sent to the configured EIF receiver.
Sample XML configuration file for EIF events:
<Destination id="1" type="T">
<Server location="10.0.112.6" port="5529"/>
</Destination>
If you use an EIF receiver, you can then see that a socket is correctly established and that an event is successfully sent.
8. Private History
Autonomous mode also provides a feature called private history: the collection and short-term storage of data from a local monitoring agent. Define historical collection in the private situation configuration file for an agent, for example:
<HISTORY TABLE="KQH_AVAILABILITY" interval="1" retain="24"></HISTORY>
You can then use the Agent Service Interface to view the short-term history.
The table name for an attribute group is prefixed with PVTHIST. As part of the private history configuration, you can set the RETAIN attribute, in hours, to manage the history file size. The agent outputs all private history files to this subdirectory:
<ITMHome>\TMAITM6\logs (Win) or <ITMHome>/<arch>/pc/hist (Linux/Unix)
9. Agent Service Interface (ASI)
The Agent Service Interface (ASI) removes the need for a portal to view the data collected during autonomous mode. It provides access to the agent’s data even when the TEMS connection is not available and the agent is working autonomously. Open it from the IBM Tivoli Monitoring Service Index (e.g. http://<tepshostname>:1920), and then select ASI from the provided options.
We can view the following items in ASI:
-Agent information & environment variables.
-Situations and Events Statistic (private situations)
If you face a problem in autonomous mode, check the following points; one of them could have caused the issue:
-No situations are defined (or not the desired ones)
-The criteria specified for triggering a situation have not been met
-The duration and number of attempts set for Autonomous mode are wrong
-The situations defined use functions that are not supported by the private situations
-The DISTRIBUTION tag is not correct
-The trapcnfg/eifcnfg file is not provided
-Parameters required for SNMP & EIF are not set or are disabled.
-The destination(s) specified in the trapcnfg/eifcnfg are wrong.