Agent troubleshooting

This section lists problems that might occur with agents.

This chapter provides agent-specific troubleshooting information. See the IBM® Tivoli® Monitoring Troubleshooting Guide for general troubleshooting information.

Table 8. Agent problems and solutions
Problem Solution

The monitoring agent stops with the following error:

Unable to allocate 2147483642 bytes of memory.
Agent is terminating...

Delete the TMAITM6\logs\khdexp.cfg file, and then restart the agent. Check whether the agent still terminates with the memory allocation error.

If the agent still terminates with the same memory error after you delete the khdexp.cfg file, remove all the entries in TMAITM6\logs, and then restart the agent. You can move the entries to a temporary directory if you do not want to delete them. After restarting the agent, check whether it still reports the memory allocation error.

A configured and running instance of the monitoring agent is not displayed in the Tivoli Enterprise Portal, but other instances of the monitoring agent on the same system do appear in the portal.

Tivoli Monitoring products use Remote Procedure Call (RPC) to define and control product behavior. RPC is the mechanism that allows a client process to make a subroutine call (such as GetTimeOfDay or ShutdownServer) to a server process somewhere in the network. Tivoli processes can be configured to use TCP/UDP, TCP/IP, SNA, and SSL as the desired protocol (or delivery mechanism) for RPCs.

"IP.PIPE" is the name given to Tivoli TCP/IP protocol for RPCs. The RPCs are socket-based operations that use TCP/IP ports to form socket addresses. IP.PIPE implements virtual sockets and multiplexes all virtual socket traffic across a single physical TCP/IP port (visible from the netstat command).

A Tivoli process derives the physical port for IP.PIPE communications based on the configured, well-known port for the HUB Tivoli Enterprise Monitoring Server. (This well-known port or BASE_PORT is configured using the 'PORT:' keyword on the KDC_FAMILIES / KDE_TRANSPORT environment variable and defaults to '1918'.)

The physical port is allocated as (BASE_PORT + 4096*N), where N=0 for a Tivoli Enterprise Monitoring Server process and N={1, 2, ..., 15} for a non-Tivoli Enterprise Monitoring Server process. This allocation method imposes two architectural limits:

  • No more than one Tivoli Enterprise Monitoring Server reporting to a specific Tivoli Enterprise Monitoring Server HUB can be active on a system image.
  • No more than 15 IP.PIPE processes can be active on a single system image.

A single system image can support any number of Tivoli Enterprise Monitoring Server processes (address spaces) provided that each Tivoli Enterprise Monitoring Server on that image reports to a different HUB. By definition, there is one Tivoli Enterprise Monitoring Server HUB per monitoring Enterprise, so this architecture limit has been simplified to one Tivoli Enterprise Monitoring Server per system image.

No more than 15 IP.PIPE processes or address spaces can be active on a single system image. Given the first limit described above, this second limit applies specifically to Tivoli Enterprise Monitoring Agent processes: no more than 15 agents per system image.
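
The port allocation rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming the default BASE_PORT of 1918; the function and constant names are hypothetical and are not part of any Tivoli interface.

```python
# Illustration of the (BASE_PORT + 4096*N) rule; names are hypothetical.
BASE_PORT = 1918    # default well-known port mentioned in the text
PORT_STRIDE = 4096  # spacing between candidate physical ports
MAX_N = 15          # N=0 is the monitoring server, N=1..15 are other processes


def ip_pipe_port(n, base_port=BASE_PORT):
    """Return the physical port for slot n (0 = monitoring server)."""
    if not 0 <= n <= MAX_N:
        raise ValueError("N must be between 0 and 15")
    return base_port + PORT_STRIDE * n


def candidate_ports(base_port=BASE_PORT):
    """Return all 16 candidate ports: one server slot plus 15 other slots."""
    return [ip_pipe_port(n, base_port) for n in range(MAX_N + 1)]
```

With the default base port, the monitoring server slot is 1918 and the remaining candidates run from 6014 (N=1) to 63358 (N=15), which is why only 15 non-server IP.PIPE processes fit on one system image.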

This limitation can be circumvented (at current maintenance levels, IBM Tivoli Monitoring V6.1 Fix Pack 4 and later) if the Tivoli Enterprise Monitoring Agent process is configured to use EPHEMERAL IP.PIPE. (This is IP.PIPE configured with the 'EPHEMERAL:Y' keyword in the KDC_FAMILIES / KDE_TRANSPORT environment variable.) There is no limit on the number of ephemeral IP.PIPE connections per system image. If ephemeral endpoints are used, the Warehouse Proxy Agent must be accessible from the Tivoli Enterprise Monitoring Server associated with the agents that use ephemeral connections, either by running the Warehouse Proxy Agent on the same computer or by using the Firewall Gateway feature. (The Firewall Gateway feature relays the Warehouse Proxy Agent connection from the Tivoli Enterprise Monitoring Server computer to the Warehouse Proxy Agent computer if the Warehouse Proxy Agent cannot coexist on the same computer.)
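
As a rough illustration, the following Python sketch checks whether a KDC_FAMILIES / KDE_TRANSPORT value enables ephemeral IP.PIPE. It assumes that the value is a whitespace-separated list of protocol names, each followed by its KEYWORD:VALUE modifiers (for example, IP.PIPE PORT:1918 EPHEMERAL:Y). The function name is hypothetical and the parsing is deliberately simplistic.

```python
def uses_ephemeral_ip_pipe(transport_value):
    """Return True if EPHEMERAL:Y appears among the IP.PIPE modifiers.

    Assumption: tokens without a colon are protocol names, and tokens of
    the form KEYWORD:VALUE modify the most recent protocol name.
    """
    current_protocol = None
    for token in transport_value.split():
        if ":" not in token:
            current_protocol = token.upper()
            continue
        keyword, _, value = token.partition(":")
        if (current_protocol == "IP.PIPE"
                and keyword.upper() == "EPHEMERAL"
                and value.upper() == "Y"):
            return True
    return False
```

For the sample value IP.PIPE PORT:1918 EPHEMERAL:Y the function returns True; for IP.PIPE PORT:1918 it returns False.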

The agent goes off-line when collecting the network port attribute because of a reverse DNS lookup timeout.

The agent can hang when it queries network port information and DNS reverse lookup is disabled. Several errors similar to the following are logged:

(4A7BDB8F.0000-1B90:knt67agt.cpp,243,"TakeSample") gethostbyaddr
          error <11004> for IP address
(4A7BDB94.0000-1B90:knt67agt.cpp,243,"TakeSample") gethostbyaddr
          error <11004> for IP address

The agent appears to be off-line in the Tivoli Enterprise Portal and does not report any data for any workspace. The REVERSE_LOOKUP_ACCEPTED_FAILURES environment variable, which can be specified in the configuration file, sets the number of accepted failures in the reverse lookup. Setting this variable can reduce the hang time.
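
How the agent applies REVERSE_LOOKUP_ACCEPTED_FAILURES internally is not described here. The following Python sketch only illustrates the general idea of capping failed reverse lookups so that one sampling pass does not stall on every address. It assumes that lookups are skipped once the cap is reached; the function name and the injectable lookup parameter are hypothetical.

```python
import socket


def resolve_hosts(addresses, accepted_failures, lookup=socket.gethostbyaddr):
    """Map each IP address to a host name, or None if it was not resolved.

    Assumption: once `accepted_failures` lookups have failed, the remaining
    addresses are left unresolved instead of waiting on more timeouts.
    """
    names = {}
    failures = 0
    for address in addresses:
        if failures >= accepted_failures:
            names[address] = None  # cap reached: skip the lookup
            continue
        try:
            names[address] = lookup(address)[0]
        except OSError:  # socket.herror and socket.gaierror derive from OSError
            failures += 1
            names[address] = None
    return names
```

A lower cap means fewer blocking lookups in a pass where the resolver is failing, at the cost of leaving more addresses unresolved.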

The process application components are available, but the Availability status shows PROCESS_DATA_NOT_AVAILABLE.

This problem occurs because the PerfProc performance object is disabled. When this condition exists, IBM Tivoli Monitoring cannot collect performance data for this process. Do the following to confirm that this problem exists and resolve it:
  1. Choose Run in the Windows Start menu.
  2. Type perfmon.exe in the Open field of the Run window. The Performance window is displayed.
  3. Click the plus sign (+) in the tool bar located above the right pane. The Add Counters window is displayed.
  4. Look for Process in the Performance object pull-down menu.
  5. Perform one of the following actions:
       • If you see Process in the pull-down menu, the PerfProc performance object is enabled and the problem is coming from a different source. You might need to contact IBM Software Support.
       • If you do not see Process in the pull-down menu, use the Microsoft utility from the following Web site to enable the PerfProc performance object:

http://blogs.technet.com/mscom/archive/2008/12/18/the-mystery-of-the-missing-process-performance-counter-in-perfmon.aspx

After the PerfProc performance object is enabled, the Process performance object becomes visible in the Performance object pull-down menu of the Add Counters window, and IBM Tivoli Monitoring is able to detect Availability data. Restart the monitoring agent.

The CPU usage of the Monitoring Agent for Windows OS is high.

The PerfProc service is typically the one responsible for high CPU usage. Others, like TCPIP, might also need to be disabled. You can disable the PerfProc and TCPIP services with the exctrlst.exe utility, which you can download from the Microsoft site. Run the exctrlst.exe command to bring up the Extensible Counter List, where all of the counters are listed. Highlight PerfProc and clear the Performance Counters Enabled box. Click Refresh to save the change. The same method can be used to disable the TCPIP counter.

If these two services are stopped, Tivoli Enterprise Portal workspaces or situations based on Process and Network attribute groups will no longer function.

The Long Queue Name is not matched with the row data collected from perfmon.

To allow the Long Queue Name to be matched with the row data collected from perfmon (all the remaining attributes for each MSMQ Queue), the first 63 bytes (characters) of the Queue Name must be unique. This is the only way that the queue name can be matched with the additional metrics that come back from perfmon (the source for the remaining attributes of the queue instance).
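
To see whether a set of queue names meets the 63-character uniqueness requirement, the names can be grouped by their first 63 characters. The following Python sketch does this; it treats the limit as a character count, which matches the byte count only for single-byte names, and the function name is hypothetical.

```python
from collections import defaultdict

PREFIX_LENGTH = 63  # length of the queue name prefix that must be unique


def find_prefix_collisions(queue_names, prefix_length=PREFIX_LENGTH):
    """Return {prefix: [names]} for prefixes shared by more than one name."""
    groups = defaultdict(list)
    for name in queue_names:
        groups[name[:prefix_length]].append(name)
    return {prefix: names for prefix, names in groups.items() if len(names) > 1}
```

An empty result means that every queue name has a unique 63-character prefix; any entry in the result lists queue names that share a prefix and therefore cannot be told apart by it.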

When you edit the configuration for an existing monitoring agent, the values displayed are not correct.

The original configuration settings might include non-ASCII characters. These values were stored incorrectly and result in the incorrect display. Enter new values using only ASCII characters.
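
One way to check candidate values before entering them is to test each one for non-ASCII characters. The following Python sketch does this for a dictionary of settings. The setting names used with it are made up for illustration, and str.isascii() requires Python 3.7 or later.

```python
def non_ascii_settings(settings):
    """Return the names of settings whose values contain non-ASCII characters."""
    return [name for name, value in settings.items() if not value.isascii()]
```

For example, a settings dictionary with the values "server1" and "müller" yields only the name of the second setting, because "ü" is outside the ASCII range.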

Attributes do not allow non-ASCII input in the situation editor.

None. Any attribute that does not include "(Unicode)" in its name might support only ASCII characters. For example, "Attribute (Unicode)" supports Unicode, but "Attribute" without "(Unicode)" might support only ASCII characters.

The Windows Agent accesses the root\cimv2 WMI namespace to collect its WMI data. The security settings (access permissions) for this namespace must grant the Enable Account, Execute Methods, and Provider Write permissions to the Everyone account. To check and set these permissions, complete the following steps:
  1. Click Start -> Run.
  2. Type Wmimgmt.msc and click OK.
  3. Right-click WMI Control and choose Properties.
  4. Ensure that the General tab reports a successful connection, and then choose the Security tab.
  5. Select the Root folder and then click Security at the bottom of the screen.
  6. Highlight Everyone and ensure that the 'Enable Account', 'Execute Methods', and 'Provider Write' options are allowed. If any of them is not, select it.
  7. Highlight Local Service and ensure that the 'Provider Write' option is allowed. If it is not, select it.
  8. Click OK.
  9. Reboot the server.

No performance data is displayed in workspace views, no data is available for situations, and no data is available for historical logging.

When the Windows operating system detects a problem in one of its extensible performance monitoring DLL files, it marks the DLL as "disabled." Any DLL that is disabled cannot provide performance data through the Windows Performance Monitor interfaces (Perfmon or Performance Monitor APIs). This prevents IBM Tivoli Monitoring agents from gathering data supplied by the disabled DLL. For more information, see Microsoft Support Knowledge Base article 248993 at the following Web address: http://support.microsoft.com/default.aspx?scid=kb;EN-US;248993

Follow the Resolution instructions provided in this article (248993) to re-enable any performance monitoring extension DLL files disabled by Windows. Then, restart the monitoring agent.

Log data accumulates too rapidly.

Check the RAS trace option settings, which are described in Setting RAS trace parameters. The trace option settings that you can specify on the KBB_RAS1= and KDC_DEBUG= lines can generate large amounts of data.

The system runs out of memory while the agent is collecting data.

Ensure that you have installed the newest Service Packs for the Microsoft .NET Framework. Depending on the level of the Windows operating system that you are using, the required Service Pack is one of the following:
  • .NET Framework 1.0 SP3

    —OR—

  • .NET Framework 1.1 SP1

The Date Time Last Modified and Date Time Created attributes in the File Change attribute group seem to have their positions switched in the situation editor when you specify a time comparison between these attributes using the Compare time to a + or - delta function.

This can occur when creating a new situation that uses the attributes of Date Time Last Modified and Date Time Created in the File Change attribute group. If you then select the function Compare time to a + or - delta, it does not show the time attribute that is currently selected.

This is working as designed. The 'Compare time to a + or - delta' function compares the currently selected time attribute, offset by a delta, with the other available time attributes, not with the selected time attribute itself. When you select Date Time Last Modified or Date Time Created, the "Time Attribute for Comparison" shows the other available time attributes, which can make it seem as if the attribute names were switched.