Troubleshooting
Problem
The primary probe stops receiving heartbeats from the secondary probe. The secondary probe stops logging P2P messages and does not send events when the primary probe is unavailable. The memory usage of the secondary probe grows continuously.
Symptom
The primary probe works normally; however, the secondary probe shows unbounded memory growth. Depending on the incoming event rate (and on whether the probe is 32-bit or 64-bit), the growth can exhaust all available memory on the host, causing the secondary probe to exit.
The primary probe log will show an increasing number of outstanding heartbeats for failures to connect to the secondary probe.
Another symptom is that the TCP port used for the peer-to-peer heartbeat on the secondary probe gets stuck in the CLOSE_WAIT or SYN_RECV state (shown as SYN_RCVD on some platforms).
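The stuck-socket symptom can be checked from saved netstat output. The following is a minimal sketch, not an IBM-supplied tool: it assumes the Linux `netstat -ant` column layout (local address in the fourth column, state in the sixth), and the function name `find_stuck_sockets` and port 9999 are hypothetical placeholders for the heartbeat port configured on your secondary probe.

```python
# States named in the symptom; SYN_RCVD is the spelling used by some platforms.
STUCK_STATES = {"CLOSE_WAIT", "SYN_RECV", "SYN_RCVD"}


def local_port(address):
    """Return the port part of an address such as 10.0.0.2:9999 or 10.0.0.2.9999."""
    sep = max(address.rfind(":"), address.rfind("."))
    return address[sep + 1:] if sep >= 0 else ""


def find_stuck_sockets(netstat_text, port):
    """Return the netstat lines whose local port matches and whose state is stuck.

    Assumes the Linux 'netstat -ant' layout:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    stuck = []
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skip headers and non-TCP lines
        if local_port(fields[3]) == str(port) and fields[5] in STUCK_STATES:
            stuck.append(line)
    return stuck


if __name__ == "__main__":
    # Hypothetical sample; replace with the real output captured on the host.
    sample = (
        "tcp 0 0 0.0.0.0:9999 0.0.0.0:* LISTEN\n"
        "tcp 1 0 10.0.0.2:9999 10.0.0.1:41000 CLOSE_WAIT\n"
    )
    for entry in find_stuck_sockets(sample, 9999):
        print(entry)
```

Entries that stay in one of these states across several captures taken a few minutes apart are consistent with the symptom described above.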
The secondary probe will also not send events to the ObjectServer(s), even if the primary probe stops working or it loses contact with the primary probe.
The problem usually begins after the secondary probe has been running for a few days, although it can happen at any time.
The key symptom is the absence of P2P messages in the secondary probe log file (they are logged at *INFO* level). These should be logged every 2 seconds (unless a different -beatinterval has been specified).
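The absence of P2P messages can be detected by comparing the newest P2P entry in the secondary probe log with the current time. The sketch below is illustrative only and rests on assumptions: each log line is assumed to begin with a timestamp in the form `2025-05-09T10:00:00`, heartbeat entries are assumed to contain the text `P2P`, and the names `p2p_timestamps`, `heartbeat_is_silent` and the `tolerance` multiplier are hypothetical. Adjust them to match the actual log, which must be written at INFO level or a more verbose level for P2P messages to appear.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%dT%H:%M:%S"  # assumed timestamp layout at the start of each line
TS_LENGTH = 19                   # number of characters occupied by that timestamp


def p2p_timestamps(lines, marker="P2P"):
    """Collect the timestamps of log lines that contain the heartbeat marker."""
    stamps = []
    for line in lines:
        if marker not in line:
            continue
        try:
            stamps.append(datetime.strptime(line[:TS_LENGTH], TS_FORMAT))
        except ValueError:
            continue  # line does not start with a parsable timestamp
    return stamps


def heartbeat_is_silent(lines, now, beat_interval=2.0, tolerance=5, marker="P2P"):
    """Return True if no P2P message was logged within tolerance * beat_interval
    seconds before 'now' (or if the log contains no P2P message at all)."""
    stamps = p2p_timestamps(lines, marker)
    if not stamps:
        return True
    return now - max(stamps) > timedelta(seconds=beat_interval * tolerance)
```

With the default 2-second interval and a tolerance of 5, the check reports silence once more than 10 seconds have passed since the last P2P entry. If the probe was started with a different -beatinterval, pass that value as `beat_interval`.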
Product: Tivoli Netcool/OMNIbus
Business Unit: IBM Software
Component: Not Applicable
Platform: AIX, Linux, Solaris
Version: 7.4.0; 8.1.0
Edition: All Editions
Line of Business: Automation Platform
Document Information
Modified date: 09 May 2025
UID: swg21683079