Troubleshooting IBM Tivoli Netcool/OMNIbus probe hang issue
Use this information to troubleshoot an IBM Tivoli Netcool/OMNIbus probe hang issue that can occur with any probe that uses a TCP (Transmission Control Protocol) connection to send data to the ObjectServer.
Description of the issue
IBM Tivoli Netcool/OMNIbus probes may hang as a result of network configuration issues that break TCP sessions.
When these TCP communication errors occur, the probe log stops at the point of the hang, while the probe is flushing events to the ObjectServers. For example:
2016-01-29T17:19:23: Debug: D-UNK-000-000: Sending.....
2016-01-29T17:19:23: Debug: D-UNK-000-000: Flushing events to object servers
2016-01-29T17:19:23: Debug: D-UNK-000-000: 1 buffered alerts
The probe log also contains errors associated with network connectivity. For example:
Error: E-UNK-000-000: [ProtocolTDS]: ct_results(): network packet layer:
internal net library error: Net-Lib protocol driver call to read data failed
OS Error: Socket recv failed - errno 110 Connection timed out
If the probe hangs as described previously and the log contains errors like these, the cause is a TCP communication error. Correcting the underlying network problem allows the probe to function normally.
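The two symptoms above (a log that stops during a flush, and network errors in the log) can be checked with a short script. The following is a minimal sketch, not an IBM-supplied tool: the function name check_probe_log, the tail parameter, and the matching patterns are assumptions derived only from the log excerpts shown above, so adjust them to the messages your probe actually writes.

```python
import re

# Text taken from the example log excerpts above; other probes or
# versions may word these messages differently (assumption).
FLUSH_MARKER = "Flushing events to object servers"
NETWORK_ERROR = re.compile(r"Net-Lib protocol driver call|Socket recv failed")


def check_probe_log(log_text, tail=3):
    """Report the two hang symptoms described in this document.

    ends_in_flush  -- True if one of the last `tail` non-empty lines
                      shows the probe flushing events.
    network_errors -- every line that matches a known network error.
    """
    lines = [ln for ln in log_text.splitlines() if ln.strip()]
    ends_in_flush = any(FLUSH_MARKER in ln for ln in lines[-tail:])
    network_errors = [ln.strip() for ln in lines if NETWORK_ERROR.search(ln)]
    return {"ends_in_flush": ends_in_flush, "network_errors": network_errors}
```

To use it, read the probe log file into a string and pass it to check_probe_log. A result with ends_in_flush set to True and a non-empty network_errors list matches the pattern described in this document. It does not prove a hang on its own, because a healthy probe also logs flush messages.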
Workaround to the issue
In the previous example, setting the MTU to 1300 bytes on the ObjectServer network interfaces forced the larger events to be broken into smaller packets for transmission over the VPN. This prevented the TCP connection timeouts that resulted in the probe hang.
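To illustrate the effect of a lower MTU, the following sketch estimates how much TCP payload fits in one packet and how many packets a single event needs. It assumes IPv4 and TCP headers of 20 bytes each with no options, and it ignores ObjectServer protocol overhead and VPN encapsulation. The 4000-byte event size is a hypothetical figure chosen only for illustration.

```python
import math

IPV4_HEADER = 20  # minimum IPv4 header size, no options (assumption)
TCP_HEADER = 20   # minimum TCP header size, no options (assumption)


def tcp_payload_per_packet(mtu):
    """Return the TCP payload bytes that fit in one packet of size mtu."""
    payload = mtu - IPV4_HEADER - TCP_HEADER
    if payload <= 0:
        raise ValueError("MTU is too small to carry any TCP payload")
    return payload


def packets_needed(event_bytes, mtu):
    """Return how many packets are needed to carry event_bytes of data."""
    return math.ceil(event_bytes / tcp_payload_per_packet(mtu))


print(tcp_payload_per_packet(1300))  # 1260
print(packets_needed(4000, 1300))    # 4
print(packets_needed(4000, 1500))    # 3
```

With an MTU of 1300, each packet carries at most 1260 bytes of payload under these assumptions, so a large event is split into more, smaller packets.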
Additional tips for troubleshooting
The SQL Interactive Interface (the nco_sql utility) can be used to recreate network issues by inserting suspected records directly into the alerts.status table in the ObjectServer. This simulates the insert performed by a probe. In the previous example, the nco_sql utility was used to determine that large records that were present in the probe log were not being inserted successfully into the ObjectServer.
Also, try using Wireshark (an open source packet analyzer) to follow the events as they are sent across the network from the probe host to the ObjectServer. This can help to clearly identify network-related issues.