Probe and gateway troubleshooting

Use the following information to troubleshoot probe and gateway issues.

Probe and gateway continually restart

The probe and gateway may continually restart if the Netcool/OMNIbus ObjectServer is not accessible using the current configuration.

Workaround: Perform one or more of the following remedial steps.
  1. Check that the ObjectServer host and port defined in $ASM_HOME/integrations/omnibus/omni.dat are as expected. If the omni.dat entry for your ObjectServer is not named 'AGG_P', update the ObjectServer name in the following property files:
    Probe
    $ASM_HOME/integrations/omnibus/kafka/probe/probe.props
    Gateway
    $ASM_HOME/integrations/omnibus/kafka/gateway/G_ASM.props
  2. Check that the ObjectServer port can be reached using nco_ping.sh. For example, if your omni.dat file has a section for AGG_P, run the following command:
    $ASM_HOME/bin/nco_ping.sh AGG_P
    If the ping fails
    If the ObjectServer port cannot be reached, check that the configured port can be accessed using a tool such as telnet, nc or socat. If it is inaccessible, check for firewalls.
    If omni.dat contains a hostname rather than an IP address, check that the name is resolvable using a tool such as nslookup or getent hosts. If it is unresolvable, change the DNS settings to make it resolvable, or change the omni.dat file.
    If the ping succeeds
    A successful ping confirms only that the ObjectServer port can be reached; it does not validate any username and password settings.
  3. Check the beginning of the startup.log files in $ASM_HOME/logs/noi-gateway/ and $ASM_HOME/logs/noi-probe/. The startup logs contain the results of the ObjectServer ping and the host lookup for every entry in the omni.dat file, for example:
    Successful
    Thu Mar 12 14:11:17 UTC 2020 Checking access to Object Server 'NCOMS'
    NCO_PING: Server available.
    Thu Mar 26 17:22:04 UTC 2020 Checking hostname resolution of 'resolvable.ibm.com'
    1.2.3.4    resolvable.ibm.com
    Unsuccessful
    Thu Mar 12 14:11:15 UTC 2020 Checking access to Object Server 'AGG_P'
    NCO_PING: Server unavailable.
    Thu Mar 26 17:22:04 UTC 2020 Checking hostname resolution of 'unresolvable.ibm.com'
    Thu Mar 26 17:22:04 UTC 2020 {anyMessageHere}
    If the ping and host lookups work from the Agile Service Manager host but fail inside the container, check how name resolution is configured for the container. Also note that the startup log shows credentials being set up to access the secured Kafka broker; these entries appear even if the ObjectServer connection is unsecured.
  4. Check for Error-level messages in the runtime logs:
    $ASM_HOME/logs/noi-probe/probe.log and $ASM_HOME/logs/noi-gateway/gateway.log
  5. Check for erroneous security credentials. Configuring secure credentials when they are not required can cause problems similar to those caused by omitting them when they are required.
    Username and password issues
    An incorrect password can cause the user to be locked out of the ObjectServer.
    Check the ObjectServer's alerts.login_failures table.
    Note that nco_ping does not validate the username or password.
    TLS issues
    Check the Netcool/OMNIbus documentation to determine whether the probe's SSLServerCommonName and the gateway's Gate.Reader.CommonNames properties are required.
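The host and port checks in steps 1 and 2 can also be scripted. The following Python sketch is a hypothetical helper, not a tool shipped with Agile Service Manager. It assumes the simple omni.dat layout used in this topic (a [NAME] line followed by Primary: or Backup: host port lines inside braces), resolves each host name, and attempts a plain TCP connection to each port. Like nco_ping, it does not validate credentials or TLS settings, and because it runs on the Agile Service Manager host rather than inside the containers, its results can differ from those in startup.log.

```python
import re
import socket
import sys

# Hypothetical helper. Assumes the simple omni.dat layout:
#   [AGG_P]
#   {
#       Primary: omnihost 4100
#   }
SECTION_RE = re.compile(r"^\s*\[(?P<name>[^\]]+)\]\s*$")
ENTRY_RE = re.compile(
    r"^\s*(?P<role>Primary|Backup)\s*:\s*(?P<host>\S+)\s+(?P<port>\d+)\s*$"
)


def parse_omni_dat(text):
    """Return a dict mapping each server name to a list of (role, host, port)."""
    servers = {}
    current = None
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # assumption: '#' starts a comment
        section = SECTION_RE.match(line)
        if section:
            current = section.group("name").strip()
            servers.setdefault(current, [])
            continue
        entry = ENTRY_RE.match(line)
        if entry and current is not None:
            servers[current].append(
                (entry.group("role"), entry.group("host"), int(entry.group("port")))
            )
    return servers


def check_endpoint(host, port, timeout=3.0):
    """Resolve host, then try a TCP connection; return (resolved, reachable, detail)."""
    try:
        address = socket.gethostbyname(host)  # an IP address is returned unchanged
    except socket.gaierror as exc:
        return False, False, f"cannot resolve {host}: {exc}"
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True, True, f"{host} ({address}) port {port} reachable"
    except OSError as exc:
        return True, False, f"{host} ({address}) port {port} unreachable: {exc}"


def main(path):
    with open(path, encoding="utf-8") as handle:
        servers = parse_omni_dat(handle.read())
    for name, entries in servers.items():
        for role, host, port in entries:
            _, _, detail = check_endpoint(host, port)
            print(f"[{name}] {role}: {detail}")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python3 check_omni_dat.py "$ASM_HOME/integrations/omnibus/omni.dat"
    main(sys.argv[1])
```

An unresolvable host corresponds to the nslookup or getent hosts check in step 2, and an unreachable port corresponds to the telnet, nc or socat check.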

Probe runs, but gateway continually restarts (IDUC reader issue)

If the probe runs but the gateway continually restarts, this may be because the gateway cannot connect its IDUC reader thread to the ObjectServer.

Both the probe and the gateway connect to the host and port defined in the omni.dat file, but the gateway requires an additional reader connection to receive IDUC updates from the ObjectServer.

Workaround: Perform one or more of the following remedial steps, in order.
  1. Check the gateway log for Error-level messages about IDUC: $ASM_HOME/logs/noi-gateway/gateway.log
  2. Check the ObjectServer properties. Two ObjectServer IDUC properties are key:
    Iduc.ListeningPort
    If it is not configured, the port can change each time the ObjectServer restarts, which can cause problems if a firewall is present.
    If it is configured, check that it can be reached from the Agile Service Manager host using a tool such as telnet, nc or socat.
    If it is inaccessible, check for firewalls.
    Iduc.ListeningHostname
    If configured, this is the name that the gateway reader connection will try to connect to. If the gateway cannot resolve this name, the connection will fail.
    To check whether this name can be resolved, add a dummy entry to $ASM_HOME/integrations/omnibus/omni.dat and then restart the probe or gateway:
    [DUMMY]
    {
    Primary: {theNameFromObjectServerPropsFile} 4100
    }
    The startup log then shows whether the name can be resolved, as described in step 3 of the previous section. If it cannot be resolved, change the value in the ObjectServer properties file, or make the name resolvable, for example by adding it to /etc/hosts on the Agile Service Manager host.
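The two IDUC property checks above can be sketched in Python as follows. This is a hypothetical helper, not a product tool. It assumes that the ObjectServer properties file uses Name: value lines with string values in single quotes and # comments, that a missing or zero Iduc.ListeningPort means the port is not fixed, and that the script runs on the Agile Service Manager host. When Iduc.ListeningHostname is not set, you can pass the ObjectServer host from omni.dat as a fallback. The location of the properties file depends on your Netcool/OMNIbus installation.

```python
import socket
import sys


def parse_props(text):
    """Parse 'Name: value' lines, stripping single quotes (assumed props format)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        name, value = line.split(":", 1)
        props[name.strip()] = value.strip().strip("'")
    return props


def check_iduc(props, fallback_host=None, timeout=3.0):
    """Return findings about Iduc.ListeningPort and Iduc.ListeningHostname."""
    findings = []
    port_text = props.get("Iduc.ListeningPort", "0")
    # Assumption: a missing or zero port means the ObjectServer picks a new one.
    port = int(port_text) if port_text.isdigit() else 0
    if port == 0:
        findings.append("Iduc.ListeningPort not fixed; the port may change on restart")
    # The gateway reader connects to Iduc.ListeningHostname if it is configured.
    host = props.get("Iduc.ListeningHostname") or fallback_host
    if not host:
        return findings
    try:
        address = socket.gethostbyname(host)
    except socket.gaierror:
        findings.append(f"cannot resolve '{host}'")
        return findings
    if port:
        try:
            with socket.create_connection((address, port), timeout=timeout):
                findings.append(f"{host}:{port} reachable")
        except OSError as exc:
            findings.append(f"{host}:{port} unreachable ({exc}); check for firewalls")
    return findings


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python3 check_iduc.py /path/to/objectserver.props [omni.dat host]
    with open(sys.argv[1], encoding="utf-8") as handle:
        found = check_iduc(parse_props(handle.read()), *sys.argv[2:3])
    print("\n".join(found))
```

A "cannot resolve" finding corresponds to the unresolved dummy entry in the startup log; an "unreachable" finding points to a firewall or to an ObjectServer that is not listening on the configured port.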

Probe runs, but gateway continually restarts (Kafka topic issue)

If the probe runs but the gateway continually restarts, this may be because the Kafka topic that the gateway requires has not been created yet.

Workaround: The topic is created at startup by the status service (Agile Service Manager Version 1.1.8 or later) or by the Event Observer (Agile Service Manager Version 1.1.7 or earlier). The gateway continues to restart until the topic becomes available.