Observer troubleshooting
See the following information to troubleshoot a variety of observer issues.
Observer jobs appear stuck in 'queued' state
Observer jobs can appear to be stuck in the 'Queued' state after a Kafka outage, or after enabling Kafka Authentication, because messages between the observers and the observer service are lost.
- Workaround
- As an administrator, remove or modify the existing job schedule to get the job running again.
Kubernetes Observer job fails to restart after OOM
- Workaround
- Restart the observer if it appears as offline in the UI.
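Before restarting, you can confirm that the observer really was killed for running out of memory. This section does not say how the observer is deployed. The sketch below assumes it runs as a Kubernetes pod and that you have the JSON printed by `kubectl get pods -o json`. It lists every container whose last termination reason is `OOMKilled`. The function name is hypothetical.

```python
import json
from typing import List, Tuple


def oom_killed_containers(pod_list_json: str) -> List[Tuple[str, str]]:
    """Return (pod name, container name) pairs whose last termination reason was OOMKilled.

    Expects the JSON printed by `kubectl get pods -o json`: a List object
    whose "items" are Pod objects.
    """
    pods = json.loads(pod_list_json).get("items", [])
    hits = []
    for pod in pods:
        pod_name = pod.get("metadata", {}).get("name", "")
        # containerStatuses can be missing, for example while a pod is still pending.
        for status in pod.get("status", {}).get("containerStatuses") or []:
            # lastState.terminated is only populated after a container has terminated.
            terminated = status.get("lastState", {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                hits.append((pod_name, status.get("name", "")))
    return hits
```

For example, a small script could call `oom_killed_containers(sys.stdin.read())` with the kubectl output piped into it.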
Observation data dropped after retention period (OCP only)
If an observation takes longer than 8 hours, it can exceed the default Kafka retention period, and the remainder of the data from that observation is dropped from the resources.json Kafka topic.
- Cause
- The default retention period is 8 hours.
- Workaround
- The default retention time is defined by the environment variable KAFKA_RESOURCES_JSON_RETENTION_MS. On OCP, you can change it by adding the following code to the spec section of the operator's custom resource:
spec:
  helmValuesASM:
    global.asm.inputTopicRetentionPeriodMs: 28800000
The value is in milliseconds. 28800000 ms is 8 hours, which is the same as the default, so set a larger value for observations that take longer.
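The retention value is given in milliseconds, so it is easier to calculate it from the number of hours you need. The following is a minimal sketch. The helper names are hypothetical, and the 24-hour figure in the usage note is only an illustration.

```python
MS_PER_HOUR = 60 * 60 * 1000  # 3,600,000 milliseconds in one hour


def retention_ms(hours: float) -> int:
    """Convert a retention period in hours to the millisecond value used by inputTopicRetentionPeriodMs."""
    if hours <= 0:
        raise ValueError("retention period must be positive")
    return int(round(hours * MS_PER_HOUR))


def retention_override(hours: float) -> str:
    """Build a spec fragment that sets global.asm.inputTopicRetentionPeriodMs."""
    return (
        "spec:\n"
        "  helmValuesASM:\n"
        f"    global.asm.inputTopicRetentionPeriodMs: {retention_ms(hours)}\n"
    )
```

For example, `retention_ms(8)` returns 28800000 (the default). `retention_ms(24)` returns 86400000, which is three times the default.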
File Observer fails due to hidden characters
An error can occur when your File Observer input file contains hidden characters that interfere with the processing of the content. The error message is similar to the following:
The following lines numbers had problems, check the logs for details: 1
- Cause
- A file may appear compliant in content and format but actually be saved as UTF-8 with a byte order mark (BOM) rather than as plain UTF-8.
- Workaround
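- Change the file format. For example, you can create a new file without the BOM from the source file using the following command:
sed '1s/^\xEF\xBB\xBF//' < topology.txt > new.txt
The command removes the three BOM bytes (EF BB BF) from the start of the first line. The \x escapes may not be supported by all sed implementations.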
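The same check and fix can be sketched in Python, which may help where a suitable sed is not available. The function names are hypothetical. Like the sed command, the sketch removes the BOM only when it appears at the very start of the file.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 byte order mark (EF BB BF), if there is one."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def has_utf8_bom(path: Path) -> bool:
    """Check only the first three bytes of the file for a UTF-8 BOM."""
    with path.open("rb") as handle:
        return handle.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def copy_without_bom(source: Path, target: Path) -> bool:
    """Write source to target without a BOM; return True if a BOM was removed."""
    data = source.read_bytes()
    cleaned = strip_utf8_bom(data)
    target.write_bytes(cleaned)
    return cleaned != data
```

For example, `copy_without_bom(Path("topology.txt"), Path("new.txt"))` mirrors the sed command above.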
Network Discovery Observer 'unable to start threads' error
An error can occur if you upgrade to the latest version of Agile Service Manager and then create a new Network Discovery Observer job without first increasing the pids_limit value in the crio.conf file to at least 44406. The logs then contain entries such as the following:
{"message":"Tue Dec 6 09:32:06 2022 Warning: Error found in file CRivThread.cc at line 101 - Unable to create a thread.","timestamp":"2022-12-06T09:32:06","level":"trace","log_file":"ncp_agent.SerialLink.NCOMS.trace"}
{"message":"Reason: Resource temporarily unavailable","timestamp":"2022-12-06T09:32:06","level":"trace","log_file":"ncp_agent.SerialLink.NCOMS.trace"}
{"message":"If possible, try reducing the number of threads this process has been configured to use.","timestamp":"2022-12-06T09:32:06","level":"trace","log_file":"ncp_agent.SerialLink.NCOMS.trace"}
...
- Workaround
- OCP requirement: On the OCP hosts, network discovery requires pids_limit to be set to at least 44406 in the crio.conf file.
For information about changing the values in the crio.conf file using Machine Configs, see Creating a ContainerRuntimeConfig CR to edit CRI-O parameters.
For reference information about Machine Configs, see Red Hat Enterprise Linux CoreOS (RHCOS).
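To check whether a host already meets the requirement, you can inspect the text of its crio.conf file. This is a minimal sketch that assumes you have copied that text from a node. It uses a simple pattern match rather than a full TOML parser and does a plain numeric comparison. It does not interpret any special values that CRI-O may assign to pids_limit. The function names are hypothetical; the 44406 threshold comes from the requirement above.

```python
import re
from typing import Optional

REQUIRED_PIDS_LIMIT = 44406  # minimum stated for network discovery on OCP hosts

# Matches an uncommented "pids_limit = <integer>" line, with an optional trailing comment.
_PIDS_LIMIT_RE = re.compile(r"^[ \t]*pids_limit[ \t]*=[ \t]*(-?\d+)[ \t]*(?:#.*)?$", re.MULTILINE)


def read_pids_limit(crio_conf_text: str) -> Optional[int]:
    """Return the first uncommented pids_limit value, or None if it is not set."""
    match = _PIDS_LIMIT_RE.search(crio_conf_text)
    return int(match.group(1)) if match else None


def meets_requirement(crio_conf_text: str, minimum: int = REQUIRED_PIDS_LIMIT) -> bool:
    """True only if pids_limit is explicitly set to at least the minimum."""
    value = read_pids_limit(crio_conf_text)
    return value is not None and value >= minimum
```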
OpenStack Observer certificate chaining error
The /opt/ibm/netcool/asm/logs/openstack-observer/openstack-observer.log file contains entries such as the following:
INFO [2019-11-01 14:48:50,609] [cfd95b7e-3bc7-4006-a4a8-a73a79c71255:OpenStack - Ericsson - ceevepc] c.i.i.t.o.t.ObservationVertex - Backing up observation vertex Ericsson - VEPC
INFO [2019-11-01 14:48:50,617] [cfd95b7e-3bc7-4006-a4a8-a73a79c71255:OpenStack - Ericsson - ceevepc] c.i.i.t.o.t.ObservationVertex - Existing backup observation vertex CTvJ5KIFQgaGNexrlJBsjA for Ericsson - VEPC.bak
INFO [2019-11-01 14:48:50,636] [cfd95b7e-3bc7-4006-a4a8-a73a79c71255:OpenStack - Ericsson - ceevepc/KeystoneV3IdentityTask] c.i.i.t.o.o.j.r.v.t.AbstractTask - cfd95b7e-3bc7-4006-a4a8-a73a79c71255:OpenStack - Ericsson - ceevepc/KeystoneV3IdentityTask - Starting...
INFO [2019-11-01 14:48:50,661] [cfd95b7e-3bc7-4006-a4a8-a73a79c71255:OpenStack - Ericsson - ceevepc] c.i.i.t.o.o.j.r.OpenStackV3FullTopologyGetter - cfd95b7e-3bc7-4006-a4a8-a73a79c71255:OpenStack - Ericsson - ceevepc - cancel - Cancelling Tasks, Shutting Down Executor...
ERROR [2019-11-01 14:48:50,663] [cfd95b7e-3bc7-4006-a4a8-a73a79c71255:OpenStack - Ericsson - ceevepc] c.i.i.t.o.o.j.r.OpenStackV3FullTopologyGetter - cfd95b7e-3bc7-4006-a4a8-a73a79c71255:OpenStack - Ericsson - ceevepc - OpenStack task error occurred, rethrowing...
java.util.concurrent.ExecutionException: com.ibm.itsm.topology.observer.openstack.job.OpenStackTaskProcessingException: An error occurred while processing KeystoneV3IdentityTask:- javax.net.ssl.SSLHandshakeException: com.ibm.jsse2.util.h: PKIX path building failed: java.security.cert.CertPathBuilderException: PKIXCertPathBuilderImpl could not build a valid CertPath.; internal cause is:
java.security.cert.CertPathValidatorException: The certificate issued by CN=IBMSubCA01, DC=IBM, DC=com, DC=Raleigh is not trusted; internal cause is:
java.security.cert.CertPathValidatorException: Certificate chaining error
at java.util.concurrent.FutureTask.report(FutureTask.java:133)
at java.util.concurrent.FutureTask.get(FutureTask.java:203)
at com.ibm.itsm.topology.observer.openstack.job.rest.OpenStackV3FullTopologyGetter.waitForFutures(OpenStackV3FullTopologyGetter.java:155)
at com.ibm.itsm.topology.observer.openstack.job.rest.OpenStackV3FullTopologyGetter.go(OpenStackV3FullTopologyGetter.java:107)
at com.ibm.itsm.topology.observer.openstack.job.rest.FullRESTLoadJob.observe(FullRESTLoadJob.java:85)
at com.ibm.itsm.topology.observer.app.ObservationJob.call(ObservationJob.java:179)
at com.ibm.itsm.topology.observer.app.ObservationJob.call(ObservationJob.java:63)
at com.ibm.itsm.topology.service.utils.InstrumentedVisibleExecutorService.wrapCallable(InstrumentedVisibleExecutorService.java:385)
at com.ibm.itsm.topology.service.utils.InstrumentedVisibleExecutorService.access$400(InstrumentedVisibleExecutorService.java:65)
at com.ibm.itsm.topology.service.utils.InstrumentedVisibleExecutorService$InstrumentedCallable.call(InstrumentedVisibleExecutorService.java:345)
at java.util.concurrent.FutureTask.run(FutureTask.java:277)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.lang.Thread.run(Thread.java:812)
Caused by: com.ibm.itsm.topology.observer.openstack.job.OpenStackTaskProcessingException: An error occurred while processing KeystoneV3IdentityTask:- javax.net.ssl.SSLHandshakeException: com.ibm.jsse2.util.h: PKIX path building failed: java.security.cert.CertPathBuilderException: PKIXCertPathBuilderImpl could not build a valid CertPath.; internal cause is:
java.security.cert.CertPathValidatorException: The certificate issued by CN=IBMSubCA01, DC=IBM, DC=com, DC=Raleigh is not trusted; internal cause is:
java.security.cert.CertPathValidatorException: Certificate chaining error
at com.ibm.itsm.topology.observer.openstack.job.rest.v3.task.KeystoneV3IdentityTask.process(KeystoneV3IdentityTask.java:43)
at com.ibm.itsm.topology.observer.openstack.job.rest.v2.task.AbstractTask.call(AbstractTask.java:45)
at com.ibm.itsm.topology.observer.openstack.job.rest.v2.task.AbstractTask.call(AbstractTask.java:22)
at java.util.concurrent.FutureTask.run(FutureTask.java:277)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:522)
at java.util.concurrent.FutureTask.run(FutureTask.java:277)
at com.ibm.itsm.topology.service.utils.InstrumentedVisibleExecutorService.wrapRunnable(InstrumentedVisibleExecutorService.java:406)
at com.ibm.itsm.topology.service.utils.InstrumentedVisibleExecutorService.access$200(InstrumentedVisibleExecutorService.java:65)
at com.ibm.itsm.topology.service.utils.InstrumentedVisibleExecutorService$InstrumentedRunnable.run(InstrumentedVisibleExecutorService.java:317)
... 3 common frames omitted
- Cause
- The problem can occur if not all OpenStack certificates have been loaded into Agile Service Manager, or if the certificate has not been added to the trusted CA list on the Agile Service Manager server.
- Workaround
- To load all OpenStack certificates into Agile Service Manager, obtain a copy of the root certificate(s) from the OpenStack host and import them into the keystore. Note: If the host has more than one naming alias, ensure that you obtain all certificates. Obtain the certificates directly from the OpenStack administrator or the server (that is, do not generate them using the openssl command).
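The certificates you receive may arrive as a single PEM bundle. In that case, it helps to split them so that each one can be imported under its own alias, for example with the JDK `keytool -importcert` command. This is a minimal sketch; the function names, file prefix, and output file names are hypothetical.

```python
import re
from pathlib import Path
from typing import List

# One PEM certificate block: header line, base64 body, footer line.
_PEM_CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----\s+.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_pem_bundle(bundle_text: str) -> List[str]:
    """Return each certificate in a PEM bundle as its own PEM string."""
    return [match.group(0) + "\n" for match in _PEM_CERT_RE.finditer(bundle_text)]


def write_certificates(bundle_text: str, out_dir: Path, prefix: str = "openstack-ca") -> List[Path]:
    """Write each certificate to its own numbered .pem file and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, pem in enumerate(split_pem_bundle(bundle_text), start=1):
        path = out_dir / f"{prefix}-{index}.pem"
        path.write_text(pem)
        paths.append(path)
    return paths
```

Counting the blocks in the bundle also gives a quick check that you received every certificate you expected.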
File Observer heap size issue
- Workaround
- Increase the maximum Java heap size (Xmx) value to 6G.
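This section does not say where the File Observer's JVM options are configured. The sketch below assumes they are available as a single options string, like a JAVA_OPTS-style variable (a hypothetical location). It replaces any existing -Xmx flag with -Xmx6G, or appends the flag if none is present. The function name is hypothetical.

```python
import re

# Matches a -Xmx flag such as -Xmx2G or -Xmx512m.
_XMX_RE = re.compile(r"-Xmx\d+[kKmMgG]?")


def set_max_heap(java_opts: str, size: str = "6G") -> str:
    """Return java_opts with the maximum heap set to size, replacing any existing -Xmx flag."""
    flag = f"-Xmx{size}"
    if _XMX_RE.search(java_opts):
        return _XMX_RE.sub(flag, java_opts)
    return f"{java_opts} {flag}".strip()
```

For example, `set_max_heap("-Xms1G -Xmx2G")` returns `"-Xms1G -Xmx6G"`.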
Jenkins Observer troubleshooting
- Artifactory integration: script approval
- The first time you use the Artifactory integration, your build may fail because the Artifactory API code that it calls has not yet been whitelisted (approved). In that case, the build log prompts you to approve the API code.
- Culprits: getting the expected username
- Depending on your build configuration, Jenkins may report 'noreply' as the user in the culprits information.
- Git resource URLs
- Topology tools expect artifact properties to contain a valid HTTP URL rather than an SSH URL.
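If your jobs record SSH remote URLs, you can rewrite them before they reach the artifact properties. This is a minimal sketch that handles the two common SSH forms, `ssh://user@host/path` and scp-like `user@host:path`. It assumes the repository path is the same over SSH and HTTP(S), which is not true for every Git server. The function name is hypothetical, and it produces HTTPS by default; pass `scheme="http"` for plain HTTP.

```python
import re
from urllib.parse import urlsplit

# scp-like SSH syntax, for example git@git.example.com:team/repo.git
_SCP_LIKE_RE = re.compile(r"^(?:[^@/]+@)?(?P<host>[^:/@]+):(?P<path>.+)$")


def to_http_url(git_url: str, scheme: str = "https") -> str:
    """Rewrite an SSH Git URL as an HTTP(S) URL; HTTP(S) URLs are returned as-is."""
    if git_url.startswith(("http://", "https://")):
        return git_url
    if git_url.startswith("ssh://"):
        parts = urlsplit(git_url)
        # Drop the SSH user and port; keep only the host and repository path.
        return f"{scheme}://{parts.hostname}/{parts.path.lstrip('/')}"
    if "://" in git_url:
        # Other schemes (for example git://) are not handled by this sketch.
        raise ValueError(f"unsupported Git URL scheme: {git_url}")
    match = _SCP_LIKE_RE.match(git_url)
    if match:
        return f"{scheme}://{match.group('host')}/{match.group('path').lstrip('/')}"
    raise ValueError(f"unrecognized Git URL: {git_url}")
```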