Netcool connector troubleshooting

If you experience any problems when creating a Netcool connector, use the following troubleshooting steps:

  1. When you create the connector instance, ensure that data flow is enabled and that the following fields are added on the ObjectServer: AIOpsAlertId, AIOpsState, and ScopeID.
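
    If these columns are missing, you can add them with ObjectServer SQL. The following is a minimal sketch only; the server name, user, and column types and sizes are assumptions that you should adapt to your environment:

      # Connect with the OMNIbus interactive SQL client (server name and user are placeholders)
      $OMNIHOME/bin/nco_sql -server NCOMS -user root

      -- Add the columns that the connector uses to track alert state (types and sizes are assumptions)
      alter table alerts.status add column AIOpsAlertId varchar(64);
      alter table alerts.status add column AIOpsState integer;
      alter table alerts.status add column ScopeID varchar(255);
      go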

  2. If you are unable to get a successful test connection from the IBM Cloud Pak for AIOps UI, use the following steps:

    1. Check that you have a valid entry for the connector in the ObjectServer omni.dat file.
    2. Ensure that the port is not already in use.
    3. Check whether any firewall rules might be blocking the connection.
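
    A quick way to rule out the last two causes is to probe the ObjectServer port directly. This sketch assumes the default OMNIbus port 4100 and a placeholder hostname:

      # From a host that should reach the ObjectServer, test that the port is open (hostname and port are placeholders)
      nc -vz my-objectserver.example.com 4100

      # On the ObjectServer host, check whether another process already holds the port
      ss -ltnp | grep 4100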
  3. On successful creation of a Netcool connector, you should see the following objects created in the OpenShift console:

    • Netcool-conn-pod
    • Netcool PVC (check that the default PVC size is set to 2Gi)
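
    One way to confirm that both objects exist is from the command line; the namespace is a placeholder:

      # List the Netcool connector pod and PVC, including the PVC capacity (namespace is a placeholder)
      oc get pods,pvc -n <aiops-namespace> | grep -i netcool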
  4. Check the connector bridge pods.

    • If the connector bridge pods are restarting many times or are down, check the connector bridge pod logs for exceptions:
      oc logs <connector-bridge-xxx>
      
    • If you see the following in the connector bridge pod status:
      containerStatuses:
          lastState:
            terminated:
              exitCode: 137
              finishedAt: "2024-06-07T02:48:52Z"
              reason: OOMKilled
      

    This indicates that the incoming load is higher than the connector bridge can manage; the connector fails with the message out of memory: OOMKilled.
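
    You can confirm the termination reason directly from the pod status; the pod name is a placeholder:

      # Print the last termination reason for the connector bridge containers
      oc get pod <connector-bridge-xxx> -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'

    If the reason is OOMKilled, consider raising the memory limit on the connector bridge deployment or reducing the incoming event rate.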

  5. If events are not flowing into IBM Cloud Pak for AIOps, ensure the components are running:

    oc get installation -o yaml
    

    This shows all the components that are in the Ready state.

    1. Ensure that all Kafka pods are in a Running state and are not restarting (see the combined check after sub-step 2).
    2. Ensure the lifecycle pods are running as expected:
      cp4waiops-eventprocessor-eve-29ee-ep-jobmanager
      cp4waiops-eventprocessor-eve-29ee-ep-taskmanager
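
      One way to check both the Kafka pods and the lifecycle pods at once (the namespace is a placeholder):

        # Show Kafka and event processor pods together with their restart counts (namespace is a placeholder)
        oc get pods -n <aiops-namespace> | grep -E 'kafka|eventprocessor'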
      
    3. Ensure that the following jobs are completed:
      job.batch/aiops-ir-lifecycle-create-policies-job
      job.batch/aiops-ir-lifecycle-policy-registry-svc-job
      job.batch/aiops-ir-lifecycle-policy-upgrade-job
      job.batch/aiops-ir-lifecycle-sync-es-cassandra-cronjob-28654560
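
      To check the job status from the command line (the namespace is a placeholder):

        # Completed jobs show full COMPLETIONS, for example 1/1
        oc get jobs -n <aiops-namespace> | grep aiops-ir-lifecycle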
      
    4. Ensure that all the workloads are running as expected. If any workload is in a failed state, you will not see events in IBM Cloud Pak for AIOps.
  6. Check the custom mapping.

    1. If you still do not see any events coming in, review the mapping in the connector configuration.
    2. Test with the default mappings to see whether events flow in.
    3. Ensure that all custom fields that are integers are cast to string values.
  7. Resync the connector.

    1. In the connector UI, disable the connector's data collection.
    2. In the connector pod, delete all files under /bindings/netcool-connector/omnibus/var/G_CONNECTOR.
    3. In the ObjectServer alerts.status table, reset the AIOpsAlertId and AIOpsState columns (see the sketch after this list).
    4. In IBM Cloud Pak for AIOps, remove all alerts that came from the ObjectServer.
    5. In the connector UI, enable the connector's data collection again. Note: If the AIOpsAlertId and AIOpsState columns in the NOI ObjectServer are not reset, you might not see events in IBM Cloud Pak for AIOps, because these columns mark events that the connector has already processed.
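
    For sub-step 3, the reset can be done with ObjectServer SQL. This is a sketch only; the server name, user, and reset values are assumptions that must match your column definitions:

      # Reset the connector tracking columns so that alerts can be resent (server name and user are placeholders)
      $OMNIHOME/bin/nco_sql -server NCOMS -user root

      -- Reset values are assumptions; adjust them to your column definitions
      update alerts.status set AIOpsAlertId = '', AIOpsState = 0;
      go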
  8. Clear events and reset the lifecycle state. Reset the lifecycle state by running the following must-gather script command:

    waiops-mustgather.sh -DR -C clear-lifecycle-state.sh
    

    Note: Clearing the lifecycle state is generally done after recovery of the Issue Resolution Lifecycle operator, or after clearing events in IBM Cloud Pak for AIOps.