Enabling the health service monitor

Enable the health service monitor to detect when a deployment reaches a state where the gateway is no longer processing data. You can use the health service monitor to get the current health status of the other services in the deployment.

About this task

The health service monitor supports reporting on the following statuses:

  • The availability of Cassandra
  • Whether cloud native analytics gateway ingestion is running
  • Whether the common UI pods are running
  • Whether the WebGUI full stack deployment is running

The health service monitor runs in all deployment types, including hybrid and full stack deployments.

  • In a single-site deployment, the health service monitor starts the ea-dr-coordinator service, which is typically unused in this deployment type. If you use a single-site deployment, you can scale down the deployment of the ea-dr-coordinator service without affecting any of the functional features of the product.
  • For a hybrid or geo-redundant high availability disaster recovery (HADR) setup, the ea-dr-coordinator service must be running. For more information, see Coordinator Service.
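
For a single-site deployment, scaling down the coordinator might look like the following sketch. It assumes that the service runs as a Deployment whose name contains ea-dr-coordinator and that you are logged in to the correct project. The function name scale_down_dr_coordinator is hypothetical and is not part of the product. Confirm the actual resource name in your cluster before you scale anything.

```shell
# Hypothetical helper, not part of the product.
# Scales every Deployment whose name contains "ea-dr-coordinator"
# down to zero replicas in the current project.
scale_down_dr_coordinator() {
  oc get deployment -o name | grep ea-dr-coordinator | while read -r dep; do
    # "$dep" has the form deployment.apps/<name>, which oc scale accepts.
    oc scale "$dep" --replicas=0
  done
}
```

Do not use this sketch in a hybrid or geo-redundant HADR setup, where the ea-dr-coordinator service is required.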

Procedure

  1. To enable the health service monitor in IBM® Netcool® Operations Insight®, set the serviceContinuity.continuousAnalyticsCorrelation entry to true in the noi deployment custom resource (CR).
    oc edit noi
      serviceContinuity:
        continuousAnalyticsCorrelation: true
    In a hybrid deployment, set the entry to true in the noihybrid CR:
    oc edit noihybrid
      serviceContinuity:
        continuousAnalyticsCorrelation: true
    Note: This step is not required on a geo-redundant cluster because the DR coordinator and health service monitoring are set up in the backend.

    The following route is created, where <release> is the release name of the deployment:

    <release>-ea-geored-ui-health-service-eageoreduihealthservice
  2. Get the username and password from the <release>-systemauth-secret secret. Run the following command:
    oc extract secret/<release>-systemauth-secret
    Check the generated files for the systemauth_secret_username and systemauth_secret_password details.
  3. Get the route name for the health service by running the following command:
    oc get route | grep health
    Example output:
    primary-ea-geored-ui-health-service-eageoreduihealthservicdmcgf 1/1 Running 0 5d7h
  4. Query the health service monitor with the following command:
    curl -ks -u systemauth_secret_username:systemauth_secret_password https://<route host name for serviceHealth>/api/serviceHealth/v1/noi/componentsStatusReport
    Replace systemauth_secret_username and systemauth_secret_password with the values from step 2, and replace <route host name for serviceHealth> with the host name that is associated with the route from step 3. For example:
    curl -ks -u system:48r1ffkhE1khWVW https://netcool-primary.apps.geor-3-167-101748.xyz.com/api/serviceHealth/v1/noi/componentsStatusReport

    The command returns system health information for the Cassandra, CNEA ingestion, WebGUI, and common UI server components. Each component has one of the following statuses:
    • ok - health is good
    • unavailable - the product is not available. A pod might have crashed or been scaled down
    • degraded - the feature might not work as expected because a component might be down or crashed
    Example output:
    {
      "componentsHealth": [
        {
          "name": "cassandra",
          "status": "ok",
          "lastUpdateTime": 1671018468
        },
        {
          "name": "cneaIngestion",
          "status": "ok",
          "lastUpdateTime": 1671036430
        },
        {
          "name": "webgui",
          "status": "ok",
          "lastUpdateTime": 1671037956
        },
        {
          "name": "ibm-hdm-common-ui-uiserver",
          "status": "ok",
          "lastUpdateTime": 1671018477
        }
      ]
    }

    In this example, the health service monitor reports that all of the monitored services are running. Each entry also includes a lastUpdateTime value, in epoch time format, that indicates when the service state was set.
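
If you want to script the check, the procedure can be sketched as a few shell functions. This is a minimal sketch, not a supported tool. The function names are hypothetical. The sketch assumes that jq is installed, that GNU date is available, that a healthy component reports the status value ok as in the example output, and that lastUpdateTime is in epoch seconds.

```shell
# Hypothetical helpers, not part of the product.

# Print the host name of a route. $1 is the route name from step 3.
health_route_host() {
  oc get route "$1" -o jsonpath='{.spec.host}'
}

# Request the components status report.
# $1 = route host name, $2 = user name, $3 = password (both from step 2)
fetch_components_status() {
  curl -ks -u "$2:$3" "https://$1/api/serviceHealth/v1/noi/componentsStatusReport"
}

# Read a status report on standard input and print "<name> <status>"
# for each component whose status is not "ok". Requires jq.
unhealthy_components() {
  jq -r '.componentsHealth[] | select(.status != "ok") | "\(.name) \(.status)"'
}

# Convert an epoch seconds value, such as lastUpdateTime, to a UTC timestamp.
# Uses the GNU date -d option, which is not available on every platform.
epoch_to_utc() {
  date -u -d "@$1" '+%Y-%m-%dT%H:%M:%SZ'
}

# Example usage (replace the placeholders with your own values):
#   host=$(health_route_host "<route name>")
#   fetch_components_status "$host" "<username>" "<password>" | unhealthy_components
#   epoch_to_utc 1671018468
```

With the example output shown earlier, unhealthy_components prints nothing because every component reports ok. The -k option of curl skips TLS certificate verification, as in the documented command, so use it only where that is acceptable.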