Enabling the health service monitor

Enable the health service monitor to see whether a deployment has reached a state where the gateway is no longer processing data. You can query the health service monitor to get the current health status of the other services in the deployment.

About this task

The health service monitor supports reporting on the following statuses:

The availability of Cassandra
Whether cloud native analytics gateway ingestion is running
Whether the common UI pods are running
Whether the WebGUI full stack deployment is running

The health service monitor runs in all setup formats, including hybrid and full stack deployments.

In a single-site deployment, the health service monitor starts the ea-dr-coordinator service, which is typically unused in this deployment type. If you use a single-site deployment, you can scale down the deployment of the ea-dr-coordinator service without affecting any of the functional features of the product.
For a hybrid or geo-redundant high availability disaster recovery (HADR) setup, the ea-dr-coordinator service needs to be running. For more information, see Coordinator Service.

Procedure

To enable the health service monitor in IBM® Netcool® Operations Insight®, set the serviceContinuity.continuousAnalyticsCorrelation entry to true in the noi deployment custom resource (CR).
```
oc edit noi
  serviceContinuity:
    continuousAnalyticsCorrelation: true
```
If you are in a hybrid deployment, set the entry in the noihybrid CR to true:
```
oc edit noihybrid
  serviceContinuity:
    continuousAnalyticsCorrelation: true
```
Note: This step is not required on a geo-redundant cluster because the DR coordinator and health service monitoring are set up in the backend.
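Where an interactive edit is inconvenient, for example in automation, the same setting could be applied with a merge patch. The following Python sketch only builds the oc patch command and does not run it. It assumes that the serviceContinuity entry sits under spec in the CR and that the CR instance is named primary. Neither is stated in this topic, so check your own CR before you run the printed command.

```python
import json
import shlex
from typing import List


def build_patch_command(kind: str, name: str, enabled: bool = True) -> List[str]:
    """Return an `oc patch` argv that sets continuousAnalyticsCorrelation.

    Assumption: the serviceContinuity block lives under `spec` in the CR.
    """
    patch = {
        "spec": {
            "serviceContinuity": {"continuousAnalyticsCorrelation": enabled}
        }
    }
    return ["oc", "patch", kind, name, "--type", "merge", "-p", json.dumps(patch)]


if __name__ == "__main__":
    # "primary" is a hypothetical CR instance name; pass "noihybrid" as the
    # kind for a hybrid deployment.
    print(shlex.join(build_patch_command("noi", "primary")))
```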
The following route is created, in which the <release> variable is the release name of the deployment:
```
<release>-ea-geored-ui-health-service-eageoreduihealthservice
```
Get the username and password from the <release>-systemauth-secret secret. Run the following command:
```
oc extract secret/<release>-systemauth-secret
```
Check the generated files for the systemauth_secret_username and systemauth_secret_password details.
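As an alternative to reading the extracted files by hand, the secret can be decoded in a script. This is a minimal Python sketch, assuming that the output of oc get secret <release>-systemauth-secret -o json is passed in as a string. Kubernetes stores the values in the data field of a secret base64-encoded. The key names in the sample are placeholders, not the real key names of the secret.

```python
import base64
import json
from typing import Dict


def decode_secret_data(secret_json: str) -> Dict[str, str]:
    """Decode every base64 value in the `data` field of a Secret manifest."""
    secret = json.loads(secret_json)
    return {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in secret.get("data", {}).items()
    }


if __name__ == "__main__":
    # Hand-made sample; the key names and values are hypothetical placeholders.
    sample = json.dumps({
        "kind": "Secret",
        "data": {
            "example_username_key": base64.b64encode(b"system").decode("ascii"),
            "example_password_key": base64.b64encode(b"changeme").decode("ascii"),
        },
    })
    for key, value in decode_secret_data(sample).items():
        print(f"{key}: {value}")
```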

Get the route name for the health service by running the following command:

oc get route | grep health

Example output:

primary-ea-geored-ui-health-service-eageoreduihealthservicdmcgf 1/1 Running 0 5d7h
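The curl command in the next step needs the host name of the route rather than the route name. One possible way to pick it out, sketched here in Python, is to parse the output of oc get route -o json, in which each route carries its name in metadata.name and its host in spec.host. The sample data is hand-made, and the host names in it are illustrative.

```python
import json
from typing import Dict


def find_route_hosts(routes_json: str, name_fragment: str = "health") -> Dict[str, str]:
    """Map route name to host for routes whose name contains name_fragment."""
    routes = json.loads(routes_json)
    return {
        item["metadata"]["name"]: item["spec"]["host"]
        for item in routes.get("items", [])
        if name_fragment in item["metadata"]["name"]
    }


if __name__ == "__main__":
    # Hand-made sample shaped like `oc get route -o json` output.
    sample = json.dumps({
        "items": [
            {"metadata": {"name": "primary-ea-geored-ui-health-service-eageoreduihealthservice"},
             "spec": {"host": "netcool-primary.apps.example.com"}},
            {"metadata": {"name": "primary-some-other-route"},
             "spec": {"host": "other.apps.example.com"}},
        ]
    })
    print(find_route_hosts(sample))
```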

Interrogate the health service monitor with the following command:

curl -ks -u systemauth_secret_username:systemauth_secret_password https://<route host name for serviceHealth>/api/serviceHealth/v1/noi/componentsStatusReport

For example:

curl -ks -u system:48r1ffkhE1khWVW https://netcool-primary.apps.geor-3-167-101748.xyz.com/api/serviceHealth/v1/noi/componentsStatusReport

In this example, netcool-primary.apps.geor-3-167-101748.xyz.com is the host name that is associated with the route, and it replaces the <route host name for serviceHealth> variable.
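The same request can be issued from a script. The following Python sketch builds the equivalent of the curl command with the standard library only: HTTP basic authentication in place of the -u option, and certificate verification disabled in place of the -k option. The function names are illustrative, and disabling verification is appropriate only where using curl -k would be.

```python
import base64
import json
import ssl
import urllib.request

REPORT_PATH = "/api/serviceHealth/v1/noi/componentsStatusReport"


def build_request(host: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request for the components status report with basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(f"https://{host}{REPORT_PATH}")
    request.add_header("Authorization", f"Basic {token}")
    return request


def fetch_report(host: str, username: str, password: str, timeout: float = 10.0) -> dict:
    """Send the request and return the parsed JSON report."""
    context = ssl.create_default_context()
    # Mirror curl -k: skip host name and certificate checks.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    request = build_request(host, username, password)
    with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as fetch_report(host, username, password), with the route host name and the credentials from the secret, returns the report as a dictionary.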

The command returns system health information for the Cassandra, CNEA ingestion, WebGUI, and common UI server components. Each component reports one of the following statuses:

ok - health is good
unavailable - the product is not available. A pod might have crashed or been scaled down
degraded - the feature might not work as expected because a component might be down or crashed
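To reduce a report to a single verdict, a monitoring script might rank the statuses and take the worst one. The sketch below shows one way to do that in Python. The ranking (ok, then degraded, then unavailable), the treatment of unrecognized values as the worst case, and the acceptance of both the ok and okay spellings are assumptions, not documented behavior.

```python
from typing import Dict

# Assumed severity order, lowest to highest.
SEVERITY: Dict[str, int] = {"ok": 0, "okay": 0, "degraded": 1, "unavailable": 2}


def worst_status(report: dict) -> str:
    """Return the most severe status among the components in a report.

    Returns "unknown" when the report lists no components.
    """
    components = report.get("componentsHealth", [])
    if not components:
        return "unknown"
    # Unrecognized or missing statuses rank above every known status.
    unknown_rank = max(SEVERITY.values()) + 1
    return max(
        (component.get("status", "unknown") for component in components),
        key=lambda status: SEVERITY.get(status, unknown_rank),
    )


if __name__ == "__main__":
    # Hand-made sample with one degraded component.
    sample = {"componentsHealth": [
        {"name": "cassandra", "status": "ok"},
        {"name": "webgui", "status": "degraded"},
    ]}
    print(worst_status(sample))
```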

Example output:

{
  "componentsHealth": [
    {
      "name": "cassandra",
      "status": "ok",
      "lastUpdateTime": 1671018468
    },
    {
      "name": "cneaIngestion",
      "status": "ok",
      "lastUpdateTime": 1671036430
    },
    {
      "name": "webgui",
      "status": "ok",
      "lastUpdateTime": 1671037956
    },
    {
      "name": "ibm-hdm-common-ui-uiserver",
      "status": "ok",
      "lastUpdateTime": 1671018477
    }
  ]
}

In this example, the health service monitor reports that the services that it is monitoring are running. It also provides a lastUpdateTime value in epoch time format for when the service state was set.
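
The lastUpdateTime values are easier to read after conversion to a calendar timestamp. The following short Python sketch uses two entries from the example output and assumes that the values are seconds since the epoch, which their magnitude suggests. The function name is illustrative.

```python
from datetime import datetime, timezone
from typing import Dict


def last_update_times(report: dict) -> Dict[str, str]:
    """Map each component name to its lastUpdateTime as an ISO 8601 UTC string."""
    return {
        component["name"]: datetime.fromtimestamp(
            component["lastUpdateTime"], tz=timezone.utc
        ).isoformat()
        for component in report.get("componentsHealth", [])
    }


if __name__ == "__main__":
    # Two entries copied from the example output in this topic.
    report = {"componentsHealth": [
        {"name": "cassandra", "status": "ok", "lastUpdateTime": 1671018468},
        {"name": "webgui", "status": "ok", "lastUpdateTime": 1671037956},
    ]}
    for name, stamp in last_update_times(report).items():
        print(name, stamp)
```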