Enable the health service monitor to see whether a deployment has reached a state where
the gateway is no longer processing data. You can use the health service monitor to get the current
health status of the other services in the deployment.
About this task
The health service monitor supports reporting on the following statuses:
- The availability of cassandra
- Whether cloud native analytics gateway ingestion is running
- Whether the common UI pods are running
- Whether the WebGUI full stack deployment is running
The health service monitor runs in all setup formats, including hybrid and full stack deployments.
- In a single-site deployment, the health service monitor starts the ea-dr-coordinator service, which is typically unused in this deployment type. If you use a single-site deployment, you can scale down the deployment of the ea-dr-coordinator service without affecting any of the functional features of the product.
- For a hybrid or geo-redundant high availability disaster recovery (HADR) setup, the ea-dr-coordinator service needs to be running. For more information, see Coordinator Service.
Procedure
- To enable the health service monitor in IBM® Netcool® Operations Insight®, set the serviceContinuity.continuousAnalyticsCorrelation entry to true in the noi deployment custom resource (CR):
oc edit noi
serviceContinuity:
  continuousAnalyticsCorrelation: true
If you are in a hybrid deployment, set the entry in the noihybrid CR to true:
oc edit noihybrid
serviceContinuity:
  continuousAnalyticsCorrelation: true
Note: This step is not required on a geo-redundant cluster because the DR coordinator and health service monitoring are set up in the backend.
The following route is created, in which the <release> variable indicates the release version:
<release>-ea-geored-ui-health-service-eageoreduihealthservice
- Get the username and password from the <release>-systemauth-secret secret. Run the following command:
oc extract secret/<release>-systemauth-secret
Check the generated files for the systemauth_secret_username and systemauth_secret_password details.
- Get the route name for the health service by running the following command:
oc get route | grep health
Example output:
primary-ea-geored-ui-health-service-eageoreduihealthservicdmcgf 1/1 Running 0 5d7h
- Interrogate the health service monitor with the following command, in which the <route host name for serviceHealth> variable is the host name that is associated with the health service route:
curl -ks -u systemauth_secret_username:systemauth_secret_password https://<route host name for serviceHealth>/api/serviceHealth/v1/noi/componentsStatusReport
For example:
curl -ks -u system:48r1ffkhE1khWVW https://netcool-primary.apps.geor-3-167-101748.xyz.com/api/serviceHealth/v1/noi/componentsStatusReport
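If you prefer to call the endpoint from a script instead of curl, the same request can be sketched in Python with only the standard library. This is a minimal sketch, not part of the product: the function names build_health_request and fetch_report are hypothetical, and the host and credentials in the usage lines are placeholders that you replace with the values from the earlier steps.

```python
import base64
import json
import ssl
import urllib.request

# Path taken from the curl command in this procedure.
REPORT_PATH = "/api/serviceHealth/v1/noi/componentsStatusReport"


def build_health_request(route_host, username, password):
    """Build a GET request with HTTP basic authentication (hypothetical helper)."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(f"https://{route_host}{REPORT_PATH}")
    request.add_header("Authorization", f"Basic {token}")
    return request


def fetch_report(request, verify_tls=True):
    """Send the request and return the parsed JSON body (hypothetical helper)."""
    context = ssl.create_default_context()
    if not verify_tls:
        # Mirrors the -k flag of curl: skips certificate checks.
        # Assumption: only acceptable on clusters with self-signed certificates.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder values (assumptions), not real credentials or hosts.
    req = build_health_request("health.example.com", "user", "secret")
    print(req.full_url)
    # report = fetch_report(req, verify_tls=False)  # requires a reachable route
```

Building the request separately from sending it keeps the credential handling easy to inspect without contacting the cluster.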
The command returns system health information for the Cassandra, CNEA ingestion, WebGUI, and common UI server components. Each component reports one of the following statuses:
- ok: Health is good.
- unavailable: The product is not available. The pod might have crashed or been scaled down.
- degraded: The feature might not work as expected because a component might be down or crashed.
Example output:
{
  "componentsHealth": [
    {
      "name": "cassandra",
      "status": "ok",
      "lastUpdateTime": 1671018468
    },
    {
      "name": "cneaIngestion",
      "status": "ok",
      "lastUpdateTime": 1671036430
    },
    {
      "name": "webgui",
      "status": "ok",
      "lastUpdateTime": 1671037956
    },
    {
      "name": "ibm-hdm-common-ui-uiserver",
      "status": "ok",
      "lastUpdateTime": 1671018477
    }
  ]
}
In this example, the health service monitor reports that the services that it is monitoring are running. It also provides a lastUpdateTime value in epoch time format for when the service state was set.
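To act on the report in a script, you can parse the JSON and convert each lastUpdateTime value into a readable timestamp. The following Python sketch works on a copy of the example output. It assumes that the healthy status string is "ok", as in that output, and that lastUpdateTime is expressed in seconds since the epoch. The function names summarize and unhealthy_components are hypothetical.

```python
import json
from datetime import datetime, timezone

# Copy of the example output from this topic.
SAMPLE_REPORT = """
{
  "componentsHealth": [
    {"name": "cassandra", "status": "ok", "lastUpdateTime": 1671018468},
    {"name": "cneaIngestion", "status": "ok", "lastUpdateTime": 1671036430},
    {"name": "webgui", "status": "ok", "lastUpdateTime": 1671037956},
    {"name": "ibm-hdm-common-ui-uiserver", "status": "ok", "lastUpdateTime": 1671018477}
  ]
}
"""

# Assumption: "ok" is the only status that means healthy.
HEALTHY_STATUS = "ok"


def summarize(report):
    """Return one row per component with a UTC timestamp and a healthy flag."""
    rows = []
    for component in report.get("componentsHealth", []):
        # Assumption: lastUpdateTime is in seconds, not milliseconds.
        updated = datetime.fromtimestamp(component["lastUpdateTime"], tz=timezone.utc)
        rows.append({
            "name": component["name"],
            "status": component["status"],
            "healthy": component["status"] == HEALTHY_STATUS,
            "updated_utc": updated.isoformat(),
        })
    return rows


def unhealthy_components(report):
    """Return the names of components whose status is not healthy."""
    return [row["name"] for row in summarize(report) if not row["healthy"]]


if __name__ == "__main__":
    parsed = json.loads(SAMPLE_REPORT)
    for row in summarize(parsed):
        print(row["name"], row["status"], row["updated_utc"])
    print("Unhealthy:", unhealthy_components(parsed))
```

A monitoring job could call unhealthy_components on each poll and raise an alert when the returned list is not empty.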