IBM Cloud Private System Healthcheck service

The IBM Cloud Private system healthcheck service is a REST API that reports the status of your cluster nodes, the Kubernetes API server, unhealthy pods, and the IBM Cloud Private management services and their dependencies.

The following table describes the health status details that the system healthcheck service provides:

Table 1. System healthcheck service status description

| Status | Output description |
|--------|--------------------|
| IBM Cloud Private cluster node | Provides cluster node health status details, with failure details and pod events for any unhealthy nodes. |
| Kubernetes API server | Provides Kubernetes API server and other management service health status details. |
| Unhealthy pods | Provides failure details and pod events in the kube-system namespace. |
| IBM Cloud Private cluster management service | Provides the health status of all IBM Cloud Private management services and the dependencies of each management service. |

See IBM Cloud Private components for a description of the components and their dependencies.

Prerequisite: Install IBM Cloud Private with Kubernetes. For more information, see the Kubernetes settings section on the Customizing the cluster with the config.yaml file page.

Important: By default, the system healthcheck service is enabled during installation.

Enabling the system healthcheck service after installation from the management console

As a cluster administrator, complete the following steps to enable the healthcheck service:

  1. Log in to your IBM Cloud Private cluster from the management console.

  2. Click Catalog, and then select the system-healthcheck-service chart to install it.

  3. Enter a value for the Helm release name.

  4. Select the kube-system namespace from the Target namespace menu.

  5. Click Configure.

  6. To uninstall the system-healthcheck-service, complete the following steps:

    1. From the navigation menu, click Workloads > Helm Releases.
    2. Click the Options icon for your system-healthcheck-service release.
    3. Click Delete.

Enabling the system healthcheck service after installation from the command line interface (CLI)

To enable the system healthcheck service, deploy it onto your IBM Cloud Private cluster. Complete the following steps:

  1. Install the system-healthcheck-service chart. You must first add the internal Helm repository; for more information, see Adding the internal Helm repository to Helm CLI. Then install the system-healthcheck-service by running the following command:

     helm install mgmt-charts/system-healthcheck-service --name <release-name> --namespace kube-system --tls
    
  2. To uninstall the system-healthcheck-service chart, run the following command:

     helm delete <release-name> --purge --tls
    

Enabling the system healthcheck service after installation from the IBM Multicloud Manager management console

As a cluster administrator, complete the following steps to enable the system healthcheck service:

  1. Log in to your IBM Multicloud Manager management console.

  2. Click Catalog, and then select the system-healthcheck-service chart to install it onto your hub cluster.

  3. Enter a value for the Helm release name.

  4. Select the kube-system namespace from the Target namespace menu.

  5. Click Configure.

  6. To uninstall the system-healthcheck-service, complete the following steps:

    1. From the navigation menu, click Workloads > Helm Releases.
    2. Click the Options icon for your system-healthcheck-service release.
    3. Click Delete.

Enabling the system healthcheck service for IBM Multicloud Manager from the CLI

To enable the system healthcheck service for IBM Multicloud Manager, deploy it onto your hub cluster. Complete the following steps:

  1. Install the system-healthcheck-service chart. You must first add the internal Helm repository; for more information, see Adding the internal Helm repository to Helm CLI. Then install the system-healthcheck-service by running the following command:

     helm install mgmt-charts/system-healthcheck-service --name <release-name> --namespace kube-system --tls
    
  2. To uninstall the system-healthcheck-service chart, run the following command:

     helm delete <release-name> --purge --tls
    

Getting the cluster service status

Cluster service status is a CustomResourceDefinition (CRD) for the system healthcheck service. ClusterServiceStatus resources provide the health status, failures, and dependencies of all IBM Cloud Private management services.

Note: The resource objects are updated every 10 minutes.

Required access: At least an operator role.

  1. Get the health status of your cluster by running the following command:

    kubectl get clusterservicestatus
    

    Your output might resemble the following information:

       NAME                        SERVICE NAME            SERVICE VERSION   STATUS
       icp-audit-logging           audit-logging                             Running
       icp-auth-apikeys            auth-apikeys                              NotInstalled
       icp-auth-idp                auth-idp                                  Running
       icp-auth-pap                auth-pap                                  Running
       icp-auth-pdp                auth-pdp                                  Running
       icp-catalog-ui              catalog-ui                                Running
       icp-heapster                heapster                                  NotInstalled
       icp-helm-api                helm-api                                  Running
    
  2. Get a description of any management service status. For example, run the following command to get a description of the icp-monitoring service status:

    kubectl describe clusterservicestatus icp-monitoring
    

    Your output might resemble the following information:

     Name:         icp-monitoring
     Namespace:
     Labels:       app.kubernetes.io/managed-by=icp-system-healthcheck-service
                   clusterhealth.ibm.com/service-name=monitoring
     Annotations:  <none>
     API Version:  clusterhealth.ibm.com/v1
     Kind:         ClusterServiceStatus
     Metadata:
       Creation Timestamp:  2019-07-23T19:46:13Z
       Generation:          2
       Resource Version:    6676898
       Self Link:           /apis/clusterhealth.ibm.com/v1/clusterservicestatuses/icp-monitoring
       UID:                 87136981-ad82-11e9-9e5a-00000a150b81
     Status:
       Current State:  Pending
       Pod Failure Status:
         Monitoring - Prometheus - Alertmanager - 5 F 6595 D 54 - 7 Frkq:
           Image:     hyc-cloud-private-edge-docker-local.artifactory.swg-devops.com/ibmcom-amd64/alertmanager:v0.15.0-f
           Image ID:
           Last State:
           Name:           alertmanager
           Ready:          false
           Restart Count:  0
           State:
             Waiting:
               Message:  Back-off pulling image "hyc-cloud-private-edge-docker-local.artifactory.swg-devops.com/ibmcom-amd64/alertmanager:v0.15.0-f"
               Reason:   ImagePullBackOff
           Image:        hyc-cloud-private-edge-docker-local.artifactory.swg-devops.com/ibmcom-amd64/configmap-reload:v0.2.2-f3
           Image ID:
           Last State:
           Name:           configmap-reload
           Ready:          false
           Restart Count:  0
           State:
             Waiting:
               Reason:  PodInitializing
           Image:       hyc-cloud-private-edge-docker-local.artifactory.swg-devops.com/ibmcom-amd64/icp-management-ingress:latest
           Image ID:
           Last State:
           Name:           router
           Ready:          false
           Restart Count:  0
           State:
             Waiting:
               Reason:  PodInitializing
       Status Dependencies:
         iam
     Events:  <none>
    
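The ClusterServiceStatus objects can also be inspected programmatically. The following Python sketch parses the JSON from `kubectl get clusterservicestatus -o json` and reports any service that is not in a healthy state. The `currentState` field name is inferred from the `Current State` line in the describe output above (kubectl describe title-cases camelCase keys), so verify it against your own cluster before relying on it.

```python
import json

# States that do not indicate a problem; NotInstalled simply means the
# optional management service was not deployed.
HEALTHY_STATES = {"Running", "Succeeded", "NotInstalled"}

def unhealthy_services(raw_json):
    """Return {resource name: state} for every ClusterServiceStatus
    whose current state is not in HEALTHY_STATES.

    raw_json is the output of:
        kubectl get clusterservicestatus -o json
    """
    items = json.loads(raw_json).get("items", [])
    return {
        item["metadata"]["name"]: item.get("status", {}).get("currentState", "Unknown")
        for item in items
        if item.get("status", {}).get("currentState") not in HEALTHY_STATES
    }
```

For example, feeding it a list that contains the icp-monitoring object shown above would report `{'icp-monitoring': 'Pending'}`.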

Getting the hub cluster service status

  1. Get the health status of your hub cluster by running the following command:

    kubectl get clusterservicestatus
    

    Your output might resemble the following information:

       NAME                        SERVICE NAME            SERVICE VERSION   STATUS
       icp-audit-logging           audit-logging                             Running
       icp-auth-apikeys            auth-apikeys                              NotInstalled
       icp-auth-idp                auth-idp                                  Running
       icp-auth-pap                auth-pap                                  Running
       icp-auth-pdp                auth-pdp                                  Running
       icp-catalog-ui              catalog-ui                                Running
       icp-heapster                heapster                                  NotInstalled
       icp-helm-api                helm-api                                  Running
    
  2. Get a description of any management service status on your hub cluster. For example, run the following command to get a description of the icp-monitoring service status:

    kubectl describe clusterservicestatus icp-monitoring
    

    Your output might resemble the following information:

     Name:         icp-monitoring
     Namespace:
     Labels:       app.kubernetes.io/managed-by=icp-system-healthcheck-service
                   clusterhealth.ibm.com/service-name=monitoring
     Annotations:  <none>
     API Version:  clusterhealth.ibm.com/v1
     Kind:         ClusterServiceStatus
     Metadata:
       Creation Timestamp:  2019-07-23T19:46:13Z
       Generation:          2
       Resource Version:    6676898
       Self Link:           /apis/clusterhealth.ibm.com/v1/clusterservicestatuses/icp-monitoring
       UID:                 87136981-ad82-11e9-9e5a-00000a150b81
     Status:
       Current State:  Pending
       Pod Failure Status:
         Monitoring - Prometheus - Alertmanager - 5 F 6595 D 54 - 7 Frkq:
           Image:     hyc-cloud-private-edge-docker-local.artifactory.swg-devops.com/ibmcom-amd64/alertmanager:v0.15.0-f
           Image ID:
           Last State:
           Name:           alertmanager
           Ready:          false
           Restart Count:  0
           State:
             Waiting:
               Message:  Back-off pulling image "hyc-cloud-private-edge-docker-local.artifactory.swg-devops.com/ibmcom-amd64/alertmanager:v0.15.0-f"
               Reason:   ImagePullBackOff
           Image:        hyc-cloud-private-edge-docker-local.artifactory.swg-devops.com/ibmcom-amd64/configmap-reload:v0.2.2-f3
           Image ID:
           Last State:
           Name:           configmap-reload
           Ready:          false
           Restart Count:  0
           State:
             Waiting:
               Reason:  PodInitializing
           Image:       hyc-cloud-private-edge-docker-local.artifactory.swg-devops.com/ibmcom-amd64/icp-management-ingress:latest
           Image ID:
           Last State:
           Name:           router
           Ready:          false
           Restart Count:  0
           State:
             Waiting:
               Reason:  PodInitializing
       Status Dependencies:
         iam
     Events:  <none>
    

Getting the IBM Multicloud Manager managed clusters service status from the hub cluster

When you deploy the system healthcheck service onto your hub cluster, the view-cluster-service-status resource view is created. Complete the following steps to get the resource view:

  1. Get the resource view by running the following command from the command line interface (CLI):

    kubectl get resourceviews -n kube-system
    

    Your output might resemble the following response:

    NAME                          CLUSTER SELECTOR   STATUS      REASON   AGE
    view-cluster-service-status   <none>             Completed            6s
    
  2. Get the view-cluster-service-status resource view to get the cluster service status for all of your managed clusters by running the following command:

    kubectl get resourceviews view-cluster-service-status -n kube-system
    

    Your output might resemble the following response:

    CLUSTER         NAME                        SERVICE NAME            SERVICE VERSION   STATUS
    tony-boy        icp-metrics-server          metrics-server                            Running
    tony-boy        icp-mgmt-repo               mgmt-repo                                 Running
    tony-boy        icp-monitoring              monitoring                                Pending
    tony-boy        icp-security-onboarding     security-onboarding                       Succeeded
    tony-boy        icp-service-catalog         service-catalog                           Running
    local-cluster   icp-audit-logging           audit-logging                             Pending
    local-cluster   icp-auth-apikeys            auth-apikeys                              NotInstalled
    local-cluster   icp-auth-idp                auth-idp                                  Running
    local-cluster   icp-auth-pap                auth-pap                                  Running
    
  3. You can also view the cluster service status from the console:

    1. Log in to your IBM Multicloud Manager cluster.
    2. From the navigation menu, click Search.
    3. Click the search bar to display the filter options, and then click kind.
    4. Continue to filter your query by selecting clusterservicestatus.
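The per-cluster table that the resource view returns can be summarized with a short script. The following Python sketch is a hypothetical helper that parses the plain-text table from step 2, relying on whitespace-separated columns (the first and last fields stay stable even when the SERVICE VERSION column is empty), and groups the services that are not in a healthy state by managed cluster.

```python
HEALTHY_STATES = {"Running", "Succeeded", "NotInstalled"}

def pending_by_cluster(table_text):
    """Group unhealthy services by managed cluster.

    table_text is the tabular output of:
        kubectl get resourceviews view-cluster-service-status -n kube-system

    Each data row is CLUSTER, NAME, SERVICE NAME, [SERVICE VERSION], STATUS.
    """
    result = {}
    for line in table_text.strip().splitlines():
        fields = line.split()
        if not fields or fields[0] == "CLUSTER":  # skip empty lines and the header row
            continue
        cluster, name, status = fields[0], fields[1], fields[-1]
        if status not in HEALTHY_STATES:
            result.setdefault(cluster, []).append(name)
    return result
```

Run against the sample output above, this would flag icp-monitoring on the tony-boy cluster and icp-audit-logging on local-cluster.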

System healthcheck service API details

System healthcheck service endpoints are accessible from your cluster URL. Your URL might resemble the following example: https://<CLUSTER_IP>:8443/cluster-health/swagger-ui/.

Note: Your Kubernetes token is required to run the /nodestatus and /clusterstatus endpoints. From the swagger user interface, click Try it out and enter your token value. Your input might resemble the following example: Bearer <kube-token>.

See the IBM Cloud Private system healthcheck service API documentation for the list of available endpoints and more information.
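As a minimal sketch of calling the REST API directly rather than through the swagger UI, the following Python helper builds an authenticated request for the /nodestatus endpoint. The /cluster-health/nodestatus path is assumed from the swagger base URL and endpoint names mentioned above; verify the exact path in the swagger UI before relying on it.

```python
import json
import urllib.request

def build_healthcheck_request(cluster_ip, kube_token, endpoint="nodestatus"):
    """Build an authenticated request for a system healthcheck endpoint.

    The path is assumed to follow the swagger base shown above
    (https://<CLUSTER_IP>:8443/cluster-health/...); the Kubernetes token
    is passed as a Bearer token, as the swagger UI expects.
    """
    url = f"https://{cluster_ip}:8443/cluster-health/{endpoint}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {kube_token}"}
    )

def get_node_status(cluster_ip, kube_token):
    """Call /nodestatus and decode the JSON response. Requires network
    access to the cluster and a trusted TLS certificate."""
    req = build_healthcheck_request(cluster_ip, kube_token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The request builder is separated from the network call so the URL and Authorization header can be checked without a live cluster.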