IBM Cloud Pak foundational services System Healthcheck service

The system healthcheck service is a REST API that reports the status of your nodes, the Kubernetes API server, unhealthy pods, and your management services and their dependencies.

Note: This service is a common service that can be shared across IBM Cloud Paks and other IBM products. If you have installed multiple IBM Cloud Paks or other IBM products that support using this service, the service is installed only once on a cluster and is shared by each IBM Cloud Pak or product.

The following table describes the health status details that the system healthcheck service provides:

Table 1. System healthcheck service status description
Status / Output description
Your product cluster node
  • Provides cluster node health status details.
  • Provides failure details and pod events for any unhealthy nodes.
Kubernetes API server
  • Provides Kubernetes API server and other management service health status details.
Unhealthy pods
  • Provides failure details and pod events for unhealthy pods in the kube-system namespace.
Your product cluster management service
  • Provides the health status of all management services.
  • Provides the dependencies of each management service.

Prerequisite: Install your product with Kubernetes. For more information, see the Kubernetes settings section on the Customizing the cluster with the config.yaml file page.

Important: By default, the system healthcheck service is enabled during installation.

Enabling the system healthcheck service after installation from your product console

As a cluster administrator, complete the following steps to enable the system healthcheck service:

  1. Log in to your product console.

  2. Install the system-healthcheck-service chart onto your hub cluster: click Administer > Helm releases, and then select the system-healthcheck-service chart.

  3. Enter a value for the Helm release name.

  4. Select the kube-system namespace from the Target namespace menu.

  5. Click Configure.

  6. To uninstall the system-healthcheck-service chart, complete the following steps:

    1. From the navigation menu, click Manage > Helm Repositories.
    2. Click the Options icon for the system-healthcheck-service chart.
    3. Click Delete.

Enabling the system healthcheck service for your product from the CLI

To enable the system healthcheck service for your product, deploy it onto your hub cluster by completing the following steps:

  1. Install the system-healthcheck-service chart. You must first add the internal Helm repository. For more information, see Adding the internal Helm repository to Helm CLI. Then install the chart by running the following command:

     helm install mgmt-charts/system-healthcheck-service --name <release-name> --namespace kube-system --tls
    
  2. To uninstall the system-healthcheck-service chart, run the following command:

     helm delete <release-name> --purge --tls
    

Getting the cluster service status

Cluster service status is a CustomResourceDefinition (CRD) for the system healthcheck service. ClusterServiceStatus resources provide the health status, failures, and dependencies of all your management services.

Note: The resource objects are updated every 10 minutes.

Required access: At least an operator role.

  1. Get the health status of your cluster by running the following command:

    kubectl get clusterservicestatus
    

    Your output might resemble the following information:

       NAME                        SERVICE NAME            SERVICE VERSION   STATUS
       cpmcm-audit-logging           audit-logging                             Running
       foundation-auth-apikeys       auth-apikeys                              NotInstalled
       foundation-auth-idp           auth-idp                                  Running
       foundation-auth-pap           auth-pap                                  Running
       foundation-auth-pdp           auth-pdp                                  Running
       foundation-catalog-ui         catalog-ui                                Running
       cpmcm-heapster                heapster                                  NotInstalled
       cpmcm-helm-api                helm-api                                  Running
    
  2. Get a description of any management service status. For example, run the following command to get a description of the cpmcm-key-management service status:

    kubectl describe clusterservicestatus cpmcm-key-management
    

    Your output might resemble the following information:

     Name:         cpmcm-key-management
     Namespace:
     Labels:       app.kubernetes.io/managed-by=system-healthcheck-service
                   clusterhealth.ibm.com/service-name=key-management
     Annotations:  <none>
     API Version:  clusterhealth.ibm.com/v1
     Kind:         ClusterServiceStatus
     Metadata:
       Creation Timestamp:  2019-10-28T20:28:12Z
       Generation:          1
       Resource Version:    5438373
       Self Link:           /apis/clusterhealth.ibm.com/v1/clusterservicestatuses/cpmcm-key-management
       UID:                 76c28061-f9c1-11e9-acb9-00000a150b8d
     Status:
       Current State:  Succeeded
       Pod Failure Status:
         Monitoring - Prometheus - Alertmanager - 5 F 6595 D 54 - 7 Frkq:
           Image:     hyc-cloud-private-edge-docker-local.artifactory.swg-devops.com/ibmcom-amd64/alertmanager:v0.15.0-f
           Image ID:
           Last State:
           Name:           alertmanager
           Ready:          false
           Restart Count:  0
           State:
             Waiting:
               Message:  Back-off pulling image "hyc-cloud-private-edge-docker-local.artifactory.swg-devops.com/ibmcom-amd64/alertmanager:v0.15.0-f"
               Reason:   ImagePullBackOff
           Image:        hyc-cloud-private-edge-docker-local.artifactory.swg-devops.com/ibmcom-amd64/configmap-reload:v0.2.2-f3
           Image ID:
           Last State:
           Name:           configmap-reload
           Ready:          false
           Restart Count:  0
           State:
             Waiting:
               Reason:  PodInitializing
           Image:       hyc-cloud-private-edge-docker-local.artifactory.swg-devops.com/ibmcom-amd64/cpmcm-management-ingress:latest
           Image ID:
           Last State:
           Name:           key-management-onboarding
           Ready:          false
           Restart Count:  0
           State:
             Waiting:
               Reason:  PodInitializing
       Status Dependencies:
         iam
     Events:  <none>
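
When many management services are listed, it can help to filter the table from step 1 down to the services that are not running. The following is a minimal sketch, not part of the product: the function name not_running is hypothetical, and it assumes that STATUS is always the last whitespace-separated field on each row, as in the sample output above.

```shell
# Hypothetical helper (not shipped with the system healthcheck service).
# Reads the table printed by `kubectl get clusterservicestatus` on stdin and
# prints "NAME STATUS" for every service whose STATUS is not "Running".
# Assumption: STATUS is the last whitespace-separated field, which holds for
# the sample output even when the SERVICE VERSION column is empty.
not_running() {
  # NR > 1 skips the header row; NF >= 3 skips blank or malformed rows.
  awk 'NR > 1 && NF >= 3 && $NF != "Running" { print $1, $NF }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get clusterservicestatus | not_running
```

Statuses such as NotInstalled or Succeeded are also printed by this filter. Whether they need attention depends on your deployment.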
    

Getting the hub cluster service status

  1. Get the health status of your hub cluster by running the following command:

    kubectl get clusterservicestatus
    

    Your output might resemble the following information:

       NAME                        SERVICE NAME            SERVICE VERSION   STATUS
       cpmcm-audit-logging           audit-logging                             Running
       cpmcm-auth-apikeys            auth-apikeys                              NotInstalled
       cpmcm-auth-idp                auth-idp                                  Running
       cpmcm-auth-pap                auth-pap                                  Running
       cpmcm-auth-pdp                auth-pdp                                  Running
       cpmcm-catalog-ui              catalog-ui                                Running
       cpmcm-heapster                heapster                                  NotInstalled
       cpmcm-helm-api                helm-api                                  Running
    
  2. Get a description of any management service status on your hub cluster. For example, run the following command to get a description of the cpmcm-key-management service status:

    kubectl describe clusterservicestatus cpmcm-key-management
    

    Your output might resemble the following information:

     Name:         cpmcm-key-management
     Namespace:
     Labels:       app.kubernetes.io/managed-by=system-healthcheck-service
                   clusterhealth.ibm.com/service-name=key-management
     Annotations:  <none>
     API Version:  clusterhealth.ibm.com/v1
     Kind:         ClusterServiceStatus
     Metadata:
       Creation Timestamp:  2019-10-28T20:28:12Z
       Generation:          1
       Resource Version:    5438373
       Self Link:           /apis/clusterhealth.ibm.com/v1/clusterservicestatuses/cpmcm-key-management
       UID:                 76c28061-f9c1-11e9-acb9-00000a150b8d
     Status:
       Current State:  Succeeded
       Pod Failure Status:
         Monitoring - Prometheus - Alertmanager - 5 F 6595 D 54 - 7 Frkq:
           Image:     hyc-cloud-private-edge-docker-local.artifactory.swg-devops.com/ibmcom-amd64/alertmanager:v0.15.0-f
           Image ID:
           Last State:
           Name:           alertmanager
           Ready:          false
           Restart Count:  0
           State:
             Waiting:
               Message:  Back-off pulling image "hyc-cloud-private-edge-docker-local.artifactory.swg-devops.com/ibmcom-amd64/alertmanager:v0.15.0-f"
               Reason:   ImagePullBackOff
           Image:        hyc-cloud-private-edge-docker-local.artifactory.swg-devops.com/ibmcom-amd64/configmap-reload:v0.2.2-f3
           Image ID:
           Last State:
           Name:           configmap-reload
           Ready:          false
           Restart Count:  0
           State:
             Waiting:
               Reason:  PodInitializing
           Image:       hyc-cloud-private-edge-docker-local.artifactory.swg-devops.com/ibmcom-amd64/cpmcm-management-ingress:latest
           Image ID:
           Last State:
           Name:           key-management-onboarding
           Ready:          false
           Restart Count:  0
           State:
             Terminated:
               Container ID:  docker://b5a15d8cebc6ff9a11accc9d05cbb86a2075610dea678169e471c83bfdc32cec
               Exit Code:     0
               Finished At:   2019-09-30T19:19:34Z
               Reason:        Completed
               Started At:    2019-09-30T19:19:33Z
       Status Dependencies:
         k8s
         mongodb
         cert-manager
     Events:  <none>
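
To see only the dependencies of a service, you can extract the Status Dependencies list from the describe output. The following is a sketch under assumptions: the function name status_dependencies is hypothetical, and it relies on the layout shown in the sample output, where each dependency is on its own line and the list ends at the next line that contains a colon (for example, Events:).

```shell
# Hypothetical helper (not shipped with the system healthcheck service).
# Reads `kubectl describe clusterservicestatus <name>` output on stdin and
# prints one dependency per line.
status_dependencies() {
  awk '
    /Status Dependencies:/ { in_deps = 1; next }  # start of the list
    in_deps && /:/         { in_deps = 0 }        # next "Key:" line ends it
    in_deps && NF          { print $1 }           # dependency name, trimmed
  '
}

# Usage against a live cluster (requires kubectl access):
#   kubectl describe clusterservicestatus cpmcm-key-management | status_dependencies
```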
    

Getting the managed clusters service status from the hub cluster

When you deploy the system healthcheck service onto your hub cluster, the view-cluster-service-status resource view is created. Complete the following steps to get the resource view:

  1. Get the resource view by running the following command from the command line interface (CLI):

    kubectl get resourceviews -n kube-system
    

    Your output might resemble the following response:

    NAME                          CLUSTER SELECTOR   STATUS      REASON   AGE
    view-cluster-service-status   <none>             Completed            6s
    
  2. Get the cluster service status for all of your managed clusters by retrieving the view-cluster-service-status resource view. Run the following command:

    kubectl get resourceviews view-cluster-service-status -n kube-system
    

    Your output might resemble the following response:

    CLUSTER         NAME                        SERVICE NAME            SERVICE VERSION   STATUS
    tony-boy        cpmcm-metrics-server          metrics-server                            Running
    tony-boy        cpmcm-mgmt-repo               mgmt-repo                                 Running
    tony-boy        cpmcm-key-management          key-management                            Pending
    tony-boy        cpmcm-security-onboarding     security-onboarding                       Succeeded
    tony-boy        cpmcm-service-catalog         service-catalog                           Running
    local-cluster   cpmcm-audit-logging           audit-logging                             Pending
    local-cluster   foundation-auth-apikeys       auth-apikeys                              NotInstalled
    local-cluster   foundation-auth-idp           auth-idp                                  Running
    local-cluster   foundation-auth-pap           auth-pap                                  Running
    
  3. You can also view the cluster service status from the console:

    1. Log in to your product cluster.
    2. From the navigation menu, click Search.
    3. Click the search bar. Options to filter your search appear. Click kind.
    4. Continue to filter your query by selecting clusterservicestatus.
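
For a quick overview across managed clusters, the table that the view-cluster-service-status resource view returns can be reduced to a count of services per cluster and status. The following is a minimal sketch, not part of the product: the function name status_summary is hypothetical, and it assumes that CLUSTER is the first field and STATUS is the last field on each row, as in the sample response above.

```shell
# Hypothetical helper (not shipped with the system healthcheck service).
# Reads the view-cluster-service-status table on stdin and prints
# "CLUSTER STATUS COUNT", one line per cluster and status combination.
status_summary() {
  # NR > 1 skips the header row; NF >= 4 skips blank or malformed rows.
  awk 'NR > 1 && NF >= 4 { count[$1 " " $NF]++ }
       END { for (key in count) print key, count[key] }' |
    LC_ALL=C sort   # awk iteration order is unspecified, so sort the result
}

# Usage against a live hub cluster (requires kubectl access):
#   kubectl get resourceviews view-cluster-service-status -n kube-system | status_summary
```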

System healthcheck service API details

System healthcheck service endpoints are accessible from your cluster URL. Your URL might resemble the following example: https://<CLUSTER_IP>:8443/cluster-health/swagger-ui/.

Note: Your Kubernetes token is required to call the /nodestatus and /clusterstatus endpoints. From the Swagger user interface, click Try it out and enter your token value. Your input might resemble the following example: Bearer <kube-token>.
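
The same token can be sent from the command line as an Authorization header. The following is a sketch under stated assumptions, not a documented procedure: the function names healthcheck_url and get_health are hypothetical, and the sketch assumes that the endpoints are served under the same /cluster-health/ base path as the Swagger UI. Confirm the actual paths in the Swagger UI before you rely on them.

```shell
# Hypothetical helpers (not shipped with the system healthcheck service).

# Build an endpoint URL.
# $1 = cluster IP or host name, $2 = endpoint name without the leading slash.
# Assumption: endpoints live under /cluster-health/ on port 8443, matching
# the Swagger UI URL in this document.
healthcheck_url() {
  printf 'https://%s:8443/cluster-health/%s\n' "$1" "$2"
}

# Call an endpoint with a bearer token.
# $1 = cluster IP or host name, $2 = endpoint name, $3 = Kubernetes token.
get_health() {
  curl --silent --show-error \
    --header "Authorization: Bearer $3" \
    "$(healthcheck_url "$1" "$2")"
}

# Usage (requires network access to the cluster and a valid token):
#   get_health "$CLUSTER_IP" clusterstatus "$KUBE_TOKEN"
```

If your cluster uses a certificate that your workstation does not trust, curl rejects the connection unless you supply the CA certificate with the --cacert option.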

The following endpoints are available:

For more information, see the System healthcheck service API.