IBM Cloud Private System Healthcheck service

The IBM Cloud Private system healthcheck service is a REST API that reports the status of your cluster nodes, the Kubernetes API server, unhealthy pods, and the IBM Cloud Private management services and their dependencies.

The following table describes the health status details that the system healthcheck service provides:

Table 1. System healthcheck service status description

| Status | Output description |
|--------|--------------------|
| IBM Cloud Private cluster node | Provides cluster node health status details, with failure details and pod events for any unhealthy nodes. |
| Kubernetes API server | Provides Kubernetes API server and other management service health status details. |
| Unhealthy pods | Provides failure details and pod events in the kube-system namespace. |
| IBM Cloud Private cluster management service | Provides the health status of all IBM Cloud Private management services and the dependencies of each management service. |

See IBM Cloud Private components for a description of the components and their dependencies.

Prerequisite: Install IBM Cloud Private with Kubernetes. For more information, see the Kubernetes settings section on the Customizing the cluster with the config.yaml file page.

Important: By default, the system healthcheck service is enabled during installation.

Enabling the system healthcheck service after installation from the management console

As a cluster administrator, complete the following steps to enable the healthcheck service:

  1. Log in to your IBM Cloud Private cluster from the management console.

  2. Click Catalog, and then select the system-healthcheck-service chart to install it.

  3. Enter a value for the Helm release name.

  4. Select the kube-system namespace from the Target namespace menu.

  5. Click Configure.

  6. To uninstall the system-healthcheck-service, complete the following steps:

    1. From the navigation menu, click Workloads > Helm Releases.
    2. Click the Options icon for your system-healthcheck-service release.
    3. Click Delete.

Enabling the system healthcheck service after installation from the command line interface (CLI)

To enable the system healthcheck service, deploy it onto your IBM Cloud Private cluster. Complete the following steps:

  1. Install the system-healthcheck-service chart. You must first add the internal Helm repository; for more information, see Adding the internal Helm repository to Helm CLI. Then install the system-healthcheck-service by running the following command:

     helm install mgmt-charts/system-healthcheck-service --name <release-name> --namespace kube-system --tls
    
  2. To uninstall the system-healthcheck-service chart, run the following command:

     helm delete <release-name> --purge --tls
    

Enabling the system healthcheck service after installation from the IBM Multicloud Manager management console

As a cluster administrator, complete the following steps to enable the system healthcheck service:

  1. Log in to your IBM Multicloud Manager management console.

  2. Click Catalog, and then select the system-healthcheck-service chart to install it onto your hub cluster.

  3. Enter a value for the Helm release name.

  4. Select the kube-system namespace from the Target namespace menu.

  5. Click Configure.

  6. To uninstall the system-healthcheck-service, complete the following steps:

    1. From the navigation menu, click Workloads > Helm Releases.
    2. Click the Options icon for your system-healthcheck-service release.
    3. Click Delete.

Enabling the system healthcheck service for IBM Multicloud Manager from the CLI

To enable the system healthcheck service for IBM Multicloud Manager, deploy it onto your hub cluster. Complete the following steps:

  1. Install the system-healthcheck-service chart. You must first add the internal Helm repository; for more information, see Adding the internal Helm repository to Helm CLI. Then install the system-healthcheck-service by running the following command:

     helm install mgmt-charts/system-healthcheck-service --name <release-name> --namespace kube-system --tls
    
  2. To uninstall the system-healthcheck-service chart, run the following command:

     helm delete <release-name> --purge --tls
    

Getting the cluster service status

Cluster service status is a CustomResourceDefinition (CRD) for the system healthcheck service. ClusterServiceStatus resources provide the health status, failures, and dependencies of all IBM Cloud Private management services.

Note: The resource objects are updated every 10 minutes.

Required access: At least an operator role.

  1. Get the health status of your cluster by running the following command:

    kubectl get clusterservicestatus
    

    Your output might resemble the following information:

       NAME                        SERVICE NAME            SERVICE VERSION   STATUS
       icp-audit-logging           audit-logging                             Running
       icp-auth-apikeys            auth-apikeys                              NotInstalled
       icp-auth-idp                auth-idp                                  Running
       icp-auth-pap                auth-pap                                  Running
       icp-auth-pdp                auth-pdp                                  Running
       icp-catalog-ui              catalog-ui                                Running
       icp-heapster                heapster                                  NotInstalled
       icp-helm-api                helm-api                                  Running
    
  2. Get a description of any management service status. For example, run the following command to get a description of the icp-monitoring service status:

    kubectl describe clusterservicestatus icp-monitoring
    

    Your output might resemble the following information:

     Name:         icp-monitoring
     Namespace:
     Labels:       app.kubernetes.io/managed-by=icp-system-healthcheck-service
                   clusterhealth.ibm.com/service-name=monitoring
     Annotations:  <none>
     API Version:  clusterhealth.ibm.com/v1
     Kind:         ClusterServiceStatus
     Metadata:
       Creation Timestamp:  2019-07-23T19:46:13Z
       Generation:          2
       Resource Version:    6676898
       Self Link:           /apis/clusterhealth.ibm.com/v1/clusterservicestatuses/icp-monitoring
       UID:                 87136981-ad82-11e9-9e5a-00000a150b81
     Status:
       Current State:  Pending
       Pod Failure Status:
         Monitoring - Prometheus - Alertmanager - 5 F 6595 D 54 - 7 Frkq:
           Image:     hyc-cloud-private-edge-docker-local.artifactory.swg-devops.com/ibmcom-amd64/alertmanager:v0.15.0-f
           Image ID:
           Last State:
           Name:           alertmanager
           Ready:          false
           Restart Count:  0
           State:
             Waiting:
               Message:  Back-off pulling image "hyc-cloud-private-edge-docker-local.artifactory.swg-devops.com/ibmcom-amd64/alertmanager:v0.15.0-f"
               Reason:   ImagePullBackOff
           Image:        hyc-cloud-private-edge-docker-local.artifactory.swg-devops.com/ibmcom-amd64/configmap-reload:v0.2.2-f3
           Image ID:
           Last State:
           Name:           configmap-reload
           Ready:          false
           Restart Count:  0
           State:
             Waiting:
               Reason:  PodInitializing
           Image:       hyc-cloud-private-edge-docker-local.artifactory.swg-devops.com/ibmcom-amd64/icp-management-ingress:latest
           Image ID:
           Last State:
           Name:           router
           Ready:          false
           Restart Count:  0
           State:
             Waiting:
               Reason:  PodInitializing
       Status Dependencies:
         iam
     Events:  <none>
    
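The ClusterServiceStatus objects can also be inspected programmatically. The following Python sketch parses the JSON from `kubectl get clusterservicestatus -o json` and reports any service that is not in a healthy state. The `currentState` field name is inferred from the `Current State` line in the describe output above (kubectl describe title-cases camelCase keys), so verify it against your own cluster before relying on it.

```python
import json

# States that do not indicate a problem; NotInstalled simply means the
# optional management service was not deployed.
HEALTHY_STATES = {"Running", "Succeeded", "NotInstalled"}

def unhealthy_services(raw_json):
    """Return {resource name: state} for every ClusterServiceStatus
    whose current state is not in HEALTHY_STATES.

    raw_json is the output of:
        kubectl get clusterservicestatus -o json
    """
    items = json.loads(raw_json).get("items", [])
    return {
        item["metadata"]["name"]: item.get("status", {}).get("currentState", "Unknown")
        for item in items
        if item.get("status", {}).get("currentState") not in HEALTHY_STATES
    }
```

For example, feeding it a list that contains the icp-monitoring object shown above would report `{'icp-monitoring': 'Pending'}`.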

Getting the hub cluster service status

  1. Get the health status of your hub cluster by running the following command:

    kubectl get clusterservicestatus
    

    Your output might resemble the following information:

       NAME                        SERVICE NAME            SERVICE VERSION   STATUS
       icp-audit-logging           audit-logging                             Running
       icp-auth-apikeys            auth-apikeys                              NotInstalled
       icp-auth-idp                auth-idp                                  Running
       icp-auth-pap                auth-pap                                  Running
       icp-auth-pdp                auth-pdp                                  Running
       icp-catalog-ui              catalog-ui                                Running
       icp-heapster                heapster                                  NotInstalled
       icp-helm-api                helm-api                                  Running
    
  2. Get a description of any management service status on your hub cluster. For example, run the following command to get a description of the icp-monitoring service status:

    kubectl describe clusterservicestatus icp-monitoring
    

    Your output might resemble the following information:

     Name:         icp-monitoring
     Namespace:
     Labels:       app.kubernetes.io/managed-by=icp-system-healthcheck-service
                   clusterhealth.ibm.com/service-name=monitoring
     Annotations:  <none>
     API Version:  clusterhealth.ibm.com/v1
     Kind:         ClusterServiceStatus
     Metadata:
       Creation Timestamp:  2019-07-23T19:46:13Z
       Generation:          2
       Resource Version:    6676898
       Self Link:           /apis/clusterhealth.ibm.com/v1/clusterservicestatuses/icp-monitoring
       UID:                 87136981-ad82-11e9-9e5a-00000a150b81
     Status:
       Current State:  Pending
       Pod Failure Status:
         Monitoring - Prometheus - Alertmanager - 5 F 6595 D 54 - 7 Frkq:
           Image:     hyc-cloud-private-edge-docker-local.artifactory.swg-devops.com/ibmcom-amd64/alertmanager:v0.15.0-f
           Image ID:
           Last State:
           Name:           alertmanager
           Ready:          false
           Restart Count:  0
           State:
             Waiting:
               Message:  Back-off pulling image "hyc-cloud-private-edge-docker-local.artifactory.swg-devops.com/ibmcom-amd64/alertmanager:v0.15.0-f"
               Reason:   ImagePullBackOff
           Image:        hyc-cloud-private-edge-docker-local.artifactory.swg-devops.com/ibmcom-amd64/configmap-reload:v0.2.2-f3
           Image ID:
           Last State:
           Name:           configmap-reload
           Ready:          false
           Restart Count:  0
           State:
             Waiting:
               Reason:  PodInitializing
           Image:       hyc-cloud-private-edge-docker-local.artifactory.swg-devops.com/ibmcom-amd64/icp-management-ingress:latest
           Image ID:
           Last State:
           Name:           router
           Ready:          false
           Restart Count:  0
           State:
             Waiting:
               Reason:  PodInitializing
       Status Dependencies:
         iam
     Events:  <none>
    

Getting the IBM Multicloud Manager managed clusters service status from the hub cluster

When you deploy the system healthcheck service onto your hub cluster, the view-cluster-service-status resource view is created. Complete the following steps to get the resource view:

  1. Get the resource view by running the following command from the command line interface (CLI):

    kubectl get resourceviews -n kube-system
    

    Your output might resemble the following response:

    NAME                          CLUSTER SELECTOR   STATUS      REASON   AGE
    view-cluster-service-status   <none>             Completed            6s
    
  2. Get the view-cluster-service-status resource view to get the cluster service status for all of your managed clusters by running the following command:

    kubectl get resourceviews view-cluster-service-status -n kube-system
    

    Your output might resemble the following response:

    CLUSTER         NAME                        SERVICE NAME            SERVICE VERSION   STATUS
    tony-boy        icp-metrics-server          metrics-server                            Running
    tony-boy        icp-mgmt-repo               mgmt-repo                                 Running
    tony-boy        icp-monitoring              monitoring                                Pending
    tony-boy        icp-security-onboarding     security-onboarding                       Succeeded
    tony-boy        icp-service-catalog         service-catalog                           Running
    local-cluster   icp-audit-logging           audit-logging                             Pending
    local-cluster   icp-auth-apikeys            auth-apikeys                              NotInstalled
    local-cluster   icp-auth-idp                auth-idp                                  Running
    local-cluster   icp-auth-pap                auth-pap                                  Running
    
  3. You can also view the cluster service status from the console:

    1. Log in to your IBM Multicloud Manager cluster.
    2. From the navigation menu, click Search.
    3. Click the search bar to display the filter options, and then click kind.
    4. Continue to filter your query by selecting clusterservicestatus.
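The per-cluster table that the resource view returns can be summarized with a short script. The following Python sketch is a hypothetical helper that parses the plain-text table from step 2, relying on whitespace-separated columns (the first and last fields stay stable even when the SERVICE VERSION column is empty), and groups the services that are not in a healthy state by managed cluster.

```python
HEALTHY_STATES = {"Running", "Succeeded", "NotInstalled"}

def pending_by_cluster(table_text):
    """Group unhealthy services by managed cluster.

    table_text is the tabular output of:
        kubectl get resourceviews view-cluster-service-status -n kube-system

    Each data row is CLUSTER, NAME, SERVICE NAME, [SERVICE VERSION], STATUS.
    """
    result = {}
    for line in table_text.strip().splitlines():
        fields = line.split()
        if not fields or fields[0] == "CLUSTER":  # skip empty lines and the header row
            continue
        cluster, name, status = fields[0], fields[1], fields[-1]
        if status not in HEALTHY_STATES:
            result.setdefault(cluster, []).append(name)
    return result
```

Run against the sample output above, this would flag icp-monitoring on the tony-boy cluster and icp-audit-logging on local-cluster.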

System healthcheck service API details

System healthcheck service endpoints are accessible from your cluster URL. Your URL might resemble the following example: https://<CLUSTER_IP>:8443/cluster-health/swagger-ui/.

Note: Your Kubernetes token is required to run the /nodestatus and /clusterstatus endpoints. From the swagger user interface, click Try it out and enter your token value. Your input might resemble the following example: Bearer <kube-token>.

See the IBM Cloud Private system healthcheck service API documentation for the list of available endpoints and more information.
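As a minimal sketch of calling the REST API directly rather than through the swagger UI, the following Python helper builds an authenticated request for the /nodestatus endpoint. The /cluster-health/nodestatus path is assumed from the swagger base URL and endpoint names mentioned above; verify the exact path in the swagger UI before relying on it.

```python
import json
import urllib.request

def build_healthcheck_request(cluster_ip, kube_token, endpoint="nodestatus"):
    """Build an authenticated request for a system healthcheck endpoint.

    The path is assumed to follow the swagger base shown above
    (https://<CLUSTER_IP>:8443/cluster-health/...); the Kubernetes token
    is passed as a Bearer token, as the swagger UI expects.
    """
    url = f"https://{cluster_ip}:8443/cluster-health/{endpoint}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {kube_token}"}
    )

def get_node_status(cluster_ip, kube_token):
    """Call /nodestatus and decode the JSON response. Requires network
    access to the cluster and a trusted TLS certificate."""
    req = build_healthcheck_request(cluster_ip, kube_token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The request builder is separated from the network call so the URL and Authorization header can be checked without a live cluster.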