Checking Kubernetes node status

Use these commands to check the status of the nodes in the environment.

kubectl get pods command

The kubectl get pods command shows the status of each Kubernetes pod deployed to run the PowerAI Vision application.

Example output
[root@aprilmin8 bin]# /opt/powerai-vision/bin/kubectl get pods
NAME                                                              READY     STATUS    RESTARTS   AGE
powerai-vision-infer-ic-f873e5ff-821b-472a-bec0-32b1cdc4b59ns7j   1/1       Running   0          14d
powerai-vision-infer-od-fae2aa93-e394-4a04-bec7-f072b81f2678b2m   1/1       Running   0          14d
powerai-vision-keycloak-987f9698d-kdvp2                           1/1       Running   1          14d
powerai-vision-mongodb-686ffbd4b9-cq2mx                           1/1       Running   0          14d
powerai-vision-portal-768c4ffdc7-9ppnt                            1/1       Running   0          14d
powerai-vision-postgres-7c88467b4c-swz8m                          1/1       Running   0          14d
powerai-vision-taskanaly-788bdcc5cc-dc5cp                         1/1       Running   0          14d
powerai-vision-ui-844bb5f7f9-t55bg                                1/1       Running   0          14d
powerai-vision-video-nginx-845544c55b-w8ks6                       1/1       Running   0          14d
powerai-vision-video-portal-7499b577c7-csvs8                      1/1       Running   0          14d
powerai-vision-video-rabmq-85b486bc9d-nqdmn                       1/1       Running   0          14d
powerai-vision-video-redis-67d9766fb6-gw7hd                       1/1       Running   0          14d
powerai-vision-video-test-nginx-b5cb6b759-fkwp5                   1/1       Running   0          14d
powerai-vision-video-test-portal-756f7f9b99-mtwwj                 1/1       Running   0          14d
powerai-vision-video-test-rabmq-578c775d4-chbtd                   1/1       Running   0          14d
powerai-vision-video-test-redis-779f7545b4-dffws                  1/1       Running   0          14d

Interpreting the output

  • When the application is running correctly, each of the pods should have:
    • A value of 1/1 in the READY column
    • A value of Running in the STATUS column
  • In the example output above, pods with infer in the name are created when a model is deployed. They appear only if models are deployed in the instance of the application running on the system.
  • A STATUS value other than Running indicates an issue with the pod.
  • A nonzero and increasing value in the RESTARTS column indicates an issue with that pod.
If there are indications of issues with pods, see Troubleshooting common issues.
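
The READY and STATUS checks above can be scripted. The following is a minimal sketch, not part of the product: the function name unhealthy_pods is hypothetical, and it assumes the column layout shown in the example output (NAME, READY, STATUS as the first three columns). It cannot detect an increasing RESTARTS value, because that requires comparing two runs of the command.

```shell
# unhealthy_pods (hypothetical helper): read `get pods` output on stdin and
# print the name, READY value, and STATUS of every pod that is not fully
# ready or not Running. Assumes the first line is the column header.
unhealthy_pods() {
  awk 'NR > 1 && NF >= 3 {
    split($2, ready, "/")            # READY column looks like "1/1"
    if (ready[1] != ready[2] || $3 != "Running") { print $1, $2, $3 }
  }'
}

# Example usage; the kubectl.sh path follows this document's examples and
# may differ on your system:
# /opt/powerai-vision/bin/kubectl.sh get pods | unhealthy_pods
```

If the function prints nothing, every listed pod has matching READY counts and a STATUS of Running.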

kubectl describe nodes command

The kubectl describe nodes command provides status information regarding the Kubernetes environment used to run the PowerAI Vision application.

Example output
# /opt/powerai-vision/bin/kubectl.sh describe nodes
Name:               127.0.0.1
Roles:              <none>
Labels:             beta.kubernetes.io/arch=ppc64le
                    beta.kubernetes.io/os=linux
                    gpu/nvidia=TeslaP100-SXM2-16GB
                    kubernetes.io/hostname=127.0.0.1
Annotations:        node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:             <none>
CreationTimestamp:  Tue, 14 Aug 2018 00:54:26 -0500
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  OutOfDisk        False   Wed, 12 Sep 2018 18:26:03 -0500   Tue, 14 Aug 2018 00:54:26 -0500   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure   False   Wed, 12 Sep 2018 18:26:03 -0500   Tue, 14 Aug 2018 00:54:26 -0500   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 12 Sep 2018 18:26:03 -0500   Tue, 14 Aug 2018 00:54:26 -0500   KubeletHasNoDiskPressure     kubelet has no disk pressure
  Ready            True    Wed, 12 Sep 2018 18:26:03 -0500   Tue, 14 Aug 2018 00:54:36 -0500   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  127.0.0.1
  Hostname:    127.0.0.1
Capacity:
 alpha.kubernetes.io/nvidia-gpu:  4
 cpu:                             128
 memory:                          1067492928Ki
 pods:                            110
Allocatable:
 alpha.kubernetes.io/nvidia-gpu:  4
 cpu:                             128
 memory:                          1067390528Ki
 pods:                            110
System Info:
 Machine ID:                 833e0926ee21aed71ec075d726cbcfe0
 System UUID:                101537A         
 Boot ID:                    6edbbc1e-475d-4ac7-b8dc-18227ff6a6f4
 Kernel Version:             3.10.0-862.6.3.el7.ppc64le
 OS Image:                   Debian GNU/Linux 8 (jessie)
 Operating System:           linux
 Architecture:               ppc64le
 Container Runtime Version:  docker://1.13.1
 Kubelet Version:            v1.8.3+icp
 Kube-Proxy Version:         v1.8.3+icp
ExternalID:                  127.0.0.1
Non-terminated Pods:         (20 in total)
  Namespace                  Name                                                               CPU Requests  CPU Limits  Memory Requests  Memory Limits  NvidiaGPU Limits
  ---------                  ----                                                               ------------  ----------  ---------------  -------------  -------------
  default                    powerai-vision-infer-ic-2c9713b3-a357-466e-a10c-596e71e735c6wvm    0 (0%)        0 (0%)      0 (0%)           0 (0%)         1 (25%)
  default                    powerai-vision-infer-od-2b2c5f0d-f6ae-4f46-8f0b-b946c2bc18kznb6    0 (0%)        0 (0%)      0 (0%)           0 (0%)         1 (25%)
  default                    powerai-vision-keycloak-987f9698d-kdvp2                            0 (0%)        0 (0%)      0 (0%)           0 (0%)         0 (0%)
  default                    powerai-vision-mongodb-686ffbd4b9-cq2mx                            0 (0%)        0 (0%)      0 (0%)           0 (0%)         0 (0%)
  default                    powerai-vision-portal-768c4ffdc7-9686z                             0 (0%)        0 (0%)      0 (0%)           0 (0%)         0 (0%)
  default                    powerai-vision-postgres-7c88467b4c-swz8m                           0 (0%)        0 (0%)      0 (0%)           0 (0%)         0 (0%)
  default                    powerai-vision-taskanaly-788bdcc5cc-dc5cp                          0 (0%)        0 (0%)      0 (0%)           0 (0%)         0 (0%)
  default                    powerai-vision-ui-844bb5f7f9-t55bg                                 0 (0%)        0 (0%)      0 (0%)           0 (0%)         0 (0%)
  default                    powerai-vision-video-nginx-845544c55b-w8ks6                        0 (0%)        0 (0%)      0 (0%)           0 (0%)         0 (0%)
  default                    powerai-vision-video-portal-7499b577c7-csvs8                       0 (0%)        0 (0%)      0 (0%)           0 (0%)         0 (0%)
  default                    powerai-vision-video-rabmq-85b486bc9d-nqdmn                        0 (0%)        0 (0%)      0 (0%)           0 (0%)         0 (0%)
  default                    powerai-vision-video-redis-67d9766fb6-gw7hd                        0 (0%)        0 (0%)      0 (0%)           0 (0%)         0 (0%)
  default                    powerai-vision-video-test-nginx-b5cb6b759-fkwp5                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         0 (0%)
  default                    powerai-vision-video-test-portal-756f7f9b99-mtwwj                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         0 (0%)
  default                    powerai-vision-video-test-rabmq-578c775d4-chbtd                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         0 (0%)
  default                    powerai-vision-video-test-redis-779f7545b4-dffws                   0 (0%)        0 (0%)      0 (0%)           0 (0%)         0 (0%)
  kube-system                default-http-backend-77c86f88b4-g8sp9                              0 (0%)        0 (0%)      0 (0%)           0 (0%)         0 (0%)
  kube-system                kube-dns-c5b9d46b-67dvg                                            0 (0%)        0 (0%)      0 (0%)           0 (0%)         0 (0%)
  kube-system                nginx-ingress-lb-ppc64le-pdzs6                                     0 (0%)        0 (0%)      0 (0%)           0 (0%)         0 (0%)
  kube-system                tiller-deploy-5f954f4845-kkv8z                                     0 (0%)        0 (0%)      0 (0%)           0 (0%)         0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests  Memory Limits  NvidiaGPU Limits
  ------------  ----------  ---------------  -------------  -------------
  0 (0%)        0 (0%)      0 (0%)           0 (0%)         2 (50%)
Events:         <none>
#

Interpreting the output

  • Most of the output is informational. It describes the system resources (CPUs, GPUs, memory) and versions (OS, Docker, Kubernetes).
  • The Conditions section can indicate system resource issues that affect the running of the application. If any of the OutOfDisk, MemoryPressure, or DiskPressure conditions is True, there are insufficient system resources to run PowerAI Vision. For example, the following Conditions section shows a system that does not have sufficient disk space available, indicated by a DiskPressure status of True:
    Conditions:
      Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
      ----             ------  -----------------                 ------------------                ------                       -------
      OutOfDisk        False   [...]                               [...]                    KubeletHasSufficientDisk     kubelet has sufficient disk space available
      MemoryPressure   False   [...]                               [...]                    KubeletHasSufficientMemory   kubelet has sufficient memory available
      DiskPressure     True    [...]                               [...]                    KubeletHasDiskPressure       kubelet has disk pressure
      Ready            True    [...]                               [...]                    KubeletReady                 kubelet is posting ready status
  • The Events section can also contain messages that indicate issues with the environment. For example, the following events indicate disk space issues that have led Kubernetes to attempt to reclaim resources ("eviction"), which can affect the availability of Kubernetes applications:
    Events:
      Type     Reason                Age               From                Message
      ----     ------                ----              ----                -------
      Normal   NodeHasDiskPressure   5m                kubelet, 127.0.0.1  Node 127.0.0.1 status is now: NodeHasDiskPressure
      Warning  EvictionThresholdMet  3s (x23 over 5m)  kubelet, 127.0.0.1  Attempting to reclaim nodefs
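
The Conditions and Events checks described above can be sketched as two small filters. This is an illustrative sketch only: the function names node_pressure and node_warnings are hypothetical, and the parsing assumes the text layout shown in the example output, which can vary between Kubernetes versions.

```shell
# node_pressure (hypothetical helper): read `describe nodes` output on stdin
# and print each resource condition whose Status column is True.
node_pressure() {
  awk '$1 ~ /^(OutOfDisk|MemoryPressure|DiskPressure)$/ && $2 == "True" {
    print $1
  }'
}

# node_warnings (hypothetical helper): print the rows of the Events section
# whose Type column is Warning.
node_warnings() {
  awk '
    /^Events:/  { ev = 1; next }   # start of the Events section
    /^[^ \t]/   { ev = 0 }         # any other unindented line ends it
    ev && $1 == "Warning"
  '
}

# Example usage; the kubectl.sh path follows this document's examples:
# /opt/powerai-vision/bin/kubectl.sh describe nodes | node_pressure
# /opt/powerai-vision/bin/kubectl.sh describe nodes | node_warnings
```

Empty output from both functions suggests that no resource condition is True and that no Warning events are currently recorded for the node.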

kubectl describe pods command

The kubectl.sh describe pods command provides detailed information about each of the pods used by the PowerAI Vision application. To see the output for a specific pod only, run kubectl.sh describe pod podname. To determine the value for podname, look at the output from kubectl.sh get pods.
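
Looking up podname by hand can be avoided with a small filter. The sketch below is an assumption-laden illustration: pods_matching is a hypothetical helper name, and it assumes the pod name is the first column of the get pods output, as in the example earlier in this topic.

```shell
# pods_matching (hypothetical helper): read `get pods` output on stdin and
# print the names of pods whose name contains the given text.
pods_matching() {
  awk -v pat="$1" 'NR > 1 && index($1, pat) > 0 { print $1 }'
}

# Example usage; the kubectl.sh path follows this document's examples and
# may differ on your system. Describes every deployed-model (infer) pod:
# KUBECTL=/opt/powerai-vision/bin/kubectl.sh
# for pod in $("$KUBECTL" get pods | pods_matching infer); do
#   "$KUBECTL" describe pod "$pod"
# done
```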

Example output

The output from the command is verbose, so sample output from only one pod is shown:
# kubectl describe pods
...
Name:           powerai-vision-ui-844bb5f7f9-khxlm
Namespace:      default
Node:           127.0.0.1/127.0.0.1
Start Time:     Mon, 17 Sep 2018 12:27:18 -0500
Labels:         app=powerai-vision
                chart=ibm-powerai-vision-prod-1.1.0
                component=powerai-vision-ui
                heritage=Tiller
                pod-template-hash=4006619395
                release=vision
                run=powerai-vision-ui-deployment-pod
Annotations:    checksum/config=b131ad79a4838feecc85cde3b422bc82ef3214a437462e02b775df6d3582a4f6
                kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"powerai-vision-ui-844bb5f7f9","uid":"e4ee9eab-ba9e-11e8-9e14-98b...
                productID=5737-H10
                productName=IBM PowerAI Vision
                productVersion=1.1.1.0
Status:         Running
IP:             172.17.0.14
Created By:     ReplicaSet/powerai-vision-ui-844bb5f7f9
Controlled By:  ReplicaSet/powerai-vision-ui-844bb5f7f9
Containers:
  powerai-vision-ui:
    Container ID:   docker://16867f9922458d4a517018f52edc6c319e1e9d6408cc333cf242be618e179425
    Image:          powerai-vision-ui:1.1.1.0
    Image ID:       docker://sha256:c327077b7d605518f6b4651141ab864459212d34022fabad844073e5e66e9b9c
    Port:           80/TCP
    State:          Running
      Started:      Mon, 17 Sep 2018 12:27:33 -0500
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:http/powerai-vision/index.html delay=240s timeout=5s period=10s #success=1 #failure=3
    Readiness:      http-get http://:http/powerai-vision/index.html delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:
      CONTEXT_ROOT:             <set to the key 'CONTEXT_ROOT' of config map 'powerai-vision-config'>             Optional: false
      DLAAS_API_SERVER:         <set to the key 'DLAAS_API_SERVER' of config map 'powerai-vision-config'>         Optional: false
      SERVER_HOST_VIDEO_TEST:   <set to the key 'SERVER_HOST_VIDEO_TEST' of config map 'powerai-vision-config'>   Optional: false
      SERVICE_PORT_VIDEO_TEST:  <set to the key 'SERVICE_PORT_VIDEO_TEST' of config map 'powerai-vision-config'>  Optional: false
      WEBROOT_VIDEO_TEST:       <set to the key 'WEBROOT_VIDEO_TEST' of config map 'powerai-vision-config'>       Optional: false
    Mounts:
      /opt/powerai-vision/data from data-mount (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-hr9zz (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          True 
  PodScheduled   True 
Volumes:
  data-mount:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  powerai-vision-data-pvc
    ReadOnly:   false
  default-token-hr9zz:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-hr9zz
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  beta.kubernetes.io/arch=ppc64le
Tolerations:     node.alpha.kubernetes.io/notReady:NoExecute for 300s
                 node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>
...

Interpreting the output

Significant fields providing status of the application pods include:

  • The product name and version are given in productName and productVersion.
  • The Status field should be Running. Any other status indicates problems with the application pod.
  • If there are issues with a pod, the Events section of the pod should have information about problems encountered.
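
Because the describe output is verbose, it can help to reduce it to the fields listed above. The following is a minimal sketch under stated assumptions: pod_summary is a hypothetical helper name, and the parsing relies on the field layout shown in the example output (unindented Name:, Status:, and Events: lines, and a productVersion= annotation).

```shell
# pod_summary (hypothetical helper): read `describe pod` or `describe pods`
# output on stdin and print, for each pod, its name, productVersion
# annotation, Status, and Events section.
pod_summary() {
  awk '
    /^Name:/          { print "Name: " $2 }
    /productVersion=/ { sub(/^.*productVersion=/, ""); print "Version: " $0; next }
    /^Status:/        { print "Status: " $2 }
    /^Events:/        { ev = 1 }                 # start of the Events section
    ev && /^[^ \t]/ && !/^Events:/ { ev = 0 }    # next unindented field ends it
    ev                { print }
  '
}

# Example usage; podname is a placeholder, and the kubectl.sh path follows
# this document's examples:
# /opt/powerai-vision/bin/kubectl.sh describe pod podname | pod_summary
```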