Checking Kubernetes node status
Use these commands to check the status of the nodes in the environment.
kubectl.sh get pods command
The kubectl.sh get pods command shows the status of each Kubernetes pod deployed to run the PowerAI Vision application.
[root@aprilmin8 bin]# /opt/powerai-vision/bin/kubectl get pods
NAME READY STATUS RESTARTS AGE
powerai-vision-infer-ic-f873e5ff-821b-472a-bec0-32b1cdc4b59ns7j 1/1 Running 0 14d
powerai-vision-infer-od-fae2aa93-e394-4a04-bec7-f072b81f2678b2m 1/1 Running 0 14d
powerai-vision-keycloak-987f9698d-kdvp2 1/1 Running 1 14d
powerai-vision-mongodb-686ffbd4b9-cq2mx 1/1 Running 0 14d
powerai-vision-portal-768c4ffdc7-9ppnt 1/1 Running 0 14d
powerai-vision-postgres-7c88467b4c-swz8m 1/1 Running 0 14d
powerai-vision-taskanaly-788bdcc5cc-dc5cp 1/1 Running 0 14d
powerai-vision-ui-844bb5f7f9-t55bg 1/1 Running 0 14d
powerai-vision-video-nginx-845544c55b-w8ks6 1/1 Running 0 14d
powerai-vision-video-portal-7499b577c7-csvs8 1/1 Running 0 14d
powerai-vision-video-rabmq-85b486bc9d-nqdmn 1/1 Running 0 14d
powerai-vision-video-redis-67d9766fb6-gw7hd 1/1 Running 0 14d
powerai-vision-video-test-nginx-b5cb6b759-fkwp5 1/1 Running 0 14d
powerai-vision-video-test-portal-756f7f9b99-mtwwj 1/1 Running 0 14d
powerai-vision-video-test-rabmq-578c775d4-chbtd 1/1 Running 0 14d
powerai-vision-video-test-redis-779f7545b4-dffws 1/1 Running 0 14d
Interpreting the output
- When the application is running correctly, each of the pods should have:
- A value of 1/1 in the READY column
- A value of Running in the STATUS column
- In the above example output, pods with infer in the name are created when a model is deployed. These will only appear if there are models deployed in the instance of the application running on the system.
- A STATUS value other than Running indicates an issue with the pod.
- A nonzero and increasing value in the RESTARTS column indicates an issue with that pod.
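The checks in this list can be applied to the command output automatically. The following is a minimal sketch, not part of PowerAI Vision: the function names unhealthy_pods and restarted_pods are hypothetical, and both assume the column order shown in the example output (NAME READY STATUS RESTARTS AGE).

```shell
# Hypothetical helpers (not shipped with PowerAI Vision). Both read the output
# of "kubectl get pods" on stdin and assume the column order shown above:
# NAME READY STATUS RESTARTS AGE.

# Print pods whose READY count is incomplete or whose STATUS is not Running.
unhealthy_pods() {
    awk 'NR > 1 {
        split($2, ready, "/")
        if (ready[1] != ready[2] || $3 != "Running")
            print $1, $2, $3
    }'
}

# Print pods with a nonzero RESTARTS count. A single snapshot cannot show
# whether the count is increasing, so compare two runs taken some time apart.
restarted_pods() {
    awk 'NR > 1 && $4 + 0 > 0 { print $1, $4 }'
}
```

For example, after defining the functions in the current shell, pipe the command output into either one:

    /opt/powerai-vision/bin/kubectl.sh get pods | unhealthy_pods

No output from unhealthy_pods means that every listed pod is fully ready and in the Running state.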
kubectl describe nodes command
The kubectl describe nodes command provides status information regarding the Kubernetes environment used to run the PowerAI Vision application.
# /opt/powerai-vision/bin/kubectl.sh describe nodes
Name: 127.0.0.1
Roles: <none>
Labels: beta.kubernetes.io/arch=ppc64le
beta.kubernetes.io/os=linux
gpu/nvidia=TeslaP100-SXM2-16GB
kubernetes.io/hostname=127.0.0.1
Annotations: node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
Taints: <none>
CreationTimestamp: Tue, 14 Aug 2018 00:54:26 -0500
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Wed, 12 Sep 2018 18:26:03 -0500 Tue, 14 Aug 2018 00:54:26 -0500 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Wed, 12 Sep 2018 18:26:03 -0500 Tue, 14 Aug 2018 00:54:26 -0500 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 12 Sep 2018 18:26:03 -0500 Tue, 14 Aug 2018 00:54:26 -0500 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready True Wed, 12 Sep 2018 18:26:03 -0500 Tue, 14 Aug 2018 00:54:36 -0500 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 127.0.0.1
Hostname: 127.0.0.1
Capacity:
alpha.kubernetes.io/nvidia-gpu: 4
cpu: 128
memory: 1067492928Ki
pods: 110
Allocatable:
alpha.kubernetes.io/nvidia-gpu: 4
cpu: 128
memory: 1067390528Ki
pods: 110
System Info:
Machine ID: 833e0926ee21aed71ec075d726cbcfe0
System UUID: 101537A
Boot ID: 6edbbc1e-475d-4ac7-b8dc-18227ff6a6f4
Kernel Version: 3.10.0-862.6.3.el7.ppc64le
OS Image: Debian GNU/Linux 8 (jessie)
Operating System: linux
Architecture: ppc64le
Container Runtime Version: docker://1.13.1
Kubelet Version: v1.8.3+icp
Kube-Proxy Version: v1.8.3+icp
ExternalID: 127.0.0.1
Non-terminated Pods: (20 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits NvidiaGPU Limits
--------- ---- ------------ ---------- --------------- ------------- -------------
default powerai-vision-infer-ic-2c9713b3-a357-466e-a10c-596e71e735c6wvm 0 (0%) 0 (0%) 0 (0%) 0 (0%) 1 (25%)
default powerai-vision-infer-od-2b2c5f0d-f6ae-4f46-8f0b-b946c2bc18kznb6 0 (0%) 0 (0%) 0 (0%) 0 (0%) 1 (25%)
default powerai-vision-keycloak-987f9698d-kdvp2 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
default powerai-vision-mongodb-686ffbd4b9-cq2mx 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
default powerai-vision-portal-768c4ffdc7-9686z 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
default powerai-vision-postgres-7c88467b4c-swz8m 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
default powerai-vision-taskanaly-788bdcc5cc-dc5cp 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
default powerai-vision-ui-844bb5f7f9-t55bg 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
default powerai-vision-video-nginx-845544c55b-w8ks6 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
default powerai-vision-video-portal-7499b577c7-csvs8 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
default powerai-vision-video-rabmq-85b486bc9d-nqdmn 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
default powerai-vision-video-redis-67d9766fb6-gw7hd 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
default powerai-vision-video-test-nginx-b5cb6b759-fkwp5 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
default powerai-vision-video-test-portal-756f7f9b99-mtwwj 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
default powerai-vision-video-test-rabmq-578c775d4-chbtd 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
default powerai-vision-video-test-redis-779f7545b4-dffws 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system default-http-backend-77c86f88b4-g8sp9 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-dns-c5b9d46b-67dvg 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system nginx-ingress-lb-ppc64le-pdzs6 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system tiller-deploy-5f954f4845-kkv8z 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits NvidiaGPU Limits
------------ ---------- --------------- ------------- -------------
0 (0%) 0 (0%) 0 (0%) 0 (0%) 2 (50%)
Events: <none>
#
- Most of the output is informational: it describes the system resources (CPUs, GPUs, memory) and version information (OS, Docker, Kubernetes).
- The Conditions section can indicate whether system resource issues will affect the application. If any of the OutOfDisk, MemoryPressure, or DiskPressure conditions are True, there are insufficient system resources to run PowerAI Vision. For example, the following Conditions section shows a system that does not have sufficient disk space available, indicated by a DiskPressure status of True:
  Conditions:
    Type            Status  LastHeartbeatTime  LastTransitionTime  Reason                      Message
    ----            ------  -----------------  ------------------  ------                      -------
    OutOfDisk       False   [...]              [...]               KubeletHasSufficientDisk    kubelet has sufficient disk space available
    MemoryPressure  False   [...]              [...]               KubeletHasSufficientMemory  kubelet has sufficient memory available
    DiskPressure    True    [...]              [...]               KubeletHasDiskPressure      kubelet has disk pressure
    Ready           True    [...]              [...]               KubeletReady                kubelet is posting ready status
- The Events section will also have messages that can indicate if there are issues with the environment. For example, the following events indicate issues with disk space that have led to Kubernetes attempting to reclaim resources ("eviction"), which can affect the availability of Kubernetes applications:
  Events:
    Type     Reason                Age               From                Message
    ----     ------                ----              ----                -------
    Normal   NodeHasDiskPressure   5m                kubelet, 127.0.0.1  Node 127.0.0.1 status is now: NodeHasDiskPressure
    Warning  EvictionThresholdMet  3s (x23 over 5m)  kubelet, 127.0.0.1  Attempting to reclaim nodefs
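The Conditions and Events checks described above can be sketched as two small filters over the command output. The function names node_pressure and warning_events are hypothetical, and the sketch assumes the layout shown in the examples, where each condition or event is on its own line with the type in the first column.

```shell
# Hypothetical helpers (not shipped with PowerAI Vision). Both read the output
# of "kubectl describe nodes" on stdin.

# Print each resource condition (OutOfDisk, MemoryPressure, DiskPressure)
# whose Status column is True.
node_pressure() {
    awk '$1 ~ /^(OutOfDisk|MemoryPressure|DiskPressure)$/ && $2 == "True" { print $1 }'
}

# Print each event line whose Type column is Warning, with leading
# whitespace removed.
warning_events() {
    awk '$1 == "Warning" { sub(/^[ \t]+/, ""); print }'
}
```

For example:

    /opt/powerai-vision/bin/kubectl.sh describe nodes | node_pressure

No output from node_pressure means that none of the three resource conditions is reported as True.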
kubectl describe pods command
The kubectl.sh describe pods command provides detailed information about each of the pods used by the PowerAI Vision application. To see the output for a specific pod only, use the command kubectl.sh describe pod podname. To determine the value for podname, look at the output from kubectl.sh get pods.
Example output
# kubectl describe pods
...
Name: powerai-vision-ui-844bb5f7f9-khxlm
Namespace: default
Node: 127.0.0.1/127.0.0.1
Start Time: Mon, 17 Sep 2018 12:27:18 -0500
Labels: app=powerai-vision
chart=ibm-powerai-vision-prod-1.1.0
component=powerai-vision-ui
heritage=Tiller
pod-template-hash=4006619395
release=vision
run=powerai-vision-ui-deployment-pod
Annotations: checksum/config=b131ad79a4838feecc85cde3b422bc82ef3214a437462e02b775df6d3582a4f6
kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"powerai-vision-ui-844bb5f7f9","uid":"e4ee9eab-ba9e-11e8-9e14-98b...
productID=5737-H10
productName=IBM PowerAI Vision
productVersion=1.1.1.0
Status: Running
IP: 172.17.0.14
Created By: ReplicaSet/powerai-vision-ui-844bb5f7f9
Controlled By: ReplicaSet/powerai-vision-ui-844bb5f7f9
Containers:
powerai-vision-ui:
Container ID: docker://16867f9922458d4a517018f52edc6c319e1e9d6408cc333cf242be618e179425
Image: powerai-vision-ui:1.1.1.0
Image ID: docker://sha256:c327077b7d605518f6b4651141ab864459212d34022fabad844073e5e66e9b9c
Port: 80/TCP
State: Running
Started: Mon, 17 Sep 2018 12:27:33 -0500
Ready: True
Restart Count: 0
Liveness: http-get http://:http/powerai-vision/index.html delay=240s timeout=5s period=10s #success=1 #failure=3
Readiness: http-get http://:http/powerai-vision/index.html delay=5s timeout=1s period=10s #success=1 #failure=3
Environment:
CONTEXT_ROOT: <set to the key 'CONTEXT_ROOT' of config map 'powerai-vision-config'> Optional: false
DLAAS_API_SERVER: <set to the key 'DLAAS_API_SERVER' of config map 'powerai-vision-config'> Optional: false
SERVER_HOST_VIDEO_TEST: <set to the key 'SERVER_HOST_VIDEO_TEST' of config map 'powerai-vision-config'> Optional: false
SERVICE_PORT_VIDEO_TEST: <set to the key 'SERVICE_PORT_VIDEO_TEST' of config map 'powerai-vision-config'> Optional: false
WEBROOT_VIDEO_TEST: <set to the key 'WEBROOT_VIDEO_TEST' of config map 'powerai-vision-config'> Optional: false
Mounts:
/opt/powerai-vision/data from data-mount (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-hr9zz (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
data-mount:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: powerai-vision-data-pvc
ReadOnly: false
default-token-hr9zz:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-hr9zz
Optional: false
QoS Class: BestEffort
Node-Selectors: beta.kubernetes.io/arch=ppc64le
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
...
Interpreting the output
Significant fields providing status of the application pods include:
- The product name and version are given in productName and productVersion.
- The Status field should be Running. Any other status indicates problems with the application pod.
- If there are issues with a pod, the Events section of the pod should have information about problems encountered.
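When many pods are listed, the Status check can be applied to the whole output at once. The following is a minimal sketch under the assumption that each pod section starts with a Name: line and contains a Status: line at the beginning of a line, as in the example output. The function name pods_not_running is hypothetical.

```shell
# Hypothetical helper (not shipped with PowerAI Vision). Reads the output of
# "kubectl describe pods" on stdin and prints the name and status of every
# pod whose top-level Status field is not Running.
pods_not_running() {
    awk '/^Name:/ { name = $2 }
         /^Status:/ && $2 != "Running" { print name, $2 }'
}
```

For example:

    /opt/powerai-vision/bin/kubectl.sh describe pods | pods_not_running

For any pod that is listed, run kubectl.sh describe pod podname and review its Events section.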