Administering Kubernetes
This topic provides a quick overview of commonly used commands for troubleshooting a cluster, with direct application to IBM Financial Crimes Insight for Watson, Private. For in-depth Kubernetes information, see Kubernetes product information.
Examining the current state
To find out the current state of the cluster, the following kubectl commands are available:
- The get command provides a summary about the type of object that is being examined. For more information, see Kubectl Reference Documentation (get command).
- The describe command provides details about a specific object of the specific type. For more information, see Kubectl Reference Documentation (describe command).
Examining node health
kubectl get nodes
Command output shows all nodes in the cluster, and the current state of the nodes:
NAME STATUS AGE VERSION
fcikuber-mst.rtp.raleigh.ibm.com Ready 18h v1.7.3
fcinode92.rtp.raleigh.ibm.com Ready 18h v1.7.3
fcinode93.rtp.raleigh.ibm.com Ready 18h v1.7.3
fcinode94.rtp.raleigh.ibm.com Ready 18h v1.7.3
fcinode95.rtp.raleigh.ibm.com Ready 18h v1.7.3
Output that specifies a status other than Ready indicates a problem. Nodes can be taken in and out of service, and their status is reflected. When nodes are facing pressure that is related to resources, the status is also indicated. For example, OutOfDisk indicates that the file system on the worker node is full. Kubernetes begins moving pods off the node until the situation is fixed and the status of the node moves back to Ready.
When the Kubernetes cluster is initially created, nodes start in NotReady state. After the node is ready to accept jobs, the status automatically moves to Ready state.
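The status check described here can be scripted. The following is a minimal sketch, not part of the product. It assumes the NAME STATUS AGE VERSION column layout shown in the sample output, and the function name not_ready_nodes is made up for illustration. It reads the output of kubectl get nodes on standard input and prints the name and status of every node whose STATUS is anything other than Ready.

```shell
# Hypothetical helper: reads `kubectl get nodes` output on stdin and prints
# "name status" for every node whose STATUS column is not exactly "Ready".
# Assumes the NAME STATUS AGE VERSION layout shown in the sample output.
not_ready_nodes() {
  # NR > 1 skips the header line; $2 is the STATUS column.
  awk 'NR > 1 && $2 != "Ready" { print $1, $2 }'
}
```

A possible invocation is kubectl get nodes | not_ready_nodes. Empty output means that every node reports Ready. A node that was taken out of service, with a status such as Ready,SchedulingDisabled, is also listed, because the comparison is an exact match on Ready.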
To get more details about a node, enter the following command:
kubectl describe node fcinode95.rtp.raleigh.ibm.com
The output of the command provides details about the particular node:
Name: fcinode95.rtp.raleigh.ibm.com
Role:
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/hostname=fcinode95.rtp.raleigh.ibm.com
Annotations: flannel.alpha.coreos.com/backend-data={"VtepMAC":"f6:e3:ed:e5:12:91"}
flannel.alpha.coreos.com/backend-type=vxlan
flannel.alpha.coreos.com/kube-subnet-manager=true
flannel.alpha.coreos.com/public-ip=9.37.30.95
node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
Taints: <none>
CreationTimestamp: Thu, 28 Sep 2017 14:54:02 -0400
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Fri, 29 Sep 2017 09:56:01 -0400 Thu, 28 Sep 2017 14:54:02 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Fri, 29 Sep 2017 09:56:01 -0400 Thu, 28 Sep 2017 14:54:02 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 29 Sep 2017 09:56:01 -0400 Thu, 28 Sep 2017 14:54:02 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready True Fri, 29 Sep 2017 09:56:01 -0400 Thu, 28 Sep 2017 14:54:23 -0400 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 9.37.30.95
Hostname: fcinode95.rtp.raleigh.ibm.com
Capacity:
cpu: 4
memory: 8175660Ki
pods: 110
Allocatable:
cpu: 4
memory: 8073260Ki
pods: 110
System Info:
Machine ID: cd344ed1a09e4df484ef7248fc04281a
System UUID: 4234C0FE-FE90-4C08-4D87-4C031A9D7331
Boot ID: b4f3b1f5-ddd6-461b-807d-82af46a20f4c
Kernel Version: 3.10.0-514.26.2.el7.x86_64
OS Image: CentOS Linux 7 (Core)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://Unknown
Kubelet Version: v1.7.3
Kube-Proxy Version: v1.7.3
PodCIDR: 10.244.2.0/24
ExternalID: fcinode95.rtp.raleigh.ibm.com
Non-terminated Pods: (4 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
default db2-202245354-rl7gr 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-flannel-ds-h9c9z 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-proxy-knjh2 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system monitoring-grafana-1219411114-9bts0 0 (0%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
0 (0%) 0 (0%) 0 (0%) 0 (0%)
Events: <none>
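The Conditions table in this output is the quickest place to spot resource pressure. The following is a minimal sketch that filters the table, under two assumptions that are not stated by the product: the Ready condition is expected to be True, and every other condition listed (OutOfDisk, MemoryPressure, DiskPressure) is expected to be False. The function name node_condition_problems is hypothetical.

```shell
# Hypothetical helper: reads `kubectl describe node` output on stdin and
# prints "Type=Status" for each condition that is in an unexpected state.
# Assumption: Ready should be True; all other listed conditions should be False.
node_condition_problems() {
  awk '
    /^Conditions:/ { in_cond = 1; next }   # start of the Conditions table
    /^Addresses:/  { in_cond = 0 }         # the next section ends the table
    in_cond && NF >= 2 && $1 != "Type" && $1 !~ /^-+$/ {
      want = ($1 == "Ready") ? "True" : "False"
      if ($2 != want) print $1 "=" $2
    }
  '
}
```

A possible invocation is kubectl describe node fcinode95.rtp.raleigh.ibm.com | node_condition_problems. For the sample output shown above, the function prints nothing, because all four conditions are in the expected state.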
Examining pod health
FCI applications are deployed into pods in the default namespace. For FCI, a pod is simply a single Docker container.
The following command provides a list of pods that are currently installed and their status:
kubectl get pods
Output is similar to the following:
NAME READY STATUS RESTARTS AGE
analytics-2179514482-2kf6k 1/1 Running 0 18h
db2-202245354-rl7gr 1/1 Running 0 18h
mq-2553988797-6gmm4 1/1 Running 0 18h
solution-150650449-n1nnz 1/1 Running 0 18h
The READY and STATUS fields are important. Pods go through a lifecycle, and some of the steps are lengthy. Until a pod has a STATUS of Running and a READY value of 1/1, the application is not up. Kubernetes monitors pod status and does not route traffic to pods that are not Ready.
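To pick out the pods that are not yet up, the two fields can be checked together. The following is a minimal sketch that assumes the NAME READY STATUS RESTARTS AGE layout shown in the sample output. The function name pods_not_ready is made up for illustration.

```shell
# Hypothetical helper: reads `kubectl get pods` output on stdin and prints
# "name ready status" for every pod that is not Running or whose READY
# count (ready/total) shows fewer ready containers than the total.
pods_not_ready() {
  awk 'NR > 1 {
    split($2, ready, "/")            # READY has the form ready/total
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
  }'
}
```

A possible invocation is kubectl get pods | pods_not_ready. Empty output means that every listed pod is Running with all of its containers ready. The sketch treats any status other than Running as not up, which might not be what you want for pods that are expected to run to completion.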
To open a shell inside the container of a pod, enter the following command:
kubectl exec -it name_of_pod -- /bin/bash
For example:
kubectl exec -it analytics-2179514482-2kf6k -- /bin/bash
Then, while in the container, enter commands at the Linux prompt to troubleshoot the specific product that is running in the container.
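To run commands in the container without staying in an interactive shell, enter the following command:
kubectl exec -it name_of_pod -- /bin/bash -c "any_linux_string_of_commands"
For example:
kubectl exec -it mq-2553988797-6gmm4 -- /bin/bash -c "ps -ef | grep amq"
Output is similar to the following: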
mqm 213 0 0 Feb09 ? 00:00:22 /opt/mqm/bin/amqzxma0 -m FCIQM -u mqm
mqm 221 213 0 Feb09 ? 00:00:03 /opt/mqm/bin/amqzfuma -m FCIQM
mqm 226 213 0 Feb09 ? 00:00:14 /opt/mqm/bin/amqzmuc0 -m FCIQM
mqm 245 213 0 Feb09 ? 00:00:54 /opt/mqm/bin/amqzmur0 -m FCIQM
........
To manage the Liberty solution server, first list the pods to find the name of the solution pod:
kubectl get pods
NAME READY STATUS RESTARTS AGE
fci-analytics-3232108353-8g3rt 1/1 Running 0 53d
fci-messaging-409855742-ngzw5 1/1 Running 0 53d
fci-primaryds-3918237058-p1d7v 1/1 Running 0 53d
fci-solution-1268297780-v0jpp 0/1 Running 0 53d
Use the following commands to get status, start, and stop the Liberty solution server.
- To get server status and return the server's process ID:
kubectl exec -it fci-solution-1268297780-v0jpp -- /opt/ibm/wlp/bin/server status solutionServer
- To stop the Liberty solution server:
kubectl exec -it fci-solution-1268297780-v0jpp -- /opt/ibm/wlp/bin/server stop solutionServer
Note: To see if the Liberty solution server stopped, change to the logs directory to view the messages.log and console.log files:
cd /fci-exports/fci-solution/servers/solutionServer/logs
If the Liberty solution server did not stop, go to the Kubernetes container and locate the Liberty Java process. Enter a kill -9 command on the Java process using the PID obtained from the server status solutionServer command. For example:
kubectl exec -it fci-solution-1268297780-v0jpp -- /bin/bash
wlpadmin@fci-solution-1268297780-v0jpp:/$ kill -9 11306
To see if the Java process for the Liberty solution server stopped:
wlpadmin@fci-solution-1268297780-v0jpp:/$ ps -ef | grep java
- To start the Liberty solution server:
kubectl exec -it fci-solution-1268297780-v0jpp -- /opt/ibm/wlp/bin/server start solutionServer --clean
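The stop and start commands can be combined into one restart step. The following is a minimal sketch, not a supported procedure. The server path and server name are taken from the commands above. The function name restart_solution_server is hypothetical, and the sketch assumes that the server script returns a nonzero exit code when the stop fails. The -it options are omitted because these commands do not need a terminal.

```shell
# Hypothetical helper: stops and then clean-starts the Liberty solution
# server in the given pod. Path and server name come from the commands
# in this topic; the function name is an assumption for illustration.
restart_solution_server() {
  pod="$1"
  server=/opt/ibm/wlp/bin/server
  # Assumption: a nonzero exit code from "server stop" means the stop failed.
  if ! kubectl exec "$pod" -- "$server" stop solutionServer; then
    # Nothing is killed automatically; that decision stays with the operator.
    echo "stop failed in $pod; check the logs before using kill -9" >&2
    return 1
  fi
  kubectl exec "$pod" -- "$server" start solutionServer --clean
}
```

A possible invocation is restart_solution_server fci-solution-1268297780-v0jpp. If the server is already stopped, the stop command might also return a nonzero code. In that case, run the start command directly.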
To get more details about a pod, enter the following command:
kubectl describe pod solution-150650449-n1nnz
Review the Events section of the output to see whether the pod is in a normal starting state, or whether something went wrong and needs to be addressed.
Name: solution-150650449-n1nnz
Namespace: default
Node: fcinode93.rtp.raleigh.ibm.com/9.37.30.93
Start Time: Thu, 28 Sep 2017 15:08:21 -0400
Labels: app=solution
pod-template-hash=150650449
Annotations: kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"solution-150650449","uid":"64bd7878-a480-11e7-8963-005056b446a4"...
Status: Running
IP: 10.244.3.5
Created By: ReplicaSet/solution-150650449
Controlled By: ReplicaSet/solution-150650449
Init Containers:
init-mqservice:
Container ID: docker://eb3459ab60f0dd7855dcb7b6ba2957328dc7dadec32a3c1424b8650d8b91fa60
Image: giantswarm/tiny-tools
Image ID: docker-pullable://giantswarm/tiny-tools@sha256:8e6739b0083c8d67e0ad8aef98c60cd84881698e87bd23c102b1f782894c7bfe
Port: <none>
Command:
fish
-c
echo "waiting for fci-messaging..."; while true; set endpoints (curl -s --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt --header "Authorization: Bearer "(cat /var/run/secrets/kubernetes.io/serviceaccount/token) https://kubernetes.default.svc/api/v1/namespaces/default/endpoints/fci-messaging); echo $endpoints | jq "."; if test (echo $endpoints | jq -r ".subsets[]?.addresses // [] | length") -gt 0; exit 0; end; echo "waiting...";sleep 1; end
Args:
default
fci-messaging
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 28 Sep 2017 15:08:22 -0400
Finished: Thu, 28 Sep 2017 15:11:50 -0400
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from init-container-serviceaccount-token-l73n3 (ro)
init-db2service:
Container ID: docker://cb27cf5f3e550a5858e3f6482142394a4a2b68f025ce6c473365865511c97853
Image: giantswarm/tiny-tools
Image ID: docker-pullable://giantswarm/tiny-tools@sha256:8e6739b0083c8d67e0ad8aef98c60cd84881698e87bd23c102b1f782894c7bfe
Port: <none>
Command:
fish
-c
echo "waiting for fci-primaryds..."; while true; set endpoints (curl -s --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt --header "Authorization: Bearer "(cat /var/run/secrets/kubernetes.io/serviceaccount/token) https://kubernetes.default.svc/api/v1/namespaces/default/endpoints/fci-primaryds); echo $endpoints | jq "."; if test (echo $endpoints | jq -r ".subsets[]?.addresses // [] | length") -gt 0; exit 0; end; echo "waiting...";sleep 1; end
Args:
default
fci-primaryds
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 28 Sep 2017 15:11:51 -0400
Finished: Thu, 28 Sep 2017 15:11:51 -0400
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from init-container-serviceaccount-token-l73n3 (ro)
Containers:
solution:
Container ID: docker://49506f18f0758c361a3e74c5d2f3d2761e17a3fda4ba116f6cfe2d0986bbf807
Image: pltdockerrel.rtp.raleigh.ibm.com:5000/ibmcom/fci-solution:1.0.1
Image ID: docker-pullable://pltdockerrel.rtp.raleigh.ibm.com:5000/ibmcom/fci-solution@sha256:d7c6696330ef7518caf750faa2394b498ca1fdcd1ff2f6c8c95b08a298f4f8cb
Ports: 9080/TCP, 9443/TCP
Command:
/fci-solution/solution-kube-start.sh
State: Running
Started: Thu, 28 Sep 2017 15:11:53 -0400
Ready: True
Restart Count: 0
Liveness: exec [/fci-solution/solution-kube-live.sh] delay=5s timeout=1s period=5s #success=1 #failure=3
Readiness: http-get https://:9443/console delay=5s timeout=1s period=5s #success=1 #failure=3
Environment Variables from:
platform-config ConfigMap Optional: false
Environment: <none>
Mounts:
/fci-shared from shared-log-persistent-storage (rw)
/fci-solution from solution-log-persistent-storage (rw)
/opt/mqm from solution-mq-persistent-storage (rw)
/var/mqm from solution-mq-persistent-storage (rw)
/var/run/secrets/kubernetes.io/serviceaccount from init-container-serviceaccount-token-l73n3 (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
solution-log-persistent-storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: solution-log-claim
ReadOnly: false
solution-mq-persistent-storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: mq-log-claim
ReadOnly: false
shared-log-persistent-storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: shared-liberty-claim
ReadOnly: false
init-container-serviceaccount-token-l73n3:
Type: Secret (a volume populated by a Secret)
SecretName: init-container-serviceaccount-token-l73n3
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady:NoExecute for 300s
node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
Common event messages are listed.
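Because the describe output is long, it can help to print only the Events section. The following is a minimal sketch that relies on Events being the last section of the output, as in the sample above. The function name pod_events is made up for illustration.

```shell
# Hypothetical helper: reads `kubectl describe pod` output on stdin and
# prints everything from the "Events:" line to the end of the output.
pod_events() {
  awk '/^Events:/ { show = 1 } show'
}
```

A possible invocation is kubectl describe pod solution-150650449-n1nnz | pod_events. For the sample output shown above, the function prints the single line Events: <none>.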
Kubernetes system pods are in their own namespace, kube-system. To examine namespaces and the status of system pods, the following commands are available:
kubectl get namespaces
Output is similar to the following:
NAME STATUS AGE
default Active 19h
kube-public Active 19h
kube-system Active 19h
kubectl -n kube-system get pods
Output is similar to the following:
NAME READY STATUS RESTARTS AGE
calico-etcd-5970v 1/1 Running 2 89d
calico-kube-controllers-2305157444-42wg1 1/1 Running 3 89d
calico-node-b9ld2 2/2 Running 6 89d
default-http-backend-654799587-2trr5 1/1 Running 2 89d
etcd-fciva103u01.fyre.ibm.com 1/1 Running 2 89d
kube-apiserver-fciva103u01.fyre.ibm.com 1/1 Running 2 89d
kube-controller-manager-fciva103u01.fyre.ibm.com 1/1 Running 3 89d
kube-dns-2168611686-5l73b 3/3 Running 6 89d
kube-proxy-hstlr 1/1 Running 2 89d
kube-scheduler-fciva103u01.fyre.ibm.com 1/1 Running 3 89d
nginx-ingress-controller-1824274788-wmf71 1/1 Running 2 89d
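The RESTARTS column in this output can be filtered to find system pods that restarted more often than others. The following is a minimal sketch that assumes the NAME READY STATUS RESTARTS AGE layout shown above. The function name pods_with_restarts and its threshold argument are hypothetical. A restart count does not by itself indicate a fault, so treat the result as a starting point for kubectl describe pod.

```shell
# Hypothetical helper: reads `kubectl get pods` output on stdin and prints
# "name restarts" for every pod whose RESTARTS value is at least the
# threshold given as the first argument (default: 1).
pods_with_restarts() {
  min="${1:-1}"
  # "+ 0" forces a numeric comparison of the RESTARTS column ($4).
  awk -v min="$min" 'NR > 1 && $4 + 0 >= min + 0 { print $1, $4 }'
}
```

A possible invocation is kubectl -n kube-system get pods | pods_with_restarts 5, which for the sample output above lists calico-node-b9ld2 and kube-dns-2168611686-5l73b, each with 6 restarts.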