Engine failure status check
You can check the engine failure status by using the oc describe
wxdengine command and by checking the ibm-lakehouse-controller-manager
logs.
watsonx.data Developer edition
watsonx.data on IBM Software Hub
Procedure
- Run the following command to display the Status and Events information of an engine.
    oc describe wxdengine <enginename> -n <operand namespace>
  Example output:
    ibm-lh-lakehouse-prestissimo57-coordinator-blue-0     0/1   Pending   0   83m
    ibm-lh-lakehouse-prestissimo57-prestissimo-worker-0   0/1   Pending   0   83m
    ibm-lh-lakehouse-prestissimo57-prestissimo-worker-1   0/1   Pending   0   83m
    ibm-lh-lakehouse-prestissimo57-prestissimo-worker-2   0/1   Pending   0   83m
    ibm-lh-lakehouse-prestissimo57-prestissimo-worker-3   0/1   Pending   0   83m
    ibm-lh-lakehouse-prestissimo57-prestissimo-worker-4   0/1   Pending   0   83m
    ibm-lh-lakehouse-prestissimo57-prestissimo-worker-5   0/1   Pending   0   83m
    ibm-lh-lakehouse-prestissimo57-prestissimo-worker-6   0/1   Pending   0   83m
    ibm-lh-lakehouse-prestissimo57-prestissimo-worker-7   0/1   Pending   0   83m
    ibm-lh-lakehouse-prestissimo57-prestissimo-worker-8   0/1   Pending   0   83m
    ibm-lh-lakehouse-prestissimo57-prestissimo-worker-9   0/1   Pending   0   83m
- Run the oc describe command on the pod to identify the root cause of the failure.
  For example:
    oc describe pod <pod_name>
  Output:
    oc describe pod ibm-lh-lakehouse-prestissimo57-prestissimo-worker-0
    Tolerations:  node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                  node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                  node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
    Events:
      Type     Reason            Age                  From               Message
      ----     ------            ----                 ----               -------
      Warning  FailedScheduling  59m                  ibm-cpd-scheduler  0/18 nodes are available: 15 Insufficient cpu, 15 Insufficient memory, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/18 nodes are available: 15 No preemption victims found for incoming pod, 3 Preemption is not helpful for scheduling.
      Warning  FailedScheduling  13m (x550 over 59m)  ibm-cpd-scheduler  0/18 nodes are available: 15 Insufficient cpu, 15 Insufficient memory, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/18 nodes are available: 15 No preemption victims found for incoming pod, 3 Preemption is not helpful for scheduling.
      Normal   QueuePosition     59m                  ibm-cpd-scheduler  Queue Position: 2
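
When an engine has many pods, it can help to filter the output of the two steps above instead of reading it by eye. The following is a minimal sketch, not part of the product: the function names pending_pods and scheduling_reasons are hypothetical, and the parsing assumes the output has the same shape as the examples in this topic (a pod listing with the status in the third column, and FailedScheduling event lines that contain phrases such as "Insufficient cpu"). Both functions only read text from standard input, so they can be tried on saved output as well as on a live cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for triaging the output shown in this topic.
# They parse text on stdin only and do not call the cluster themselves.

# Print the names of pods whose third column (status) is "Pending".
# Assumes lines shaped like: <name> <ready> <status> <restarts> <age>
pending_pods() {
  awk '$3 == "Pending" { print $1 }'
}

# Print each distinct reason found in FailedScheduling event lines,
# for example "Insufficient cpu". Only the phrases that appear in the
# example output are recognized; other reasons are ignored.
scheduling_reasons() {
  grep 'FailedScheduling' \
    | grep -oE 'Insufficient [a-z-]+|untolerated taint' \
    | LC_ALL=C sort -u
}

# Possible usage (assumes the pod listing comes from "oc get pods"):
#   oc get pods -n <operand namespace> | pending_pods
#   oc describe pod <pod_name> -n <operand namespace> | scheduling_reasons
```

For the example output in this topic, scheduling_reasons would report insufficient CPU, insufficient memory, and an untolerated taint, which suggests that the cluster does not have enough schedulable worker capacity for the engine size that was requested.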